[
https://issues.apache.org/jira/browse/JCR-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
fabrizio giustina updated JCR-2622:
-----------------------------------
Summary: Index analyzers that extend StandardAnalyzer need to implement
reusableTokenStream() since jackrabbit 2.1 (was: Configured index analyzers
not working in jackrabbit 2.1 and 2.2)
Looks like I spoke too soon. After a deeper analysis I found that the problem
can be fixed in the analyzer class and doesn't require a fix in Jackrabbit
itself.
The change in JCR-2505 actually broke index analyzers that don't implement the
reusableTokenStream() method properly.
Any analyzer that extends org.apache.lucene.analysis.standard.StandardAnalyzer
worked properly in Jackrabbit 2.0, which used the tokenStream() method only.
Since Jackrabbit 2.1, however, such analyzers cannot rely on the superclass
implementation of reusableTokenStream(); they have to implement that method
properly themselves.
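
A minimal sketch of the effect described above, assuming the caller switched
from tokenStream() to reusableTokenStream(). The classes are hypothetical
stand-ins (ModelAnalyzer, ModelStandardAnalyzer, ModelAccentAnalyzer), not the
real Lucene types, and plain strings take the place of token streams:

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical stand-in for the analyzer base class; not the real Lucene API.
abstract class ModelAnalyzer {
    abstract String tokenStream(String text);

    // Base default: the reusable variant simply delegates to tokenStream().
    String reusableTokenStream(String text) {
        return tokenStream(text);
    }
}

// Stand-in for StandardAnalyzer: it ships its own reusable implementation.
class ModelStandardAnalyzer extends ModelAnalyzer {
    @Override
    String tokenStream(String text) {
        return text.trim();
    }

    // Builds its own chain and never calls tokenStream(), so a subclass
    // override of tokenStream() is skipped on this path.
    @Override
    String reusableTokenStream(String text) {
        return text.trim();
    }
}

// Stand-in for a custom analyzer that overrides tokenStream() only.
class ModelAccentAnalyzer extends ModelStandardAnalyzer {
    @Override
    String tokenStream(String text) {
        return stripAccents(super.tokenStream(text)).toLowerCase(Locale.ROOT);
    }

    // Decompose accented characters, then drop the combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }
}

class BypassDemo {
    public static void main(String[] args) {
        ModelAnalyzer analyzer = new ModelAccentAnalyzer();
        // Old call path: the override runs, accents are folded.
        System.out.println(analyzer.tokenStream("T\u00e8st"));
        // New call path: the inherited reusable method runs, nothing is folded.
        System.out.println(analyzer.reusableTokenStream("T\u00e8st"));
    }
}
```

The same analyzer instance folds accents on one path and not on the other,
which would explain why the analyzer is still called yet the queries stop
matching.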
The correct solution is probably not to extend StandardAnalyzer anymore (its
reusableTokenStream() method is not overridable due to its use of private
fields) but to extend a plain org.apache.lucene.analysis.Analyzer and
reimplement the tokenStream() method from scratch.
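
A rough sketch of that approach, again with hypothetical stand-ins
(PlainAnalyzer, FoldingAnalyzer) and strings instead of real token streams. It
assumes that the plain base class's reusableTokenStream() falls back to
tokenStream(), so a subclass that builds the whole chain in tokenStream() is
covered on both call paths:

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical stand-in for org.apache.lucene.analysis.Analyzer.
abstract class PlainAnalyzer {
    abstract String tokenStream(String text);

    // Assumed default: the reusable path delegates to tokenStream().
    String reusableTokenStream(String text) {
        return tokenStream(text);
    }
}

// Extends the plain base class and builds the complete chain itself,
// instead of wrapping the output of a StandardAnalyzer-like superclass.
class FoldingAnalyzer extends PlainAnalyzer {
    @Override
    String tokenStream(String text) {
        String tokens = text.trim();                      // stand-in for tokenizing
        String lower = tokens.toLowerCase(Locale.ROOT);   // stand-in for lower-casing
        return Normalizer.normalize(lower, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");                // stand-in for accent folding
    }
}

class FixDemo {
    public static void main(String[] args) {
        PlainAnalyzer analyzer = new FoldingAnalyzer();
        // Both call paths now go through the same chain.
        System.out.println(analyzer.tokenStream("T\u00e8st"));
        System.out.println(analyzer.reusableTokenStream("T\u00e8st"));
    }
}
```

In a real analyzer the chain would be assembled from the Lucene tokenizer and
filters instead of string operations; the point of the sketch is only that no
superclass shortcut remains between the caller and the custom filters.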
So the problem looks like a bug in all the analyzers I was using, but in a part
that was never used by Jackrabbit before the change in version 2.1. The
issue can be closed.
> Index analyzers that extend StandardAnalyzer need to implement
> reusableTokenStream() since jackrabbit 2.1
> ----------------------------------------------------------------------------------------------------------
>
> Key: JCR-2622
> URL: https://issues.apache.org/jira/browse/JCR-2622
> Project: Jackrabbit Content Repository
> Issue Type: Bug
> Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0
> Reporter: fabrizio giustina
> Priority: Critical
> Attachments: JCR-2622-tests_and_patch.diff
>
>
> I just tried migrating an existing project which was using jackrabbit 2.0.0
> to 2.1.0.
> We have an index analyzer configured which filters accented chars:
> {code}
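> import java.io.Reader;
> import org.apache.lucene.analysis.ISOLatin1AccentFilter;
> import org.apache.lucene.analysis.LowerCaseFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>
> public class ItalianSnowballAnalyzer extends StandardAnalyzer
> {
>     @Override
>     public TokenStream tokenStream(String fieldName, Reader reader)
>     {
>         return new ISOLatin1AccentFilter(
>             new LowerCaseFilter(super.tokenStream(fieldName, reader)));
>     }
> }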
> {code}
> The project has a good number of unit tests: an XML file is loaded into a
> memory-only Jackrabbit repository and several queries are checked against
> expected results.
> After migrating to 2.1.0, none of the tests that relied on the index analyzer
> work anymore; for example, searching for "test" no longer finds nodes
> containing "tèst".
> Upgrading to Jackrabbit 2.1.0 is the only change made (no changes to the
> configuration, code, or other libraries at all). Rolling back to the 2.0.0
> dependency is enough to make all the tests work again.
> I've checked the changes in 2.1 but couldn't find any apparently related
> change. Also note that I was already using the patch in JCR-2504 before
> (configuration loading works fine in the unpatched 2.1). Another point is
> that the configured IndexAnalyzer still actually gets called during our tests
> (checked in debug mode).
> Any idea?