[jira] Updated: (JCR-2622) Index analyzers that extend StandardAnalyzer need to implement reusableTokenStream() since Jackrabbit 2.1

fabrizio giustina (JIRA) Sat, 01 Jan 2011 11:52:11 -0800

     [ 
https://issues.apache.org/jira/browse/JCR-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


fabrizio giustina updated JCR-2622:
-----------------------------------

    Summary: Index analyzers that extend StandardAnalyzer need to implement 
reusableTokenStream() since Jackrabbit 2.1  (was: Configured index analyzers 
not working in Jackrabbit 2.1 and 2.2)

Looks like I spoke too soon. After a deeper analysis I found that the problem 
can be fixed in the analyzer class and doesn't require a fix in Jackrabbit 
itself.

The change in JCR-2505 actually broke index analyzers that don't implement the 
reusableTokenStream() method properly. 
Any analyzer that extends org.apache.lucene.analysis.standard.StandardAnalyzer 
worked properly in Jackrabbit 2.0, which used the tokenStream() method only. 
Since Jackrabbit 2.1, however, such analyzers cannot rely on the superclass 
implementation of reusableTokenStream() and have to implement that method 
properly themselves.

The correct solution is probably not to extend StandardAnalyzer anymore (its 
reusableTokenStream() method cannot be overridden properly because it relies 
on private fields) but to extend a plain org.apache.lucene.analysis.Analyzer 
and reimplement the tokenStream() method from scratch.
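
To make the mechanism concrete, here is a minimal, self-contained sketch. It 
does not use Lucene at all: SketchAnalyzer, SketchStandardAnalyzer, 
SketchTokens and both accent analyzers are hypothetical stand-ins, and token 
streams are modelled as plain lists of strings. It assumes, as described 
above, that the StandardAnalyzer-like class builds its reusable stream without 
going through the overridable tokenStream() method, while the plain base class 
simply delegates to it.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helpers; a token stream is modelled as a list of strings.
final class SketchTokens {
    private SketchTokens() {}

    // Lower-cases the text and splits it on whitespace.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Decomposes each token and drops combining marks (accent folding).
    static List<String> foldAccents(List<String> tokens) {
        List<String> folded = new ArrayList<String>();
        for (String t : tokens) {
            String decomposed = Normalizer.normalize(t, Normalizer.Form.NFD);
            folded.add(decomposed.replaceAll("\\p{M}", ""));
        }
        return folded;
    }
}

// Stand-in for a plain Analyzer: the reusable variant delegates by default.
abstract class SketchAnalyzer {
    abstract List<String> tokenStream(String text);

    List<String> reusableTokenStream(String text) {
        return tokenStream(text);
    }
}

// Stand-in for StandardAnalyzer (assumption): its reusable variant builds
// the chain itself and never calls the overridable tokenStream().
class SketchStandardAnalyzer extends SketchAnalyzer {
    @Override
    List<String> tokenStream(String text) {
        return SketchTokens.split(text);
    }

    @Override
    List<String> reusableTokenStream(String text) {
        return SketchTokens.split(text);
    }
}

// Mirrors the reported analyzer: only tokenStream() is overridden, so a
// caller that uses reusableTokenStream() skips the accent folding.
class BrokenAccentAnalyzer extends SketchStandardAnalyzer {
    @Override
    List<String> tokenStream(String text) {
        return SketchTokens.foldAccents(super.tokenStream(text));
    }
}

// The suggested fix: extend the plain base class and build the whole chain
// in tokenStream(); the inherited reusable variant then picks it up.
class FixedAccentAnalyzer extends SketchAnalyzer {
    @Override
    List<String> tokenStream(String text) {
        return SketchTokens.foldAccents(SketchTokens.split(text));
    }
}
```

In this model, a caller that switches from tokenStream() to 
reusableTokenStream() gets unfolded tokens from BrokenAccentAnalyzer but 
folded tokens from FixedAccentAnalyzer, which matches the behaviour change 
seen between Jackrabbit 2.0 and 2.1.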

So the problem looks like a bug in all the analyzers I was using, but in a part 
that was never exercised by Jackrabbit before the change in version 2.1. The 
issue can be closed.


> Index analyzers that extend StandardAnalyzer need to implement 
> reusableTokenStream() since Jackrabbit 2.1
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: JCR-2622
>                 URL: https://issues.apache.org/jira/browse/JCR-2622
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>    Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0
>            Reporter: fabrizio giustina
>            Priority: Critical
>         Attachments: JCR-2622-tests_and_patch.diff
>
>
> I just tried migrating an existing project which was using jackrabbit 2.0.0 
> to 2.1.0.
> We have an index analyzer configured which filters accented chars: 
> {code}
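> import java.io.Reader;
>
> import org.apache.lucene.analysis.ISOLatin1AccentFilter;
> import org.apache.lucene.analysis.LowerCaseFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>
> public class ItalianSnowballAnalyzer extends StandardAnalyzer
> {
>     @Override
>     public TokenStream tokenStream(String fieldName, Reader reader)
>     {
>         // Lower-case the standard token stream, then strip accents.
>         return new ISOLatin1AccentFilter(
>             new LowerCaseFilter(super.tokenStream(fieldName, reader)));
>     }
> }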
> {code}
> The project has a good number of unit tests: an XML file is loaded into a 
> memory-only Jackrabbit repository and several queries are checked against 
> expected results.
> After migrating to 2.1.0, none of the tests that rely on the index analyzer 
> work anymore. For example, searching for "test" no longer finds nodes 
> containing "tèst".
> Upgrading to Jackrabbit 2.1.0 is the only change made (no changes to the 
> configuration, code, or other libraries at all). Rolling back to the 2.0.0 
> dependency is enough to make all the tests pass again.
> I've checked the changes in 2.1 but couldn't find any apparently related 
> change. Also note that I was already using the patch from JCR-2504 before 
> the upgrade (configuration loading works fine in the unpatched 2.1). Another 
> point is that the configured IndexAnalyzer still actually gets called during 
> our tests (checked in debug mode).
> Any idea?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
