Only analyzers whose tokenStream/reusableTokenStream methods are non-final need this. As soon as Analyzer itself (or both methods) is made final, this code block is no longer needed.
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

From: DM Smith [mailto:[email protected]]
Sent: Wednesday, December 01, 2010 6:46 AM
To: [email protected]
Subject: Thai Analyzer in 3.0.2

I'm curious about some things in the ThaiAnalyzer. It has:

  @Override
  public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
    if (overridesTokenStreamMethod) {
      // LUCENE-1678: force fallback to tokenStream() if we
      // have been subclassed and that subclass overrides
      // tokenStream but not reusableTokenStream
      return tokenStream(fieldName, reader);
    }
    SavedStreams streams = (SavedStreams) getPreviousTokenStream();
    if (streams == null) {
      streams = new SavedStreams();
      streams.source = new StandardTokenizer(matchVersion, reader);
      streams.result = new StandardFilter(streams.source);
      streams.result = new ThaiWordFilter(streams.result);
      streams.result = new StopFilter(
          StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
          streams.result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
      setPreviousTokenStream(streams);
    } else {
      streams.source.reset(reader);
      streams.result.reset(); // reset the ThaiWordFilter's state
    }
    return streams.result;
  }

I'm really curious why reusableTokenStream has the block:

  if (overridesTokenStreamMethod) {
    // LUCENE-1678: force fallback to tokenStream() if we
    // have been subclassed and that subclass overrides
    // tokenStream but not reusableTokenStream
    return tokenStream(fieldName, reader);
  }

but nearly no other Analyzer in contrib has it. (None that I have seen.) Shouldn't it be in all of them?

And also about:

  streams.source.reset(reader);
  streams.result.reset(); // reset the ThaiWordFilter's state

This calls reset on everything from the bottom to the top.
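To see why the LUCENE-1678 fallback matters, here is a minimal toy sketch (not the actual Lucene classes; the names BaseAnalyzer/CustomAnalyzer and the string return values are invented for illustration, and in real Lucene the overridesTokenStreamMethod flag is computed via reflection rather than set by hand). It shows that without the fallback, a subclass that overrides only tokenStream() would have its custom chain silently ignored by callers of reusableTokenStream():

```java
// Toy model of the LUCENE-1678 fallback, not Lucene code.
class BaseAnalyzer {
    // In Lucene this flag is detected via reflection in the
    // Analyzer constructor; here we set it manually.
    protected boolean overridesTokenStreamMethod;

    public String tokenStream(String field) {
        return "base-chain";
    }

    public String reusableTokenStream(String field) {
        if (overridesTokenStreamMethod) {
            // fall back so the subclass's tokenStream() is honored
            return tokenStream(field);
        }
        return "base-chain(reused)";
    }
}

class CustomAnalyzer extends BaseAnalyzer {
    CustomAnalyzer() { overridesTokenStreamMethod = true; }

    @Override
    public String tokenStream(String field) {
        return "custom-chain"; // subclass customizes only tokenStream()
    }
}

public class Lucene1678Demo {
    public static void main(String[] args) {
        BaseAnalyzer a = new CustomAnalyzer();
        // With the fallback, the subclass's chain is used;
        // without it, this would return "base-chain(reused)".
        System.out.println(a.reusableTokenStream("body")); // custom-chain
    }
}
```

The same reasoning applies to any non-final analyzer, which is presumably why the question of putting this block in all contrib analyzers comes up.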
Most other Analyzer implementations just have:

  streams.source.reset(reader);

It seems to me that calling only streams.source.reset(reader) presumes that the chain needs to be reset solely at the tokenizer. The documentation for reset() does not indicate that it should always call super.reset() or input.reset(), which would be necessary for the call to chain back down to the tokenizer. If we go to a declarative model for an analyzer, I would think that one would always want to do both.

-- DM
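The chaining point above can be sketched with a toy filter chain (invented ToyStream/ToyTokenizer/ToyFilter classes, not Lucene's TokenStream hierarchy): if each filter's reset() clears its own state and then delegates to its input, a single reset() at the top of the chain reaches the tokenizer at the bottom, with no need to reset each stage separately.

```java
// Toy model of reset() chaining down a filter chain; not Lucene code.
class ToyStream {
    void reset() { }
}

class ToyTokenizer extends ToyStream {
    int resetCount = 0;
    @Override void reset() { resetCount++; } // bottom of the chain
}

class ToyFilter extends ToyStream {
    final ToyStream input;
    boolean stateCleared = false;
    ToyFilter(ToyStream input) { this.input = input; }
    @Override void reset() {
        stateCleared = true; // clear this filter's own state...
        input.reset();       // ...then chain down to the input
    }
}

public class ResetChainDemo {
    public static void main(String[] args) {
        ToyTokenizer source = new ToyTokenizer();
        ToyFilter result = new ToyFilter(new ToyFilter(source));
        result.reset(); // one call at the top resets the whole chain
        System.out.println(source.resetCount); // 1
    }
}
```

Under this convention, calling streams.result.reset() alone would suffice; the ThaiAnalyzer's pair of calls (source.reset(reader) plus result.reset()) is only required because reset() is not guaranteed to chain.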
