Right, this is a consistency issue. When reusing token streams, we should
call reset(Reader) on the Tokenizer to hand it the new Reader, and also
call reset() on the top of the chain; that reset() is then passed down the
entire chain, provided every filter calls super.reset().
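For example, a stateful filter only has to override reset() like this (a
hypothetical StatefulFilter, just to show the pattern, not actual Lucene
code):

    import java.io.IOException;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    /** Hypothetical stateful filter, illustrating the reset() pattern. */
    public final class StatefulFilter extends TokenFilter {
      private int tokensSeen; // per-stream state that must not leak across reuses

      public StatefulFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (!input.incrementToken())
          return false;
        tokensSeen++; // stand-in for whatever state a real filter keeps
        return true;
      }

      @Override
      public void reset() throws IOException {
        super.reset();  // TokenFilter.reset() forwards to input.reset(), so the call walks the chain
        tokensSeen = 0; // clear this filter's own state
      }
    }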
The reason you don't see it happening in StandardAnalyzer is that none of
its filters keep any state that needs to be reset. On the other hand, look
at ThaiAnalyzer in contrib: it has both resets, because its ThaiWordFilter
keeps state. Maybe for consistency it would be best to do both resets all
the time, to set a good example.

On Sun, Nov 15, 2009 at 11:55 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> It should be there... but it is unimplemented in the TokenFilters used by
> Standard/StopAnalyzer. But for consistency it should be there. I'll talk
> with Robert Muir about it.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> From: Eran Sevi [mailto:erans...@gmail.com]
> Sent: Sunday, November 15, 2009 5:51 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Bug in StandardAnalyzer + StopAnalyzer?
>
> Good point. I missed that part :) Since only the tokenizer uses the
> reader, we must call it directly.
>
> So was the reset() on the filteredTokenStream omitted on purpose because
> there's no underlying implementation, or is it really missing?
>
> On Sun, Nov 15, 2009 at 6:30 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
> It must call both: reset() on the top-level TokenStream and reset(Reader)
> on the Tokenizer. If the latter is not done, how would the TokenStream
> get its new Reader?
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> From: Eran Sevi [mailto:erans...@gmail.com]
> Sent: Sunday, November 15, 2009 5:19 PM
> To: java-dev@lucene.apache.org
> Subject: Bug in StandardAnalyzer + StopAnalyzer?
>
> Hi,
> When changing my code to support the not-so-new reusableTokenStream, I
> noticed that in the cases where a SavedStreams class is used in an
> analyzer (Standard, Stop, and maybe others as well), the reset() method
> is called on the tokenizer instead of on the filter.
>
> The filter implementation of reset() calls the inner filters' and input's
> reset() methods, but the tokenizer's reset() method can't do that.
> I think this bug hasn't caused any errors so far, since none of the
> filters used in the analyzers overrides the reset() method, but it might
> cause problems if the implementation changes in the future.
>
> The fix is very simple.
> For example (in StandardAnalyzer):
>
>     if (streams == null) {
>       streams = new SavedStreams();
>       setPreviousTokenStream(streams);
>       streams.tokenStream = new StandardTokenizer(matchVersion, reader);
>       streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
>       streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
>       streams.filteredTokenStream = new StopFilter(
>           StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>           streams.filteredTokenStream, stopSet);
>     } else {
>       streams.tokenStream.reset(reader);
>     }
>
> should become:
>
>     if (streams == null) {
>       streams = new SavedStreams();
>       setPreviousTokenStream(streams);
>       streams.tokenStream = new StandardTokenizer(matchVersion, reader);
>       streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
>       streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
>       streams.filteredTokenStream = new StopFilter(
>           StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>           streams.filteredTokenStream, stopSet);
>     } else {
>       streams.filteredTokenStream.reset(); // changed line
>     }
>
> What do you think?
>
> Eran.

--
Robert Muir
rcm...@gmail.com
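Following the "do both resets" suggestion at the top of the thread, the
else branch would look something like this (a sketch reusing the names
from the snippet above, not committed code):

    } else {
      streams.tokenStream.reset(reader);   // hand the Tokenizer its new Reader
      streams.filteredTokenStream.reset(); // walk the chain so stateful filters can clear their state
    }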