Uwe, I will throw another twist at it: I do not like the name of Tokenizer.reset(Reader) method. I wish it was called .setReader() or something else, I think it is confusing for Tokenizer to have .reset() and .reset(Reader), when the latter should hardly ever be overridden.
Great example of how insane this is right now, is StandardTokenizer: @Override public void reset(Reader reader) throws IOException { super.reset(reader); reset(); } LOL! On Sun, Nov 15, 2009 at 12:45 PM, Uwe Schindler <u...@thetaphi.de> wrote: > Yes, but on the other hand it does not hurt to automaticall reset in the > analyzer.... **krr** I do not know how to proceed. I think we should keep > it as it was since the beginning of Lucene (call to reset inside analyzer, > QP) and document it correctly. > > > > You are right, at the beginning, BaseTokenStreamTestCase and many other > hardcoded tests did not call reset. Now, the test also calls end() and > close(). > > > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > ------------------------------ > > *From:* Robert Muir [mailto:rcm...@gmail.com] > *Sent:* Sunday, November 15, 2009 6:40 PM > > *To:* java-dev@lucene.apache.org > *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer? > > > > ok, at one point i do not think BaseTokenStreamTestCase did. > > if this is the case, then its the consumer's responsibility to call reset, > and we should remove extra resets() inside reusableTokenStream() from > analyzers that have it... and probably improve the docs of this contract. > > On Sun, Nov 15, 2009 at 12:17 PM, Uwe Schindler <u...@thetaphi.de> wrote: > > Even QueryParser calls reset() as first call. Also BaseTokenStreamTestCase > does it. > > > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > ------------------------------ > > *From:* Robert Muir [mailto:rcm...@gmail.com] > *Sent:* Sunday, November 15, 2009 6:14 PM > > > *To:* java-dev@lucene.apache.org > *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer? > > > > Uwe, not so sure it doesn't need to be there, what about other consumers > such as QueryParser? > > On Sun, Nov 15, 2009 at 12:02 PM, Uwe Schindler <u...@thetaphi.de> wrote: > > I checked again, reset() on the top filter does not need to be there, as > the indexer calls it automatically as the first call after > reusableTokenStream. For reusing only reset(Reader) must be called. It’s a > little bit strange that both methods have the same name, the reset(Reader) > one has a completely different meaning. > > > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > ------------------------------ > > *From:* Uwe Schindler [mailto:u...@thetaphi.de] > *Sent:* Sunday, November 15, 2009 5:56 PM > > > *To:* java-dev@lucene.apache.org > > *Subject:* RE: Bug in StandardAnalyzer + StopAnalyzer? > > > > It should be there... But ist unimplemented in the TokenFilters used by > Standard/Stop Analyzer. Buf for consistency it should be there. I’ll talk > with Robert Muir about it. > > > > Uwe > > > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > ------------------------------ > > *From:* Eran Sevi [mailto:erans...@gmail.com] > *Sent:* Sunday, November 15, 2009 5:51 PM > *To:* java-dev@lucene.apache.org > *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer? > > > > Good point. I missed that part :) since only the tokenizer uses the reader, > we must call it directly. > > So the reset() on the filteredTokenStream was omitted on purpose because > there's not underlying implementation? or is it really missing? > > On Sun, Nov 15, 2009 at 6:30 PM, Uwe Schindler <u...@thetaphi.de> wrote: > > It must call both reset on the top-level TokenStream and reset(Reader) on > the Tokenizer-. If the latter is not done, how should the TokenStream get > his new Reader? > > > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > ------------------------------ > > *From:* Eran Sevi [mailto:erans...@gmail.com] > *Sent:* Sunday, November 15, 2009 5:19 PM > *To:* java-dev@lucene.apache.org > *Subject:* Bug in StandardAnalyzer + StopAnalyzer? > > > > Hi, > when changing my code to support the not-so-new reusableTokenStream I > noticed that in the cases when a SavedStream class was used in an analyzer > (Standard,Stop and maybe others as well) the reset() method is called on the > tokenizer instead of on the filter. > > The filter implementation of reset() calls the inner filters+input reset() > methods, but the tokenizer reset() method can't do that. > I think this bug hasn't caused any errors so far since none of the filters > used in the analyzers overrides the reset() method, but it might cause > problems if the implementation changes in the future. > > the fix is very simple. for example (in StandardAnalyzer): > > if (streams == null) { > streams = new SavedStreams(); > setPreviousTokenStream(streams); > streams.tokenStream = new StandardTokenizer(matchVersion, reader); > streams.filteredTokenStream = new > StandardFilter(streams.tokenStream); > streams.filteredTokenStream = new > LowerCaseFilter(streams.filteredTokenStream); > streams.filteredTokenStream = new > StopFilter(StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion), > > streams.filteredTokenStream, stopSet); > } else { > streams.tokenStream.reset(reader); > } > > should become: > > if (streams == null) { > streams = new SavedStreams(); > setPreviousTokenStream(streams); > streams.tokenStream = new StandardTokenizer(matchVersion, reader); > streams.filteredTokenStream = new > StandardFilter(streams.tokenStream); > streams.filteredTokenStream = new > LowerCaseFilter(streams.filteredTokenStream); > streams.filteredTokenStream = new > StopFilter(StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion), > > streams.filteredTokenStream, stopSet); > } else { > streams.filteredTokenStream.reset(); // changed line. > } > > > What do you think? > > Eran. > > > > > > > -- > Robert Muir > rcm...@gmail.com > > > > > -- > Robert Muir > rcm...@gmail.com > -- Robert Muir rcm...@gmail.com