I think it's always better to do what you can inside the code and not leave it to the clients that call the method (that just duplicates code and is a potential source of error if it's forgotten). So if it's not complicated, I think the call to reset() on the chain of filters should be the responsibility of each analyzer and not of the indexer/QP and others.
Changing the method name is also a good idea in my opinion.

On Sun, Nov 15, 2009 at 8:33 PM, Robert Muir <rcm...@gmail.com> wrote:

> Uwe, I will throw another twist at it: I do not like the name of the
> Tokenizer.reset(Reader) method.
> I wish it was called .setReader() or something else. I think it is
> confusing for Tokenizer to have .reset() and .reset(Reader), when the latter
> should hardly ever be overridden.
>
> A great example of how insane this is right now is StandardTokenizer:
>
>   @Override
>   public void reset(Reader reader) throws IOException {
>     super.reset(reader);
>     reset();
>   }
>
> LOL!
>
> On Sun, Nov 15, 2009 at 12:45 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Yes, but on the other hand it does not hurt to automatically reset in the
>> analyzer.... **krr** I do not know how to proceed. I think we should keep
>> it as it was since the beginning of Lucene (call to reset inside the
>> analyzer and QP) and document it correctly.
>>
>> You are right, at the beginning, BaseTokenStreamTestCase and many other
>> hardcoded tests did not call reset. Now, the test also calls end() and
>> close().
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>> ------------------------------
>>
>> From: Robert Muir [mailto:rcm...@gmail.com]
>> Sent: Sunday, November 15, 2009 6:40 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Bug in StandardAnalyzer + StopAnalyzer?
>>
>> OK, at one point I do not think BaseTokenStreamTestCase did.
>>
>> If this is the case, then it's the consumer's responsibility to call
>> reset(), and we should remove the extra reset() calls inside
>> reusableTokenStream() from the analyzers that have them... and probably
>> improve the docs of this contract.
>>
>> On Sun, Nov 15, 2009 at 12:17 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>
>> Even QueryParser calls reset() as its first call. Also
>> BaseTokenStreamTestCase does it.
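To make Robert's naming complaint above concrete, here is a standalone sketch (not actual Lucene code; `setReader` is the hypothetical rename he suggests, and the class and field names are invented for illustration) of how the "install a new input" operation and the per-stream state reset could get distinct names instead of an overloaded reset():

```java
import java.io.Reader;
import java.io.StringReader;

// Standalone sketch of the suggested rename (hypothetical API, not Lucene):
// setReader(Reader) installs a new input for reuse, while reset() only
// clears per-stream state -- no confusing reset()/reset(Reader) overload.
class SketchTokenizer {
    private Reader input;
    private int position;

    // Replaces the old reset(Reader): give the tokenizer a new input.
    void setReader(Reader reader) {
        this.input = reader;
        reset(); // resetting state on a new input is now an explicit call
    }

    // Resets per-stream state only.
    void reset() {
        this.position = 0;
    }

    int getPosition() { return position; }
    Reader getReader() { return input; }
}

public class RenameSketch {
    public static void main(String[] args) {
        SketchTokenizer t = new SketchTokenizer();
        t.setReader(new StringReader("hello world"));
        System.out.println(t.getPosition()); // 0
    }
}
```

Under this split, subclasses would override reset() for their own state and almost never touch setReader(), which addresses the "should hardly ever be overridden" point.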
>>
>> From: Robert Muir [mailto:rcm...@gmail.com]
>> Sent: Sunday, November 15, 2009 6:14 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Bug in StandardAnalyzer + StopAnalyzer?
>>
>> Uwe, I'm not so sure it doesn't need to be there; what about other
>> consumers such as QueryParser?
>>
>> On Sun, Nov 15, 2009 at 12:02 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>
>> I checked again: reset() on the top filter does not need to be there, as
>> the indexer calls it automatically as the first call after
>> reusableTokenStream(). For reusing, only reset(Reader) must be called. It's
>> a little bit strange that both methods have the same name; the
>> reset(Reader) one has a completely different meaning.
>>
>> From: Uwe Schindler [mailto:u...@thetaphi.de]
>> Sent: Sunday, November 15, 2009 5:56 PM
>> To: java-dev@lucene.apache.org
>> Subject: RE: Bug in StandardAnalyzer + StopAnalyzer?
>>
>> It should be there... but it's unimplemented in the TokenFilters used by
>> StandardAnalyzer/StopAnalyzer. But for consistency it should be there. I'll
>> talk with Robert Muir about it.
>>
>> Uwe
>>
>> From: Eran Sevi [mailto:erans...@gmail.com]
>> Sent: Sunday, November 15, 2009 5:51 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Bug in StandardAnalyzer + StopAnalyzer?
>>
>> Good point. I missed that part :) Since only the tokenizer uses the
>> reader, we must call it directly.
>>
>> So was the reset() on the filteredTokenStream omitted on purpose because
>> there's no underlying implementation, or is it really missing?
>>
>> On Sun, Nov 15, 2009 at 6:30 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>>
>> It must call both reset() on the top-level TokenStream and reset(Reader)
>> on the Tokenizer. If the latter is not done, how should the TokenStream get
>> its new Reader?
>>
>> From: Eran Sevi [mailto:erans...@gmail.com]
>> Sent: Sunday, November 15, 2009 5:19 PM
>> To: java-dev@lucene.apache.org
>> Subject: Bug in StandardAnalyzer + StopAnalyzer?
>>
>> Hi,
>> While changing my code to support the not-so-new reusableTokenStream() I
>> noticed that in the cases where a SavedStreams class is used in an analyzer
>> (Standard, Stop, and maybe others as well), the reset() method is called on
>> the tokenizer instead of on the filter.
>>
>> The filter implementation of reset() calls the inner filters' and input's
>> reset() methods, but the tokenizer's reset() method can't do that.
>> I think this bug hasn't caused any errors so far, since none of the
>> filters used in these analyzers overrides the reset() method, but it might
>> cause problems if the implementation changes in the future.
>>
>> The fix is very simple.
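The cascading behavior Eran describes can be shown with a small standalone model (simplified classes invented for illustration, not actual Lucene code): a filter's reset() delegates to its input, so resetting the outermost filter reaches every stage down to the tokenizer, while resetting the tokenizer alone leaves any per-filter state untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified standalone model of the reset() chaining discussed above.
abstract class Stream {
    static final List<String> resetLog = new ArrayList<>();
    abstract void reset();
}

// Stands in for a Tokenizer: the bottom of the chain, nothing to delegate to.
class SimpleTokenizer extends Stream {
    @Override
    void reset() { resetLog.add("tokenizer"); }
}

// Stands in for a TokenFilter: clears its own state, then propagates down.
class SimpleFilter extends Stream {
    private final Stream input;
    private final String name;
    SimpleFilter(String name, Stream input) { this.name = name; this.input = input; }
    @Override
    void reset() {
        resetLog.add(name); // this filter's own state would be cleared here
        input.reset();      // then the call cascades to the wrapped stream
    }
}

public class ResetChain {
    public static void main(String[] args) {
        SimpleTokenizer tok = new SimpleTokenizer();
        Stream top = new SimpleFilter("stop", new SimpleFilter("lowercase", tok));

        top.reset(); // cascades through the whole chain
        System.out.println(Stream.resetLog); // [stop, lowercase, tokenizer]

        Stream.resetLog.clear();
        tok.reset(); // resets only the tokenizer; filters are skipped
        System.out.println(Stream.resetLog); // [tokenizer]
    }
}
```

This is why calling reset() on `streams.tokenStream` instead of `streams.filteredTokenStream` is harmless only as long as no filter in the chain actually keeps resettable state.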
>> For example (in StandardAnalyzer):
>>
>>   if (streams == null) {
>>     streams = new SavedStreams();
>>     setPreviousTokenStream(streams);
>>     streams.tokenStream = new StandardTokenizer(matchVersion, reader);
>>     streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
>>     streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
>>     streams.filteredTokenStream = new StopFilter(
>>         StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>>         streams.filteredTokenStream, stopSet);
>>   } else {
>>     streams.tokenStream.reset(reader);
>>   }
>>
>> should become:
>>
>>   if (streams == null) {
>>     streams = new SavedStreams();
>>     setPreviousTokenStream(streams);
>>     streams.tokenStream = new StandardTokenizer(matchVersion, reader);
>>     streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
>>     streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
>>     streams.filteredTokenStream = new StopFilter(
>>         StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>>         streams.filteredTokenStream, stopSet);
>>   } else {
>>     streams.filteredTokenStream.reset(); // changed line.
>>   }
>>
>> What do you think?
>>
>> Eran.
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
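For reference, the consumer-side contract the thread keeps returning to (the consumer calls reset() before the first token, then iterates, then end() and close()) can be sketched with a standalone simplified model (invented classes for illustration, not the actual Lucene API):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified standalone model of the consumer contract discussed in the
// thread: reset() before the first token, then iterate, then end()/close().
class MiniTokenStream {
    private final List<String> tokens;
    private Iterator<String> it;
    private String current;

    MiniTokenStream(String... tokens) { this.tokens = Arrays.asList(tokens); }

    // The consumer's first call; forgetting it is exactly the bug class
    // the thread worries about.
    void reset() { it = tokens.iterator(); }

    boolean incrementToken() {
        if (it == null) throw new IllegalStateException("reset() not called");
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
    }

    String current() { return current; }
    void end() { /* final offset bookkeeping would go here */ }
    void close() { it = null; }
}

public class ConsumerContract {
    public static void main(String[] args) {
        MiniTokenStream ts = new MiniTokenStream("quick", "brown", "fox");
        ts.reset();                       // consumer's responsibility
        while (ts.incrementToken()) {
            System.out.println(ts.current());
        }
        ts.end();
        ts.close();
    }
}
```

Whether reset() is the consumer's job (indexer, QueryParser, tests) or the analyzer's is precisely the contract question the thread leaves open; documenting one choice and sticking to it avoids both double resets and forgotten ones.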