Uwe, I will throw another twist at it: I do not like the name of
Tokenizer.reset(Reader) method.
I wish it was called .setReader() or something else, I think it is confusing
for Tokenizer to have .reset() and .reset(Reader), when the latter should
hardly ever be overridden.
Great example of how insane this is right now, is StandardTokenizer:
@Override
public void reset(Reader reader) throws IOException {
super.reset(reader);
reset();
}
LOL!
On Sun, Nov 15, 2009 at 12:45 PM, Uwe Schindler <[email protected]> wrote:
> Yes, but on the other hand it does not hurt to automaticall reset in the
> analyzer.... **krr** I do not know how to proceed. I think we should keep
> it as it was since the beginning of Lucene (call to reset inside analyzer,
> QP) and document it correctly.
>
>
>
> You are right, at the beginning, BaseTokenStreamTestCase and many other
> hardcoded tests did not call reset. Now, the test also calls end() and
> close().
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
> ------------------------------
>
> *From:* Robert Muir [mailto:[email protected]]
> *Sent:* Sunday, November 15, 2009 6:40 PM
>
> *To:* [email protected]
> *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> ok, at one point i do not think BaseTokenStreamTestCase did.
>
> if this is the case, then its the consumer's responsibility to call reset,
> and we should remove extra resets() inside reusableTokenStream() from
> analyzers that have it... and probably improve the docs of this contract.
>
> On Sun, Nov 15, 2009 at 12:17 PM, Uwe Schindler <[email protected]> wrote:
>
> Even QueryParser calls reset() as first call. Also BaseTokenStreamTestCase
> does it.
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
> ------------------------------
>
> *From:* Robert Muir [mailto:[email protected]]
> *Sent:* Sunday, November 15, 2009 6:14 PM
>
>
> *To:* [email protected]
> *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> Uwe, not so sure it doesn't need to be there, what about other consumers
> such as QueryParser?
>
> On Sun, Nov 15, 2009 at 12:02 PM, Uwe Schindler <[email protected]> wrote:
>
> I checked again, reset() on the top filter does not need to be there, as
> the indexer calls it automatically as the first call after
> reusableTokenStream. For reusing only reset(Reader) must be called. It’s a
> little bit strange that both methods have the same name, the reset(Reader)
> one has a completely different meaning.
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
> ------------------------------
>
> *From:* Uwe Schindler [mailto:[email protected]]
> *Sent:* Sunday, November 15, 2009 5:56 PM
>
>
> *To:* [email protected]
>
> *Subject:* RE: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> It should be there... But ist unimplemented in the TokenFilters used by
> Standard/Stop Analyzer. Buf for consistency it should be there. I’ll talk
> with Robert Muir about it.
>
>
>
> Uwe
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
> ------------------------------
>
> *From:* Eran Sevi [mailto:[email protected]]
> *Sent:* Sunday, November 15, 2009 5:51 PM
> *To:* [email protected]
> *Subject:* Re: Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> Good point. I missed that part :) since only the tokenizer uses the reader,
> we must call it directly.
>
> So the reset() on the filteredTokenStream was omitted on purpose because
> there's not underlying implementation? or is it really missing?
>
> On Sun, Nov 15, 2009 at 6:30 PM, Uwe Schindler <[email protected]> wrote:
>
> It must call both reset on the top-level TokenStream and reset(Reader) on
> the Tokenizer-. If the latter is not done, how should the TokenStream get
> his new Reader?
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
> ------------------------------
>
> *From:* Eran Sevi [mailto:[email protected]]
> *Sent:* Sunday, November 15, 2009 5:19 PM
> *To:* [email protected]
> *Subject:* Bug in StandardAnalyzer + StopAnalyzer?
>
>
>
> Hi,
> when changing my code to support the not-so-new reusableTokenStream I
> noticed that in the cases when a SavedStream class was used in an analyzer
> (Standard,Stop and maybe others as well) the reset() method is called on the
> tokenizer instead of on the filter.
>
> The filter implementation of reset() calls the inner filters+input reset()
> methods, but the tokenizer reset() method can't do that.
> I think this bug hasn't caused any errors so far since none of the filters
> used in the analyzers overrides the reset() method, but it might cause
> problems if the implementation changes in the future.
>
> the fix is very simple. for example (in StandardAnalyzer):
>
> if (streams == null) {
> streams = new SavedStreams();
> setPreviousTokenStream(streams);
> streams.tokenStream = new StandardTokenizer(matchVersion, reader);
> streams.filteredTokenStream = new
> StandardFilter(streams.tokenStream);
> streams.filteredTokenStream = new
> LowerCaseFilter(streams.filteredTokenStream);
> streams.filteredTokenStream = new
> StopFilter(StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>
> streams.filteredTokenStream, stopSet);
> } else {
> streams.tokenStream.reset(reader);
> }
>
> should become:
>
> if (streams == null) {
> streams = new SavedStreams();
> setPreviousTokenStream(streams);
> streams.tokenStream = new StandardTokenizer(matchVersion, reader);
> streams.filteredTokenStream = new
> StandardFilter(streams.tokenStream);
> streams.filteredTokenStream = new
> LowerCaseFilter(streams.filteredTokenStream);
> streams.filteredTokenStream = new
> StopFilter(StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
>
> streams.filteredTokenStream, stopSet);
> } else {
> streams.filteredTokenStream.reset(); // changed line.
> }
>
>
> What do you think?
>
> Eran.
>
>
>
>
>
>
> --
> Robert Muir
> [email protected]
>
>
>
>
> --
> Robert Muir
> [email protected]
>
--
Robert Muir
[email protected]