On Wed, Aug 29, 2012 at 3:45 PM, Benson Margulies <ben...@basistech.com> wrote:
> On Wed, Aug 29, 2012 at 3:37 PM, Robert Muir <rcm...@gmail.com> wrote:
>
>> OK, let's help improve it: I think these have likely always been confusing.
>>
>> Before, they were both reset: reset() and reset(Reader), even though
>> they are unrelated. I thought the rename would help this :)
>>
>> Does the TokenStream workflow here help?
>>
>> http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/analysis/TokenStream.html
>>
>> Basically reset() is a mandatory thing the consumer must call. It just
>> means 'reset any mutable state so you can be reused for processing
>> again'.
>>
>
> I really did read this. setReader I get; I don't understand what reset
> accomplishes. What does it mean to reuse a TokenStream without calling
> setReader to supply a new input?

TokenStream is more generic: it doesn't have to take a Reader. It can
take anything you want, e.g. a String, or a byte array of your Word
document, or whatever.

Tokenizer is a subclass that takes a Reader. It's the only thing that
has setReader.

reset() doesn't mean rewind. It just means clearing any accumulated
internal state so the stream is ready for processing again.

So if I made a StringTokenizer class that extends Tokenizer, I would
probably add setString(String s) to it so I could set new String
objects on it. But consumers must always call reset() on the entire
chain (the outer stop filters, synonym filters, all the stuff that
might be keeping state). This reset() call chains down through all
the TokenStreams.

--
lucidworks.com
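
To make the split between "supply new input" and "clear state" concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: ToyTokenStream, ToyStringTokenizer, ToyLimitFilter and ResetDemo are hypothetical names invented for illustration, and the real TokenStream API (incrementToken(), attributes, end(), close()) is deliberately left out. The only assumption carried over from the explanation above is that the source takes its input through its own setter, while reset() clears per-pass state and is forwarded down the chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a token stream. NOT org.apache.lucene.analysis.TokenStream.
abstract class ToyTokenStream {
    /** Returns the next token, or null when the stream is exhausted. */
    public abstract String next();

    /** Clears accumulated state. It does not rewind or replace the input. */
    public void reset() {}
}

// Hypothetical source that takes a String instead of a Reader.
class ToyStringTokenizer extends ToyTokenStream {
    private Iterator<String> words = Collections.emptyIterator();

    /** Supplies new input; the counterpart of setReader on a Tokenizer. */
    public void setString(String s) {
        words = Arrays.asList(s.trim().split("\\s+")).iterator();
    }

    @Override
    public String next() {
        return words.hasNext() ? words.next() : null;
    }
}

// Hypothetical filter with mutable state: passes at most `limit` tokens per pass.
class ToyLimitFilter extends ToyTokenStream {
    private final ToyTokenStream input;
    private final int limit;
    private int emitted = 0; // state that must be cleared between passes

    ToyLimitFilter(ToyTokenStream input, int limit) {
        this.input = input;
        this.limit = limit;
    }

    @Override
    public String next() {
        if (emitted >= limit) {
            return null;
        }
        String token = input.next();
        if (token != null) {
            emitted++;
        }
        return token;
    }

    @Override
    public void reset() {
        emitted = 0;   // clear this filter's own state
        input.reset(); // then chain the call down to the wrapped stream
    }
}

class ResetDemo {
    /** Consumer side: always reset() the outermost stream before reading. */
    static List<String> consume(ToyTokenStream chain) {
        chain.reset();
        List<String> out = new ArrayList<>();
        for (String t = chain.next(); t != null; t = chain.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        ToyStringTokenizer source = new ToyStringTokenizer();
        ToyTokenStream chain = new ToyLimitFilter(source, 2);

        source.setString("a b c");
        System.out.println(consume(chain)); // prints [a, b]

        source.setString("d e f");
        System.out.println(consume(chain)); // prints [d, e]
    }
}
```

In this sketch, setting a new string on the source without calling reset() on the chain would leave the filter's counter at its limit, so the second pass would yield nothing. That is the kind of leftover state the consumer's reset() call is meant to clear.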