I have seen your mail, but this bug should not be related to the new Token API; it should occur with the old API, too.
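Just to make it clear why this is independent of the API: the physical
termBuffer is routinely larger than the term itself (the buffer is reused and
only ever grows), so any length calculation has to use termLength(), never the
array length of termBuffer(). A minimal, purely hypothetical sketch (made-up
class name, not the actual Solr code) of the kind of length mistake Robert
describes, written against the old next(Token) API:

import java.io.IOException;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

// Illustration only -- not the real Solr CapitalizationFilter.
public final class KeepWordCheckFilter extends TokenFilter {
  private final CharArraySet keep;

  public KeepWordCheckFilter(TokenStream input, CharArraySet keep) {
    super(input);
    this.keep = keep;
  }

  @Override
  public Token next(final Token reusableToken) throws IOException {
    final Token token = input.next(reusableToken);
    if (token == null) {
      return null;
    }
    final char[] buffer = token.termBuffer();
    // Buggy: buffer.length is the buffer's capacity, not the term's length,
    // so the lookup fails as soon as the reused buffer is oversized:
    //   boolean keepIt = keep.contains(buffer, 0, buffer.length);
    // Correct: use the token's logical length.
    final boolean keepIt = keep.contains(buffer, 0, token.termLength());
    if (!keepIt) {
      // ... change the buffer contents in place, e.g. the capitalization ...
    }
    return token;
  }
}

The same mistake fails in exactly the same way with the new API, because
termBuffer()/termLength() on TermAttribute behave just like they do on Token.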
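And as I wrote in my earlier mail quoted below, such a filter is just as easy
with incrementToken() and no cloning at all. Again only a rough sketch with a
made-up class name and a simplified capitalization step, not the real
CapitalizationFilterFactory logic:

import java.io.IOException;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// Illustration only: a keep-word-aware capitalization filter on the new API.
public final class SimpleCapitalizationFilter extends TokenFilter {
  private final CharArraySet keep;
  private final TermAttribute termAtt;

  public SimpleCapitalizationFilter(TokenStream input, CharArraySet keep) {
    super(input);
    this.keep = keep;
    this.termAtt = addAttribute(TermAttribute.class);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    final char[] buffer = termAtt.termBuffer();
    final int length = termAtt.termLength();
    // Keep words pass through unchanged; note termLength(), never buffer.length.
    if (length == 0 || keep.contains(buffer, 0, length)) {
      return true;
    }
    // Modify the term in place -- no Token cloning, no new char[] allocation.
    buffer[0] = Character.toUpperCase(buffer[0]);
    return true;
  }
}

With the keep check based on termLength(), it does not matter whether the
upstream stream reuses, clones, or caches its Tokens.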
I did not look very closely into the implementations; I only checked who
changes what, and in which way. As far as I can see, there is only one Token
instance whose termBuffer is changed. That is no problem at all for the new
API. It would even work with forceful cloning of Tokens inside
CachingTokenFilter.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Thursday, August 06, 2009 5:22 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Issue with Solr TokenFilter and the new TokenStream API
>
> Uwe, look at the patch I pasted in haste (I have a delivery guy here,
> sorry).
>
> The filter had a bug all along: it was using termBuffer.length for
> some length calculations.
>
> On Thu, Aug 6, 2009 at 11:17 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> > I looked into the code of this filter. It is very simple and should work
> > out of the box. There is no cloning done. When the indexer calls
> > incrementToken(), the delegation to next(Token) does not clone at all. It
> > just uses the encapsulated Token instance (inside the AttributeImpl
> > TokenWrapper) as the reusable token, calls next(reusable), and then
> > replaces the encapsulated instance with the return value of next() -- so
> > there is no cloning. As you do not change the Token instance at all and
> > return the reusable token, everything happens on one Token/Attribute
> > instance.
> >
> > In my opinion, this is the simplest TokenFilter that could occur; it just
> > changes the contents of the buffer. By the way, this one could easily be
> > rewritten to use incrementToken() without cloning -- just use
> > termAtt.setTermBuffer() and so on.
> >
> > Where do you see a problem? Does it simply not work, or do you think
> > there could be an issue?
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >> -----Original Message-----
> >> From: Mark Miller [mailto:markrmil...@gmail.com]
> >> Sent: Thursday, August 06, 2009 4:14 PM
> >> To: java-dev@lucene.apache.org
> >> Subject: Issue with Solr TokenFilter and the new TokenStream API
> >>
> >> I think there is an issue here, but I didn't follow the TokenStream
> >> improvements very closely.
> >>
> >> In Solr, CapitalizationFilterFactory has a CharArraySet that it loads
> >> up with keep words; it then checks (with the old TokenStream API) each
> >> token (char array) to see if it should keep it. I think that because of
> >> the cloning going on in next(), this breaks and you can't match anything
> >> in the keep set. Does that make sense?
> >>
> >> --
> >> - Mark
> >>
> >> http://www.lucidimagination.com
>
> --
> Robert Muir
> rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org