Re: Issue with Solr TokenFilter and the new TokenStream API

Robert Muir Thu, 06 Aug 2009 08:32:40 -0700

the bug does occur with the old api (some of the evaluations have
incorrect length, but they are not keep words).


it just doesn't happen to make any tests fail (i guess
termLength() happens to == termBuffer().length for all the
tested keep words) with the old jar file...
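
A minimal sketch of the length mix-up described above, not the actual Solr filter code. ReusableTerm, isKeepWordBuggy and isKeepWordFixed are hypothetical names made up for illustration, and a plain HashSet<String> stands in for the keep-word set. The only assumption is that a reusable token keeps its characters in a backing array that can be longer than the current term:

```java
import java.util.HashSet;
import java.util.Set;

public class KeepWordLengthDemo {

    /** Hypothetical stand-in for a reusable token: a char buffer plus a count of valid chars. */
    static final class ReusableTerm {
        private char[] buffer = new char[16]; // capacity, usually larger than the term
        private int length;                   // number of valid chars in the buffer

        void set(String text) {
            if (text.length() > buffer.length) {
                buffer = new char[text.length()];
            }
            text.getChars(0, text.length(), buffer, 0);
            length = text.length();
        }

        char[] termBuffer() { return buffer; }
        int termLength() { return length; }
    }

    /** Buggy lookup: uses the array capacity, so padding or stale chars become part of the key. */
    static boolean isKeepWordBuggy(Set<String> keep, ReusableTerm term) {
        char[] buf = term.termBuffer();
        return keep.contains(new String(buf, 0, buf.length));
    }

    /** Fixed lookup: uses only the valid part of the buffer. */
    static boolean isKeepWordFixed(Set<String> keep, ReusableTerm term) {
        return keep.contains(new String(term.termBuffer(), 0, term.termLength()));
    }

    public static void main(String[] args) {
        Set<String> keep = new HashSet<>();
        keep.add("and");
        keep.add("sixteen_chars_xx"); // exactly 16 chars, so it fills the whole buffer

        ReusableTerm term = new ReusableTerm();

        term.set("and");
        System.out.println(isKeepWordBuggy(keep, term)); // false: key is "and" plus 13 padding chars
        System.out.println(isKeepWordFixed(keep, term)); // true

        term.set("sixteen_chars_xx");
        System.out.println(isKeepWordBuggy(keep, term)); // true: term length equals buffer capacity
        System.out.println(isKeepWordFixed(keep, term)); // true
    }
}
```

The last case is the one that can hide the bug in a test suite: when the term happens to fill the array exactly, both lookups agree.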

On Thu, Aug 6, 2009 at 11:27 AM, Uwe Schindler<u...@thetaphi.de> wrote:
> I have seen your mail, but this bug should not be related to the new Token
> API; it should occur with the old API, too.
>
> I did not look very closely into the implementations, I only checked who
> changes what in which way. And I see that there is only one Token instance
> with a termBuffer that is changed. No problem at all for the new API. It
> would even work with forcefully cloning Tokens inside CachingTokenFilter.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>> -----Original Message-----
>> From: Robert Muir [mailto:rcm...@gmail.com]
>> Sent: Thursday, August 06, 2009 5:22 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Issue with Solr TokenFilter and the new TokenStream API
>>
>> uwe look at the patch i pasted in haste (i have a delivery guy here,
>> sorry).
>>
>> the filter had a bug all along (it was using termBuffer.length for
>> some length calculations).
>>
>> On Thu, Aug 6, 2009 at 11:17 AM, Uwe Schindler<u...@thetaphi.de> wrote:
>> > I looked into the code of this Filter. It is very simple and should work
>> > out of the box. There is no cloning done. When the indexer calls
>> > incrementToken, the delegation to next(Token) does not clone at all. It
>> > just uses the encapsulated Token instance (inside the AttributeImpl
>> > TokenWrapper) as reusableToken and calls next(reusable) and then replaces
>> > the encapsulated instance by the return value of next() -- so no cloning.
>> > As you do not change the token instance at all and return the reusable
>> > token it is all done on one Token/Attribute instance.
>> >
>> > In my opinion, this is the simplest TokenFilter that could occur, it just
>> > changes the contents of the buffer. By the way, this one could be easily
>> > rewritten to use incrementToken() without cloning, just use
>> > termAtt.setTermBuffer() and so on.
>> >
>> > Where do you see a problem, does it simply not work or do you think there
>> > could be an issue?
>> >
>> >
>> >> -----Original Message-----
>> >> From: Mark Miller [mailto:markrmil...@gmail.com]
>> >> Sent: Thursday, August 06, 2009 4:14 PM
>> >> To: java-dev@lucene.apache.org
>> >> Subject: Issue with Solr TokenFilter and the new TokenStream API
>> >>
>> >> I think there is an issue here, but I didn't follow the TokenStream
>> >> improvements very closely.
>> >>
>> >> In Solr, CapitalizationFilterFactory has a CharArraySet that it loads
>> >> up with keep words - it then checks (with the old TokenStream API) each
>> >> token (char array) to see if it should keep it. I think because of the
>> >> cloning going on in next, this breaks and you can't match anything in
>> >> the keep set. Does that make sense?
>> >>
>> >> --
>> >> - Mark
>> >>
>> >> http://www.lucidimagination.com
>> >>
>> >>
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>>
>>
>> --
>> Robert Muir
>> rcm...@gmail.com
>>



-- 
Robert Muir
rcm...@gmail.com

