[ 
https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800464#action_12800464
 ] 

Robert Muir commented on LUCENE-2211:
-------------------------------------

Before committing any fix, I want to review and add tests for any TokenStreams 
that do not yet use BaseTokenStreamTestCase, just to be sure that no 
others have this problem.

It may seem trivial, but if this clearing does not take place properly, 
things like the position increment with StopFilter can grow to very large 
values, overflow, and cause IndexWriter to throw an exception: 
http://www.lucidimagination.com/search/document/f649a19901d33c75/illegalargumentexception_when_indexwriter_adddocument
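
To illustrate the growth described above, here is a minimal plain-Java
simulation. It is only a sketch under assumptions: it does not use Lucene, the
class and method names are hypothetical, and the stop-word logic is a
simplified model in which skipped positions are added to the increment of the
next emitted token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java simulation; none of these names are Lucene classes.
final class PositionIncrementDrift {

    /**
     * Returns the position increment seen on each emitted (non-stop) token.
     * clearEachToken models whether the upstream stream resets its
     * attributes (increment back to 1) before producing every token.
     */
    static List<Integer> emittedIncrements(List<String> words,
                                           Set<String> stopWords,
                                           boolean clearEachToken) {
        List<Integer> emitted = new ArrayList<>();
        int posInc = 1;   // shared attribute state, reused across tokens
        int skipped = 0;  // positions swallowed by removed stop words
        for (String word : words) {
            if (clearEachToken) {
                posInc = 1;        // what a proper clear would do
            }
            if (stopWords.contains(word)) {
                skipped += posInc; // remember the gap, drop the token
                continue;
            }
            posInc += skipped;     // fold the gap into the next token
            skipped = 0;
            emitted.add(posInc);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "the", "b", "the", "c", "the", "d");
        Set<String> stops = new HashSet<>(Arrays.asList("the"));
        System.out.println(emittedIncrements(words, stops, true));
        System.out.println(emittedIncrements(words, stops, false));
    }
}
```

In this model the cleared stream yields increments 1, 2, 2, 2, while the
uncleared one yields 1, 2, 4, 8: the stale value is carried into every
skipped stop word, so the increment doubles each time and an int would
overflow after a few dozen of them.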



> Advances BaseTokenStreamTestCase that uses a fake attribute to check, if 
> clearAttributes() was called correctly - found bugs in contrib/analyzers
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2211
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2211
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/analyzers
>    Affects Versions: 2.9, 2.9.1, 3.0
>            Reporter: Uwe Schindler
>             Fix For: 2.9.2, 3.0.1, 3.1
>
>         Attachments: LUCENE-2211.patch, LUCENE-2211.patch
>
>
> Robert had the idea to use a fake attribute inside BaseTokenStreamTestCase 
> that records whether its clear() method was called. If it was not called 
> after incrementToken(), assertTokenStreamContents fails. The patch also uses 
> the attribute in TeeSinkTokenFilter, because a lot of copying, 
> captureState() and restoreState() is used there. With the attribute, you can 
> nicely track whether save/restore and clearAttributes() are correctly 
> implemented. It also verifies that the attributes were cleared *before* a 
> captureState() (as the captured state will also contain the clear call): if 
> you consume tokens in a filter, capture the consumed tokens and insert them, 
> the captured states must also have been cleared beforehand.
> Some tests in contrib/analyzers fail this additional assertion. They are 
> not fixed in the attached patch.
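
The fake-attribute check described in the issue can be sketched in plain Java
as follows. This is an illustration only, not the code from the attached
patch: ClearRecorder, ToyStream and clearedBeforeEveryToken are hypothetical
names, and the toy stream merely stands in for a real TokenStream that may or
may not call clearAttributes().

```java
// Plain-Java sketch; these types are illustrative stand-ins, not Lucene's API.
final class ClearCheckSketch {

    /** Fake attribute: its only job is to remember that clear() ran. */
    static final class ClearRecorder {
        private boolean clearCalled;

        void clear() {
            clearCalled = true;
        }

        /** Reports whether clear() ran since the last check, then resets. */
        boolean getAndResetClearCalled() {
            boolean seen = clearCalled;
            clearCalled = false;
            return seen;
        }
    }

    /** Toy stream that yields a fixed number of tokens. */
    static final class ToyStream {
        private final ClearRecorder recorder;
        private final boolean clearsAttributes;
        private int remaining;

        ToyStream(ClearRecorder recorder, int tokens, boolean clearsAttributes) {
            this.recorder = recorder;
            this.remaining = tokens;
            this.clearsAttributes = clearsAttributes;
        }

        boolean incrementToken() {
            if (remaining == 0) {
                return false;
            }
            remaining--;
            if (clearsAttributes) {
                recorder.clear(); // stands in for clearAttributes()
            }
            return true;
        }
    }

    /** True only if a clear was recorded for every produced token. */
    static boolean clearedBeforeEveryToken(int tokens, boolean clearsAttributes) {
        ClearRecorder recorder = new ClearRecorder();
        ToyStream stream = new ToyStream(recorder, tokens, clearsAttributes);
        while (stream.incrementToken()) {
            if (!recorder.getAndResetClearCalled()) {
                return false; // a token was produced without a clear
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(clearedBeforeEveryToken(3, true));
        System.out.println(clearedBeforeEveryToken(3, false));
    }
}
```

Because the recorder is reset after every check, a stream that clears only on
its first token would still be caught on the second one.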
