[ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759557#action_12759557 ]
Uwe Schindler edited comment on LUCENE-1926 at 9/25/09 8:06 AM:
----------------------------------------------------------------

That's exactly the case. You should also capture the state in "case 1:". The attributes API does not guarantee that the attributes are preserved between calls to incrementToken() (just as the reusable Token API is not required to always use the same reusable token). If you do not reuse tokens, this is exactly what happens: the Token instance in the wrapper is replaced, so the attribute contents get lost (you see an empty Token instance). One could fix this with an extra token cloning, but even with the old API (next(Token)) it would never have worked. Because of this, all Tokenizers *should* call clearAttributes() first to get a fresh start. I am not sure whether it worked correctly before LUCENE-1919.

ADDENDUM: You should never rely on attributes being preserved between calls. If you plug another TokenFilter on top of your filter, that filter could change the tokens. The tokens are currently only preserved 100% if you only use incrementToken() and your filter/Tokenizer is the only one modifying the tokens. You can never guarantee that. This issue is "won't fix", since this is expected behaviour. Ok with that?

was (Author: thetaphi):
That's exactly the case. You should also capture the state in "case 1:". The attributes API does not guarantee that the attributes are preserved between calls to incrementToken() (just as the reusable Token API is not required to always use the same reusable token). If you do not reuse tokens, this is exactly what happens: the Token instance in the wrapper is replaced, so the attribute contents get lost (you see an empty Token instance). One could fix this with an extra token cloning, but even with the old API (next(Token)) it would never have worked. Because of this, all Tokenizers *should* call clearAttributes() first. I am not sure whether it worked correctly before LUCENE-1919.
> Back compat break with old next() consumer API
> ----------------------------------------------
>
>                 Key: LUCENE-1926
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1926
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Robert Muir
>         Attachments: CaptureStateTestcase.java
>
>
> There is a bug that causes TokenStreams to return different results, depending on whether they are consumed with the incrementToken() API or the next() API.
> I found this because the Solr analysis tool in the admin page uses the next() API, and I was seeing strange results.
> I've created a test case to show the problem. When calling captureState(), the current state is erased, but only when consuming with the next() API.
> If I consume with incrementToken(), things work.
> {code}
> State tempState = captureState(); // after we capture state here, things get strange.
> String right = termAtt.term(); // when using old consumer API, this value is wrong!!!!
> {code}
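
To make the advice in the comment concrete, here is a minimal plain-Java sketch of why a filter should keep its own copy of the state rather than assume the shared attribute survives between calls. It deliberately does not use Lucene: TermHolder, ArraySource, RepeatFilter and consume() are hypothetical stand-ins invented for this illustration, and the "clobber" flag only models a consumer (or a filter plugged on top) changing the token between calls. It is not the attached CaptureStateTestcase.java.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptureStateSketch {

    /** Hypothetical stand-in for a shared, mutable term attribute. */
    static final class TermHolder {
        String term = "";
    }

    /** Toy producer: clears the holder, then writes the next term into it. */
    static final class ArraySource {
        private final TermHolder att;
        private final String[] terms;
        private int pos = 0;

        ArraySource(TermHolder att, String... terms) {
            this.att = att;
            this.terms = terms;
        }

        boolean increment() {
            att.term = ""; // fresh start, in the spirit of clearAttributes()
            if (pos == terms.length) {
                return false;
            }
            att.term = terms[pos++];
            return true;
        }
    }

    /** Toy filter that emits every input term twice. */
    static final class RepeatFilter {
        private final ArraySource input;
        private final TermHolder att;
        private final boolean saveState;
        private String saved;                  // private copy: the "captured state"
        private boolean repeatPending = false;

        RepeatFilter(ArraySource input, TermHolder att, boolean saveState) {
            this.input = input;
            this.att = att;
            this.saveState = saveState;
        }

        boolean increment() {
            if (repeatPending) {
                repeatPending = false;
                if (saveState) {
                    att.term = saved;          // restore instead of trusting the holder
                }
                return true;                   // fragile mode assumes the holder is untouched
            }
            if (!input.increment()) {
                return false;
            }
            saved = att.term;                  // capture on the path that will be replayed
            repeatPending = true;
            return true;
        }
    }

    /** Consumes the filter; optionally wipes the holder after reading each token. */
    public static List<String> consume(boolean saveState, boolean clobber, String... terms) {
        TermHolder att = new TermHolder();
        RepeatFilter filter = new RepeatFilter(new ArraySource(att, terms), att, saveState);
        List<String> out = new ArrayList<>();
        while (filter.increment()) {
            out.add(att.term);
            if (clobber) {
                att.term = "";                 // models someone else changing the token
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(false, false, "a", "b")); // [a, a, b, b]
        System.out.println(consume(false, true, "a", "b"));  // [a, , b, ]
        System.out.println(consume(true, true, "a", "b"));   // [a, a, b, b]
    }
}
```

The fragile variant only looks correct as long as nothing else touches the holder, which mirrors the report that results depend on how the stream is consumed. The variant that saves and restores its own copy gives the same output either way.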