[ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619961#action_12619961 ]
DM Smith commented on LUCENE-1350:
----------------------------------

The non-reuse interface is deprecated. LUCENE-1333 deals with cleaning that up and applying reuse throughout Lucene; to date it has been applied only partially to core. This results in sub-optimal performance for filter chains that mix reuse and non-reuse inputs and filters, so LUCENE-1333 updates SnowballFilter to use next(Token).

The documentation in TokenStream states that only producers invoke clear(). To me, it is not clear-cut what a producer or a consumer actually is. Obviously, input streams are producers. Some filters generate multiple tokens as a replacement for the current one (e.g. NGram, stemming, ...); to me, these are producers too. If the rule of thumb is that filters are consumers, merely changing their token's term, then there are lots of places that need to be changed.

I noticed that SnowballFilter's methodology was fairly common:

    Token token = input.next();
    ...
    String newTerm = ....;
    ...
    return new Token(newTerm, token.startOffset(), token.endOffset(), token.type());

In migrating this to the reuse pattern, I saw new Token(...) as a producer pattern, and to maintain equivalent behavior clear() needed to be called:

    public Token next(Token token) {
      token = input.next(token);
      ...
      String newTerm = ....;
      ...
      token.clear();                // do most of the initialization that new Token does
      token.setTermBuffer(newTerm); // new method introduced in LUCENE-1333
      return token;
    }

I don't know why the following pattern was not originally used (some filters do this), or why you didn't migrate to it:

    Token token = input.next();
    ...
    String newTerm = ....;
    ...
    token.setTermText(newTerm);
    return token;

This would be faster than cloning and would preserve all fields.
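To make the difference between the two patterns concrete, here is a minimal, self-contained sketch. It uses a simplified stand-in Token class, not the real Lucene class (the field and method names only loosely mirror the 2.x API), to show why constructing a new Token drops the payload while mutating the term in place preserves it:

    // Simplified stand-in for Lucene's Token; illustrative only.
    class Token {
        private String term;
        private final int startOffset;
        private final int endOffset;
        private final String type;
        private byte[] payload; // lost when a fresh Token is constructed

        Token(String term, int startOffset, int endOffset, String type) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.type = type;
        }

        void setPayload(byte[] payload) { this.payload = payload; }
        byte[] getPayload() { return payload; }
        void setTermText(String term) { this.term = term; }
        String termText() { return term; }
        int startOffset() { return startOffset; }
        int endOffset() { return endOffset; }
        String type() { return type; }
    }

    public class PayloadDemo {
        // The common pattern: the new Token copies offsets and type,
        // but silently drops the payload (the bug this issue reports).
        static Token stemByCopy(Token in, String stemmed) {
            return new Token(stemmed, in.startOffset(), in.endOffset(), in.type());
        }

        // The suggested pattern: mutate the term in place, leaving
        // every other field (including the payload) untouched.
        static Token stemInPlace(Token in, String stemmed) {
            in.setTermText(stemmed);
            return in;
        }

        public static void main(String[] args) {
            Token t = new Token("running", 0, 7, "word");
            t.setPayload(new byte[] { 42 });

            System.out.println(stemByCopy(t, "run").getPayload());  // null: payload reset
            System.out.println(stemInPlace(t, "run").getPayload() != null); // true: preserved
        }
    }

The same reasoning applies to any field the Token constructor does not take, which is why the in-place pattern "would preserve all fields" as the comment notes.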
> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no
> payloads.
> A workaround for this is to apply stemming first and only then run whatever
> logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.

--
This message is automatically generated by JIRA.