[ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620970#action_12620970 ]
Michael McCandless commented on LUCENE-1350:
--------------------------------------------

It seems like there are three different things here:

# Many filters (e.g. SnowballFilter) incorrectly erase the Payload, token Type and token flags, because they are basically doing their own Token cloning. This problem is pre-existing (it predates the re-use API).
# Separately, these filters do not use the re-use API, which we want to migrate to anyway.
# Adding new "reuse" methods on Token, which are like clear() except that they also take args to replace the termBuffer, start/end offset, etc., and they do not reset the payload/flags to their defaults.

Since in LUCENE-1333 we are aggressively moving all Lucene core & contrib TokenStreams & TokenFilters to the re-use API (formally deprecating the original non-reuse API), we may as well fix 1 & 2 at once.

I think the reuse API proposal is reasonable: it mirrors the current constructors on Token. Since we are migrating to the re-use API, you need an analog of each of these constructors that does not make a new Token. But maybe change the name from "reuse" to "update", "set", "reset", "reinit", or "change"?

But: I think this method should still reset payload, position incr, etc., to defaults? I.e. calling this method should get you the same result as creating a new Token(...) passing in the termBuffer, start/end offset, etc., I think?

Should we just absorb this issue into LUCENE-1333? DM, of your list above (of filters that lose payload), are there any that are not fixed in LUCENE-1333? I'm confused on the overlap and it's hard to work with all the patches.
Actually if in LUCENE-1333 you could consolidate down to a single patch (big toplevel "svn diff"), that'd be great :)

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1350
> URL: https://issues.apache.org/jira/browse/LUCENE-1350
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis, contrib/*
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Fix For: 2.3.3
>
> Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.
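
To make the semantics question in the comment concrete, here is a minimal Java sketch of the two behaviors being discussed. It does not use the real Lucene classes: `SketchToken`, its fields, the method name `reinit` (one of the candidate names floated above), the `"word"` default type, and the toy `stemInPlace` step are all assumptions made up for illustration. `reinit` leaves the token in the same state a freshly constructed token would have, while the "consumer" step only touches the term text and so keeps the payload, type and flags.

```java
class TokenReuseSketch {

    /** Hypothetical stand-in for a token; NOT org.apache.lucene.analysis.Token. */
    public static final class SketchToken {
        public static final String DEFAULT_TYPE = "word"; // assumed default for this sketch

        public char[] termBuffer = new char[0];
        public int termLength;
        public int startOffset;
        public int endOffset;
        public int positionIncrement = 1;
        public int flags;
        public String type = DEFAULT_TYPE;
        public byte[] payload; // null means "no payload"

        /** Copies the term text into the existing buffer, growing it only if needed. */
        public void setTerm(char[] src, int off, int len) {
            if (termBuffer.length < len) {
                termBuffer = new char[len];
            }
            System.arraycopy(src, off, termBuffer, 0, len);
            termLength = len;
        }

        public String term() {
            return new String(termBuffer, 0, termLength);
        }

        /**
         * "Same result as a new token" semantics: replaces term and offsets,
         * and resets everything else to its default, without allocating a token.
         */
        public SketchToken reinit(char[] src, int off, int len, int start, int end) {
            setTerm(src, off, len);
            startOffset = start;
            endOffset = end;
            positionIncrement = 1;
            flags = 0;
            type = DEFAULT_TYPE;
            payload = null;
            return this;
        }
    }

    /**
     * Toy "consumer" filter step (not Snowball): strips one trailing 's' in place.
     * Only the term length changes, so payload, type and flags survive.
     */
    public static SketchToken stemInPlace(SketchToken t) {
        if (t.termLength > 1 && t.termBuffer[t.termLength - 1] == 's') {
            t.termLength--;
        }
        return t;
    }

    public static void main(String[] args) {
        SketchToken t = new SketchToken().reinit("cats".toCharArray(), 0, 4, 0, 4);
        t.payload = new byte[] {7};
        t.flags = 3;

        stemInPlace(t); // consumer: payload and flags are left alone
        System.out.println(t.term() + " payloadKept=" + (t.payload != null) + " flags=" + t.flags);

        t.reinit("dogs".toCharArray(), 0, 4, 5, 9); // producer: back to defaults
        System.out.println(t.term() + " payloadKept=" + (t.payload != null) + " flags=" + t.flags);
    }
}
```

Under these assumptions, a token producer would call something like `reinit` when it emits a new token into a reused instance, and a consumer filter would mutate only the term, which is the split the three points above are getting at.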