[ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619961#action_12619961 ]

DM Smith commented on LUCENE-1350:
----------------------------------

The non-reuse interface is deprecated. LUCENE-1333 deals with cleaning that up 
and applying reuse throughout Lucene; to date, it has only been partially 
applied to core. The result is sub-optimal performance for Filter chains that 
mix reuse and non-reuse inputs and filters.

So LUCENE-1333 updates SnowballFilter to use next(Token).

The documentation in TokenStream states that only producers invoke clear().

To me, it is not clear-cut what a producer or a consumer actually is. Obviously, 
input streams are producers. Some filters generate multiple tokens as a 
replacement for the current one (e.g. NGram, stemming, ...). To me, these are 
producers too.

If the rule of thumb is that Filters are consumers, merely changing their 
token's term, then there are lots of places that need to be changed. I noticed 
that SnowballFilter's methodology was fairly common:
Token token = input.next();
...
String newTerm = ....;
...
return new Token(newTerm, token.startOffset(), token.endOffset(), token.type());
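This pattern is exactly where the payload goes missing. A minimal, self-contained sketch of the effect, using a stripped-down stand-in for Lucene's Token class (a hypothetical simplification, not the real org.apache.lucene.analysis.Token): the constructor only receives the term, offsets, and type, so any payload set on the incoming token is silently dropped.

```java
// Stripped-down stand-in for org.apache.lucene.analysis.Token
// (hypothetical simplification; the real class carries more state).
class Token {
    String term;
    int start, end;
    String type;
    Object payload; // stands in for org.apache.lucene.index.Payload

    Token(String term, int start, int end, String type) {
        this.term = term;
        this.start = start;
        this.end = end;
        this.type = type;
    }

    int startOffset() { return start; }
    int endOffset() { return end; }
    String type() { return type; }
}

public class NewTokenDropsPayload {
    public static void main(String[] args) {
        Token in = new Token("running", 0, 7, "word");
        in.payload = "pos=VERB"; // set by an upstream payload-producing filter

        // The common filter pattern: build a fresh Token from selected fields.
        Token out = new Token("run", in.startOffset(), in.endOffset(), in.type());

        // Anything not passed through the constructor is gone.
        System.out.println(out.payload); // null -- the payload was reset
    }
}
```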

In migrating this to the reuse pattern, I saw new Token(...) as a producer 
pattern, and to maintain equivalent behavior clear() needed to be called:
public Token next(Token token)
{
    token = input.next(token);
    ...
    String newTerm = ....;
    ...
    token.clear(); // do most of the initialization that new Token does
    token.setTermBuffer(newTerm); // new method introduced in LUCENE-1333
    return token;
}

I don't know why the following pattern was not originally used (some filters do 
this) or why you didn't migrate to this:
Token token = input.next();
...
String newTerm = ....;
...
token.setTermText(newTerm);
return token;

This would be faster than cloning and would preserve all fields.
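A minimal sketch of the mutate-in-place pattern, again using a stripped-down stand-in for Lucene's Token class (a hypothetical simplification, not the real one): only the term is touched, so the payload and every other field ride through untouched, with no allocation.

```java
// Stripped-down stand-in for org.apache.lucene.analysis.Token
// (hypothetical simplification; the real class carries more state).
class Token {
    String term;
    int start, end;
    String type;
    Object payload; // stands in for org.apache.lucene.index.Payload

    Token(String term, int start, int end, String type) {
        this.term = term;
        this.start = start;
        this.end = end;
        this.type = type;
    }

    void setTermText(String term) { this.term = term; }
}

public class SetTermTextKeepsPayload {
    public static void main(String[] args) {
        Token token = new Token("running", 0, 7, "word");
        token.payload = "pos=VERB"; // set by an upstream filter

        // Mutate in place: only the term changes; no allocation, no field loss.
        token.setTermText("run");

        System.out.println(token.term);    // run
        System.out.println(token.payload); // pos=VERB -- preserved
    }
}
```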



> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no 
> payloads.
> A workaround for this is to apply stemming first and only then run whatever 
> logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

