[ 
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544175
 ] 

Doron Cohen commented on LUCENE-1063:
-------------------------------------

{quote}
So I don't think we need to change anything.
{quote}
(y) sounds good to me. 

(i) looking close at TokenStream it is interesting that next() and next(Token) 
as written will loop forever. So if a subclass just implements say next() by 
calling (super's) next(new Token()) it is an infinite loop. However anything 
like this would be buggy anyhow because no meaningful token is created this 
way. To summarize there's no action item here. (I thought about modifying the 
javadoc NOTE to: ??subclasses must +create the next Token+ by overriding at 
least one of next() or next(Token)??, but I am not convinced it is any clearer.)

> Token re-use API breaks back compatibility in certain TokenStream chains
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1063
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1063
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: LUCENE-1063.patch
>
>
> In scrutinizing the new Token re-use API during this thread:
>   http://www.gossamer-threads.com/lists/lucene/java-dev/54708
> I realized we now have a non-back-compatibility when mixing re-use and
> non-re-use TokenStreams.
> The new "reuse" next(Token) API actually allows two different aspects
> of re-use:
>   1) "Backwards re-use": the subsequent call to next(Token) is allowed
>      to change all aspects of the provided Token, meaning the caller
>      must do all persisting of Token that it needs before calling
>      next(Token) again.
>   2) "Forwards re-use": the caller is allowed to modify the returned
>      Token however it wants.  Eg the LowerCaseFilter is allowed to
>      downcase the characters in-place in the char[] termBuffer.
> The forwards re-use case can break backwards compatibility now.  EG:
> if a TokenStream X providing only the "non-reuse" next() API is
> followed by a TokenFilter Y using the "reuse" next(Token) API to pull
> the tokens, then the default implementation in TokenStream.java for
> next(Token) will kick in.
> That default implementation just returns the provided "private copy"
> Token returned by next().  But, because of 2) above, this is not
> legal: if the TokenFilter Y modifies the char[] termBuffer (say), that
> is actually modifying the cached copy being potentially stored by X.
> I think the opposite case is handled correctly.
> A simple way to fix this is to make a full copy of the Token in the
> next(Token) call in TokenStream, just like we do in the next() method
> in TokenStream.  The downside is this is a small performance hit.  However
> that hit only happens at the boundary between a non-reuse and a re-use
> tokenizer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to