Will do ... Mike
"Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote: > > [ > > https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544005 > ] > > Yonik Seeley commented on LUCENE-1063: > -------------------------------------- > > Could we make this a little more concrete by creating a simple test case > that fails? > > > > Token re-use API breaks back compatibility in certain TokenStream chains > > ------------------------------------------------------------------------ > > > > Key: LUCENE-1063 > > URL: https://issues.apache.org/jira/browse/LUCENE-1063 > > Project: Lucene - Java > > Issue Type: Bug > > Components: Analysis > > Affects Versions: 2.3 > > Reporter: Michael McCandless > > Assignee: Michael McCandless > > Fix For: 2.3 > > > > > > In scrutinizing the new Token re-use API during this thread: > > http://www.gossamer-threads.com/lists/lucene/java-dev/54708 > > I realized we now have a non-back-compatibility when mixing re-use and > > non-re-use TokenStreams. > > The new "reuse" next(Token) API actually allows two different aspects > > of re-use: > > 1) "Backwards re-use": the subsequent call to next(Token) is allowed > > to change all aspects of the provided Token, meaning the caller > > must do all persisting of Token that it needs before calling > > next(Token) again. > > 2) "Forwards re-use": the caller is allowed to modify the returned > > Token however it wants. Eg the LowerCaseFilter is allowed to > > downcase the characters in-place in the char[] termBuffer. > > The forwards re-use case can break backwards compatibility now. EG: > > if a TokenStream X providing only the "non-reuse" next() API is > > followed by a TokenFilter Y using the "reuse" next(Token) API to pull > > the tokens, then the default implementation in TokenStream.java for > > next(Token) will kick in. > > That default implementation just returns the provided "private copy" > > Token returned by next(). But, because of 2) above, this is not > > legal: if the TokenFilter Y modifies the char[] termBuffer (say), that > > is actually modifying the cached copy being potentially stored by X. > > I think the opposite case is handled correctly. > > A simple way to fix this is to make a full copy of the Token in the > > next(Token) call in TokenStream, just like we do in the next() method > > in TokenStream. The downside is this is a small performance hit. However > > that hit only happens at the boundary between a non-reuse and a re-use > > tokenizer. > > -- > This message is automatically generated by JIRA. > - > You can reply to this email to add a comment to the issue online. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]