[jira] Commented: (LUCENE-1794) implement reusableTokenStream for all contrib analyzers

Shai Erera (JIRA) Fri, 14 Aug 2009 05:13:43 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743183#action_12743183 ]


Shai Erera commented on LUCENE-1794:
------------------------------------

We only need getTokenizer because TokenStream.reset() does not accept a Reader. 
If we could introduce such a method on TokenStream, we wouldn't need to refer 
to Tokenizer directly.

bq. do you have any ideas on the back compat issues?

Well, it's a bit trickier ... today we call reusableTokenStream in our indexing 
code and get either a new instance or a reused one. We cannot change 
Analyzer's default behavior, which returns a new instance (unless we're willing 
to break back-compat), because Analyzers that did not override 
reusableTokenStream may break if we start reusing the instance by default (for 
example, if they add two fields to a document w/ reusableTokenStream called 
twice).

Also, deprecating reusableTokenStream, defining a new method (say 
reuseTokenStream), and moving to it is not good either: we would want its 
default impl to reuse the token stream, and impls that did not override it may 
break.

So how about we create a new abstract ReusingAnalyzer which impls 
reusableTokenStream to always reuse the stream? And we add Streams to Analyzer 
as a protected static class. That way, Analyzers that don't care about reuse 
can still extend Analyzer. Analyzers which care about reuse and are fine w/ 
ReusingAnalyzer's impl can move to extend it. And Analyzers that care about 
reuse but want it done differently can choose to extend either 
ReusingAnalyzer or Analyzer.
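
To make the proposal concrete, here is a rough, self-contained sketch. It is 
not a patch: the Sketch* classes are hypothetical stand-ins for Lucene's 
TokenStream, Tokenizer and Analyzer so that it compiles without Lucene, and 
createStreams, the source/result fields and the plain ThreadLocal are 
assumptions made for illustration, not anything agreed on in this issue.

```java
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for Lucene's TokenStream; only what the sketch needs.
abstract class SketchTokenStream {
}

// Hypothetical stand-in for Tokenizer: the one stream that can be given a new Reader.
class SketchTokenizer extends SketchTokenStream {
    protected Reader input;

    SketchTokenizer(Reader input) {
        this.input = input;
    }

    void reset(Reader input) {
        this.input = input;
    }

    Reader input() {
        return input;
    }
}

// Hypothetical stand-in for a TokenFilter wrapping another stream.
class SketchLowerCaseFilter extends SketchTokenStream {
    final SketchTokenStream in;

    SketchLowerCaseFilter(SketchTokenStream in) {
        this.in = in;
    }
}

// Stand-in for Analyzer: the default reusableTokenStream keeps returning a new stream.
abstract class SketchAnalyzer {
    // The proposed holder: the source tokenizer plus the outermost stream of the chain.
    protected static class Streams {
        final SketchTokenizer source;
        final SketchTokenStream result;

        Streams(SketchTokenizer source, SketchTokenStream result) {
            this.source = source;
            this.result = result;
        }
    }

    public abstract SketchTokenStream tokenStream(String fieldName, Reader reader);

    public SketchTokenStream reusableTokenStream(String fieldName, Reader reader) {
        return tokenStream(fieldName, reader);
    }
}

// The proposed base class: reuse is opt-in by extending it.
abstract class SketchReusingAnalyzer extends SketchAnalyzer {
    // One Streams instance per thread (an assumption of this sketch).
    private final ThreadLocal<Streams> previous = new ThreadLocal<Streams>();

    // Subclasses build the chain once; the name createStreams is hypothetical.
    protected abstract Streams createStreams(String fieldName, Reader reader);

    @Override
    public SketchTokenStream tokenStream(String fieldName, Reader reader) {
        return createStreams(fieldName, reader).result;
    }

    @Override
    public SketchTokenStream reusableTokenStream(String fieldName, Reader reader) {
        Streams streams = previous.get();
        if (streams == null) {
            streams = createStreams(fieldName, reader);
            previous.set(streams);
        } else {
            // Only the tokenizer accepts a Reader, hence the separate reference to it.
            streams.source.reset(reader);
        }
        return streams.result;
    }
}

// Example subclass: tokenizer followed by one filter.
class SketchSimpleAnalyzer extends SketchReusingAnalyzer {
    @Override
    protected Streams createStreams(String fieldName, Reader reader) {
        SketchTokenizer source = new SketchTokenizer(reader);
        return new Streams(source, new SketchLowerCaseFilter(source));
    }
}
```

With this shape, two calls to reusableTokenStream on the same thread return 
the same outer stream, with the source tokenizer pointed at the second Reader, 
while tokenStream still builds a fresh chain on every call.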

Back-compat wise, we're safe since:
# Existing Lucene Analyzers that reuse can be changed to extend ReusingAnalyzer.
# Existing Analyzers (outside Lucene code) either override reusableTokenStream 
or don't, and therefore won't break.
# Our indexing code will still call reusableTokenStream, no change here.
# Any code out there which traverses an Analyzer by calling reusableTokenStream 
does not need to change anything.

I think that'd work?

> implement reusableTokenStream for all contrib analyzers
> -------------------------------------------------------
>
>                 Key: LUCENE-1794
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1794
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1794.patch, LUCENE-1794.patch, LUCENE-1794.patch, 
> LUCENE-1794.patch, LUCENE-1794.patch
>
>
> most contrib analyzers do not have an impl for reusableTokenStream
> regardless of how expensive the back compat reflection is for indexing speed, 
> I think we should do this to mitigate any performance costs. hey, overall it 
> might even be an improvement!
> the back compat code for non-final analyzers is already in place so this is 
> easy money in my opinion.
