Uwe Schindler created LUCENE-5803:
-------------------------------------

             Summary: Add another AnalyzerWrapper class that does not have its 
own cache, so delegate-only wrappers don't create thread local resources 
several times
                 Key: LUCENE-5803
                 URL: https://issues.apache.org/jira/browse/LUCENE-5803
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 4.9
            Reporter: Uwe Schindler
            Assignee: Uwe Schindler
             Fix For: 5.0, 4.10


This is a followup issue for the following Elasticsearch issue: 
https://github.com/elasticsearch/elasticsearch/pull/6714

In short, the problem is as follows:
- Elasticsearch has a pool of Analyzers that are used for analysis in several 
indexes
- Each index uses a different PerFieldAnalyzerWrapper

PerFieldAnalyzerWrapper uses PER_FIELD_REUSE_STRATEGY. Because of this, it 
caches the token streams (TokenStreamComponents) for every field; with many 
fields, that adds up to a lot of cached objects. In addition, the underlying 
analyzers may also cache token streams, and the other PerFieldAnalyzerWrappers 
(one per index) do the same, although the delegate Analyzer could always 
return the same components.
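To make the duplication concrete, here is a minimal, self-contained Java model. It does not use Lucene: Components, SharedAnalyzer and PerFieldCachingWrapper are hypothetical stand-ins, and plain fields and a HashMap model the per-thread cache that Lucene keeps in a thread local (single-threaded sketch only).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's TokenStreamComponents.
final class Components { }

// Hypothetical pooled delegate with a global-style cache: it builds its
// components once and returns the same instance for every field.
final class SharedAnalyzer {
    private Components cached;
    int built = 0;

    Components components(String field) {
        if (cached == null) {
            cached = new Components();
            built++;
        }
        return cached;
    }
}

// Hypothetical wrapper modelling PER_FIELD_REUSE_STRATEGY: it keeps its own
// map entry for every field name it has seen, even though each entry is just
// the delegate's already-cached instance.
final class PerFieldCachingWrapper {
    private final SharedAnalyzer delegate;
    final Map<String, Components> cache = new HashMap<>();

    PerFieldCachingWrapper(SharedAnalyzer delegate) { this.delegate = delegate; }

    Components components(String field) {
        return cache.computeIfAbsent(field, delegate::components);
    }
}
```

With one pooled analyzer shared by two indexes of 1000 fields each, the delegate builds its components once, yet the two wrappers together hold 2000 map entries that all point at that single object.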

We should add code similar to Elasticsearch's directly to Lucene: if the 
delegating Analyzer only selects a delegate per field, or only wraps 
CharFilters around the Reader, there is no need to cache the 
TokenStreamComponents a second time in the delegating Analyzer. A second cache 
is only needed if the delegating Analyzer adds TokenFilters of its own (like 
ShingleAnalyzerWrapper).

We should name this new class DelegatingAnalyzerWrapper, extending 
AnalyzerWrapper. Its wrapComponents method must be final, because subclasses 
are not allowed to add TokenFilters; unlike ES, however, we don't need to 
disallow wrapping with CharFilters.
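A rough sketch of that class shape, assuming a simplified model rather than Lucene's real API: WrapperModel, DelegatingWrapperModel, DelegateAnalyzer and LowercasingWrapper are hypothetical names, components are modelled as a list of stage names, and the Reader is modelled as a String. Declaring wrapComponents final turns any attempt to append TokenFilters into a compile-time error, while wrapReader stays open for CharFilter-style changes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical delegate analyzer. Components are modelled as the list of
// stage names; analyze() just splits the (possibly filtered) text on whitespace.
class DelegateAnalyzer {
    List<String> createComponents(String field) { return Arrays.asList("WhitespaceTokenizer"); }
    List<String> analyze(String text) { return Arrays.asList(text.trim().split("\\s+")); }
}

// Simplified model of AnalyzerWrapper's hooks (not Lucene's real signatures).
abstract class WrapperModel {
    protected abstract DelegateAnalyzer getWrappedAnalyzer(String field);
    protected List<String> wrapComponents(String field, List<String> c) { return c; }
    protected String wrapReader(String field, String text) { return text; }

    final List<String> components(String field) {
        return wrapComponents(field, getWrappedAnalyzer(field).createComponents(field));
    }

    final List<String> analyze(String field, String text) {
        return getWrappedAnalyzer(field).analyze(wrapReader(field, text));
    }
}

// Proposed shape: wrapComponents is final and passes the delegate's components
// through untouched, so subclasses cannot add TokenFilters.
// wrapReader is still overridable, so CharFilter-style input changes are allowed.
abstract class DelegatingWrapperModel extends WrapperModel {
    @Override
    protected final List<String> wrapComponents(String field, List<String> c) { return c; }
}

// Example subclass: delegation plus a CharFilter-like lower-casing step
// (a real CharFilter would wrap the Reader instead of a String).
final class LowercasingWrapper extends DelegatingWrapperModel {
    private final DelegateAnalyzer delegate = new DelegateAnalyzer();

    @Override
    protected DelegateAnalyzer getWrappedAnalyzer(String field) { return delegate; }

    @Override
    protected String wrapReader(String field, String text) {
        return text.toLowerCase(Locale.ROOT);
    }
}
```

In this model, new LowercasingWrapper().analyze("body", "Hello World") yields [hello, world], while components("body") is exactly what the delegate built.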

Internally this class uses a private ReuseStrategy that just delegates to the 
underlying analyzer. Whether the delegate's own strategy is global or per 
field does not matter here; that detail stays private to the delegate.
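The delegating strategy can be sketched as a single-threaded Java model. Everything here is hypothetical and only loosely modelled on Lucene's Analyzer.ReuseStrategy (Comp, Strategy, Model, Plain and DelegatingWrapper are not real Lucene classes): the strategy is stateless, each analyzer owns one "stored" slot standing in for its thread-local value, and the wrapper's strategy forwards both lookups and stores to whichever analyzer it delegates to, so the wrapper itself never caches anything.

```java
import java.util.Map;

// Hypothetical stand-in for TokenStreamComponents.
final class Comp { }

// Simplified reuse-strategy contract: stateless; the cached value lives in the analyzer.
interface Strategy {
    Comp get(Model a, String field);
    void set(Model a, String field, Comp c);
}

abstract class Model {
    Object stored; // models the analyzer's per-thread stored value
    abstract Strategy strategy();
    abstract Comp createComponents(String field);

    // Analogue of Analyzer.tokenStream(): reuse if cached, else create and cache.
    final Comp components(String field) {
        Comp c = strategy().get(this, field);
        if (c == null) {
            c = createComponents(field);
            strategy().set(this, field, c);
        }
        return c;
    }
}

// Global-style strategy: one Comp per analyzer, whatever the field.
final class GlobalStrategy implements Strategy {
    public Comp get(Model a, String field) { return (Comp) a.stored; }
    public void set(Model a, String field, Comp c) { a.stored = c; }
}

// A plain analyzer that counts how often it really builds components.
final class Plain extends Model {
    int built = 0;
    private final Strategy s = new GlobalStrategy();
    Strategy strategy() { return s; }
    Comp createComponents(String field) { built++; return new Comp(); }
}

// The wrapper's strategy forwards get/set to the wrapped analyzer's own strategy.
final class DelegatingWrapper extends Model {
    private final Map<String, Model> perField;
    private final Model fallback;

    DelegatingWrapper(Model fallback, Map<String, Model> perField) {
        this.fallback = fallback;
        this.perField = perField;
    }

    Model wrapped(String field) { return perField.getOrDefault(field, fallback); }

    private final Strategy delegating = new Strategy() {
        public Comp get(Model a, String field) {
            Model d = ((DelegatingWrapper) a).wrapped(field);
            return d.strategy().get(d, field);
        }
        public void set(Model a, String field, Comp c) {
            Model d = ((DelegatingWrapper) a).wrapped(field);
            d.strategy().set(d, field, c);
        }
    };

    Strategy strategy() { return delegating; }

    // Components come from the delegate unchanged (no extra TokenFilters),
    // so storing them in the delegate's cache is safe.
    Comp createComponents(String field) { return wrapped(field).createComponents(field); }
}
```

In this model, two wrappers for two indexes sharing one pooled Plain analyzer cause exactly one build; every field of both wrappers, and the pooled analyzer itself, get the same Comp, and neither wrapper's stored slot is ever used.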



--
This message was sent by Atlassian JIRA
(v6.2#6252)
