[ 
https://issues.apache.org/jira/browse/LUCENE-5803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052844#comment-14052844
 ] 

ASF subversion and git services commented on LUCENE-5803:
---------------------------------------------------------

Commit 1608003 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1608003 ]

Merged revision(s) 1607998 from lucene/dev/trunk:
LUCENE-5803: Add DelegatingAnalyzerWrapper, an optimized variant of 
AnalyzerWrapper that does not allow wrapping components or readers

> Add another AnalyzerWrapper class that does not have its own cache, so 
> delegate-only wrappers don't create thread local resources several times
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5803
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5803
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>    Affects Versions: 4.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 5.0, 4.10
>
>         Attachments: LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch, 
> LUCENE-5803.patch, LUCENE-5803.patch, LUCENE-5803.patch
>
>
> This is a followup issue for the following Elasticsearch issue: 
> https://github.com/elasticsearch/elasticsearch/pull/6714
> Basically the problem is the following:
> - Elasticsearch has a pool of Analyzers that are used for analysis in several 
> indexes
> - Each index uses a different PerFieldAnalyzerWrapper
> PerFieldAnalyzerWrapper uses PER_FIELD_REUSE_STRATEGY. Because of this, it 
> caches the TokenStreams for every field. If there are many fields, this adds 
> up to a lot. In addition, the underlying analyzers may also cache 
> TokenStreams, and other PerFieldAnalyzerWrappers do the same, although the 
> delegate Analyzer can always return the same components.
> We should add code similar to Elasticsearch's directly to Lucene: if the 
> delegating Analyzer just delegates per field or just wraps CharFilters around 
> the Reader, there is no need to cache the TokenStreamComponents a second time 
> in the delegating Analyzer. This is only needed if the delegating Analyzer 
> adds additional TokenFilters (like ShingleAnalyzerWrapper).
> We should name this new class DelegatingAnalyzerWrapper extends 
> AnalyzerWrapper. The wrapComponents method must be final, because we are not 
> allowed to add additional TokenFilters, but unlike ES, we don't need to 
> disallow wrapping with CharFilters.
> Internally, this class uses a private ReuseStrategy that just delegates to 
> the underlying analyzer. It does not matter here whether the strategy of the 
> delegate is global or per field; this is private to the delegate.
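
The caching argument in the quoted description can be illustrated with a
small stand-alone model. This is only a sketch and deliberately does not use
the Lucene API: Components, PooledAnalyzer, CachingWrapper and
DelegatingWrapper are hypothetical stand-ins invented for illustration, and
every wrapper here points at a single shared delegate instead of choosing one
per field. It only counts how many component instances get built on one
thread when several wrappers share one pooled analyzer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class DelegatingWrapperSketch {

    /** Hypothetical stand-in for Lucene's TokenStreamComponents. */
    static final class Components {}

    /** Hypothetical stand-in for a pooled Analyzer with its own per-thread cache. */
    static final class PooledAnalyzer {
        final AtomicInteger created = new AtomicInteger();
        private final ThreadLocal<Components> reusable = new ThreadLocal<>();

        /** Always builds a new instance (the expensive step being counted). */
        Components createComponents() {
            created.incrementAndGet();
            return new Components();
        }

        /** Returns this thread's cached instance, building it on first use. */
        Components reusableComponents() {
            Components c = reusable.get();
            if (c == null) {
                c = createComponents();
                reusable.set(c);
            }
            return c;
        }
    }

    /** Wrapper that keeps its own per-field, per-thread cache (assumed model of the problem). */
    static final class CachingWrapper {
        private final PooledAnalyzer delegate;
        private final ThreadLocal<Map<String, Components>> perField =
                ThreadLocal.withInitial(HashMap::new);

        CachingWrapper(PooledAnalyzer delegate) { this.delegate = delegate; }

        Components components(String field) {
            // One fresh instance per field, per wrapper, per thread.
            return perField.get().computeIfAbsent(field, f -> delegate.createComponents());
        }
    }

    /** Wrapper without a cache of its own: reuse is left entirely to the delegate. */
    static final class DelegatingWrapper {
        private final PooledAnalyzer delegate;

        DelegatingWrapper(PooledAnalyzer delegate) { this.delegate = delegate; }

        Components components(String field) {
            return delegate.reusableComponents();
        }
    }

    /** Counts instances built when `wrappers` caching wrappers share one analyzer. */
    static int createdWithCaching(int wrappers, String... fields) {
        PooledAnalyzer shared = new PooledAnalyzer();
        for (int i = 0; i < wrappers; i++) {
            CachingWrapper w = new CachingWrapper(shared);
            for (String f : fields) w.components(f);
        }
        return shared.created.get();
    }

    /** Counts instances built when `wrappers` delegating wrappers share one analyzer. */
    static int createdWithDelegating(int wrappers, String... fields) {
        PooledAnalyzer shared = new PooledAnalyzer();
        for (int i = 0; i < wrappers; i++) {
            DelegatingWrapper w = new DelegatingWrapper(shared);
            for (String f : fields) w.components(f);
        }
        return shared.created.get();
    }

    public static void main(String[] args) {
        System.out.println(createdWithCaching(2, "title", "body", "tags"));    // prints 6
        System.out.println(createdWithDelegating(2, "title", "body", "tags")); // prints 1
    }
}
```

In this model the caching wrappers build one instance per wrapper and field,
while the delegating wrappers share the single instance held by the pooled
analyzer. That is the saving the description aims for when a wrapper adds no
TokenFilters of its own.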



--
This message was sent by Atlassian JIRA
(v6.2#6252)
