[
https://issues.apache.org/jira/browse/LUCENE-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114300#comment-13114300
]
Uwe Schindler commented on LUCENE-2279:
---------------------------------------
You misunderstood the response: StopFilter itself did not change. What changed
is that in Lucene 4.0 all Analyzers are required to reuse TokenStream
instances, so the StopFilter is created only once in your application
(when the Analyzer is created).
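
To illustrate why reuse matters here, a minimal plain-Java sketch follows. It
does not use any Lucene classes: SketchFilter, filterPerDocument and
filterWithReuse are hypothetical names, and a HashSet stands in for
CharArraySet. It only counts how often the stop-word set is copied under each
pattern.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustration only: SketchFilter is a hypothetical stand-in, not Lucene's
// StopFilter, and HashSet stands in for CharArraySet.
class StopSetReuseSketch {
    // Number of times the stop-word set has been copied so far.
    static int copies = 0;

    static final class SketchFilter {
        private final Set<String> stopWords;

        SketchFilter(Set<String> words) {
            // Mirrors the copy made in the else-branch of the quoted constructor.
            this.stopWords = new HashSet<>(words);
            copies++;
        }

        // Counts the tokens that are not stop words.
        int countKept(String[] tokens) {
            int kept = 0;
            for (String token : tokens) {
                if (!stopWords.contains(token)) {
                    kept++;
                }
            }
            return kept;
        }
    }

    // Per-document pattern: a new filter, and therefore a new copy, per document.
    static int filterPerDocument(Set<String> stop, String[][] docs) {
        int kept = 0;
        for (String[] doc : docs) {
            kept += new SketchFilter(stop).countKept(doc);
        }
        return kept;
    }

    // Reuse pattern: one filter, and therefore one copy, for all documents.
    static int filterWithReuse(Set<String> stop, String[][] docs) {
        SketchFilter filter = new SketchFilter(stop);
        int kept = 0;
        for (String[] doc : docs) {
            kept += filter.countKept(doc);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "a"));
        String[][] docs = {{"the", "quick", "fox"}, {"a", "lazy", "dog"}};

        copies = 0;
        filterPerDocument(stop, docs);
        System.out.println("copies, new filter per document: " + copies);

        copies = 0;
        filterWithReuse(stop, docs);
        System.out.println("copies, reused filter: " + copies);
    }
}
```

With the per-document pattern the number of copies grows with the number of
documents; with reuse it stays at one regardless of how many documents are
processed, which is the situation described above for 4.0.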
> eliminate pathological performance on StopFilter when using a Set<String>
> instead of CharArraySet
> -------------------------------------------------------------------------------------------------
>
> Key: LUCENE-2279
> URL: https://issues.apache.org/jira/browse/LUCENE-2279
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: thushara wijeratna
> Priority: Minor
>
> Passing a Set<String> to a StopFilter instead of a CharArraySet results in a
> very slow filter.
> This is because Analyzer.tokenStream() is called for each document, which
> ends up constructing the StopFilter (if used). If a regular Set<String> is
> passed to the StopFilter, all the elements of the set are copied to a
> CharArraySet, as we can see in its constructor:
>   public StopFilter(boolean enablePositionIncrements, TokenStream input,
>                     Set stopWords, boolean ignoreCase)
>   {
>     super(input);
>     if (stopWords instanceof CharArraySet) {
>       this.stopWords = (CharArraySet)stopWords;
>     } else {
>       this.stopWords = new CharArraySet(stopWords.size(), ignoreCase);
>       this.stopWords.addAll(stopWords);
>     }
>     this.enablePositionIncrements = enablePositionIncrements;
>     init();
>   }
> I feel we should make the StopFilter signature specific, i.e. have it take a
> CharArraySet rather than a Set, and there should be a JavaDoc warning on the
> other variants of the StopFilter constructor, as they all result in a copy
> for each invocation of Analyzer.tokenStream().