[jira] Commented: (LUCENE-2413) Consolidate all (Solr's & Lucene's) analyzers into modules/analysis

Uwe Schindler (JIRA) Sun, 16 May 2010 13:37:06 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868037#action_12868037 ]


Uwe Schindler commented on LUCENE-2413:
---------------------------------------

Just thinking about MockFilter:
Might this be much faster than CharArraySet (CAS)? If we build a DFA out of the
stopwords, as is done in MockFilter, and also minimize it, wouldn't checking
for a hit be much faster? For example, if the first character of the
termBuffer has no matching transition in the automaton, the term is rejected
immediately. CAS always has to calculate the hashCode of the whole term first
and only then look it up.
I would like to see a comparison of a minimized Automaton vs. CAS for
StopFilter. Granted, for LengthFilter simply comparing the term length is
faster than running an automaton, but for StopFilter the automaton should be
much faster.
I propose to pass a Set to StopFilter and have it convert the Set internally
into a minimized Automaton, similar to MockFilter.
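A minimal sketch of the proposed idea, written in plain Java and NOT using Lucene's own automaton package or the actual MockFilter code: the class StopwordDfa and its methods are hypothetical. It builds a trie from the stopword Set (a trie is already a DFA), minimizes it by merging states with identical accept flags and transitions (valid bottom-up because the automaton is acyclic), and then walks the term buffer one char at a time, so a term whose first char has no transition is rejected without reading the rest. A hash-based set, by contrast, hashes the whole term before it can look anything up.

```java
import java.util.*;

/** Hypothetical sketch: a stopword Set compiled into a minimized acyclic DFA. */
final class StopwordDfa {
  // A DFA state: outgoing transitions keyed by char, plus an accept flag.
  private static final class State {
    final TreeMap<Character, State> next = new TreeMap<>();
    boolean accept;
  }

  private final State start;

  StopwordDfa(Set<String> stopwords) {
    State root = new State();
    // Step 1: build a trie, which is already a deterministic automaton.
    for (String w : stopwords) {
      State s = root;
      for (int i = 0; i < w.length(); i++) {
        s = s.next.computeIfAbsent(w.charAt(i), c -> new State());
      }
      s.accept = true;
    }
    // Step 2: minimize by merging equivalent states, children first.
    this.start = minimize(root, new HashMap<>(), new IdentityHashMap<>());
  }

  // Post-order: replace each child by its canonical state, then register this
  // state under a signature of (accept flag, char -> canonical child id).
  private static State minimize(State s, Map<String, State> register, Map<State, Integer> ids) {
    StringBuilder sig = new StringBuilder(s.accept ? "1" : "0");
    for (Map.Entry<Character, State> e : s.next.entrySet()) {
      State canon = minimize(e.getValue(), register, ids);
      e.setValue(canon);
      sig.append(e.getKey()).append(':').append(ids.get(canon)).append(',');
    }
    State existing = register.putIfAbsent(sig.toString(), s);
    State result = existing != null ? existing : s;
    ids.putIfAbsent(result, ids.size()); // only canonical states receive ids
    return result;
  }

  /** True if buffer[0..length) is a stopword; stops at the first missing transition. */
  boolean isStopword(char[] buffer, int length) {
    State s = start;
    for (int i = 0; i < length; i++) {
      s = s.next.get(buffer[i]);
      if (s == null) {
        return false; // early rejection, possibly on the very first char
      }
    }
    return s.accept;
  }

  /** Number of reachable states after minimization. */
  int stateCount() {
    Set<State> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    Deque<State> stack = new ArrayDeque<>();
    stack.push(start);
    while (!stack.isEmpty()) {
      State s = stack.pop();
      if (seen.add(s)) {
        stack.addAll(s.next.values());
      }
    }
    return seen.size();
  }
}
```

For example, the stopwords "cat" and "bat" produce a 7-state trie that minimizes to 4 states, since the shared suffix "at" collapses. The TreeMap transitions are chosen only for brevity; a serious implementation would likely use flat transition tables, and whether it actually beats CharArraySet is exactly what the comparison asked for above would have to show.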

> Consolidate all (Solr's & Lucene's) analyzers into modules/analysis
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2413
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2413
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Michael McCandless
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2413-charfilter.patch, LUCENE-2413-PFAW+LF.patch, 
> LUCENE-2413_commongrams.patch, LUCENE-2413_folding.patch, 
> LUCENE-2413_htmlstrip.patch, LUCENE-2413_keep_hyphen_trim.patch, 
> LUCENE-2413_mockfilter.patch, LUCENE-2413_mockfilter.patch, 
> LUCENE-2413_pattern.patch, LUCENE-2413_porter.patch, 
> LUCENE-2413_removeDups.patch, LUCENE-2413_synonym.patch, 
> LUCENE-2413_teesink.patch, LUCENE-2413_testanalyzer.patch, 
> LUCENE-2413_testanalyzer.patch, LUCENE-2413_tests2.patch, 
> LUCENE-2413_wdf.patch
>
>
> We've been wanting to do this for quite some time now...  I think, now that 
> Solr/Lucene are merged, and we're looking at opening an unstable line of 
> development for Solr/Lucene, now is the right time to do it.
> A standalone module for all analyzers also lets apps version the analyzers 
> separately from the version of Solr/Lucene they use, possibly enabling us 
> to remove Version entirely from the analyzers.
> We should also do LUCENE-2309 (decouple, as much as possible, indexer from 
> the analysis API), but I don't think that issue needs to block this 
> consolidation.
> Once we do this, there is one place where our users can find all the 
> analyzers that Solr/Lucene provide.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
