[jira] [Commented] (LUCENE-5752) Explore light weight Automaton replacement

Michael McCandless (JIRA) Wed, 18 Jun 2014 06:47:23 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035693#comment-14035693 ]


Michael McCandless commented on LUCENE-5752:
--------------------------------------------

I ran the normal luceneutil bench on wikimediumall:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
        HighSloppyPhrase        3.60      (7.3%)        3.51      (7.4%)   -2.3% ( -15% -   13%)
              HighPhrase        4.59      (5.5%)        4.54      (6.2%)   -1.1% ( -12% -   11%)
         MedSloppyPhrase        3.59      (3.8%)        3.55      (4.8%)   -1.0% (  -9% -    7%)
                HighTerm       63.28      (3.7%)       62.65      (4.3%)   -1.0% (  -8% -    7%)
                 MedTerm       99.13      (3.0%)       98.15      (3.6%)   -1.0% (  -7% -    5%)
               MedPhrase      231.08      (5.4%)      229.10      (6.3%)   -0.9% ( -11% -   11%)
                PKLookup      160.27      (2.4%)      159.32      (2.9%)   -0.6% (  -5% -    4%)
                 LowTerm      323.40      (1.9%)      321.68      (2.3%)   -0.5% (  -4% -    3%)
         LowSloppyPhrase       45.04      (1.5%)       44.81      (2.2%)   -0.5% (  -4% -    3%)
              AndHighLow      413.85      (1.6%)      412.50      (2.4%)   -0.3% (  -4% -    3%)
             LowSpanNear       11.23      (3.6%)       11.20      (3.2%)   -0.2% (  -6% -    6%)
            HighSpanNear       10.36      (5.3%)       10.33      (4.7%)   -0.2% (  -9% -   10%)
             MedSpanNear       34.23      (3.0%)       34.16      (3.0%)   -0.2% (  -6% -    5%)
             AndHighHigh       28.81      (0.6%)       28.79      (0.8%)   -0.1% (  -1% -    1%)
               LowPhrase       13.51      (2.2%)       13.50      (1.6%)   -0.1% (  -3% -    3%)
              AndHighMed       34.92      (0.5%)       34.90      (0.9%)   -0.1% (  -1% -    1%)
                  IntNRQ        3.45      (6.6%)        3.45      (6.3%)   -0.1% ( -12% -   13%)
                 Prefix3       93.31      (4.3%)       93.26      (3.5%)   -0.0% (  -7% -    8%)
                Wildcard       20.15      (4.1%)       20.15      (2.8%)   -0.0% (  -6% -    7%)
                 Respell       49.36      (3.1%)       49.52      (2.8%)    0.3% (  -5% -    6%)
           OrNotHighHigh       10.73      (6.3%)       10.81      (6.4%)    0.8% ( -11% -   14%)
              OrHighHigh        9.89      (6.4%)        9.97      (6.3%)    0.8% ( -11% -   14%)
               OrHighMed       31.86      (6.3%)       32.12      (6.4%)    0.8% ( -11% -   14%)
           OrHighNotHigh       13.39      (6.1%)       13.51      (6.1%)    0.8% ( -10% -   13%)
            OrHighNotMed       36.55      (5.9%)       36.88      (6.1%)    0.9% ( -10% -   13%)
            OrNotHighMed       23.44      (6.4%)       23.65      (6.6%)    0.9% ( -11% -   14%)
               OrHighLow       22.74      (6.6%)       22.96      (6.9%)    1.0% ( -11% -   15%)
            OrNotHighLow       24.38      (6.7%)       24.62      (6.8%)    1.0% ( -11% -   15%)
            OrHighNotLow       29.97      (6.5%)       30.29      (6.9%)    1.0% ( -11% -   15%)
                  Fuzzy2       45.50      (3.2%)       46.05      (3.1%)    1.2% (  -5% -    7%)
                  Fuzzy1       60.83      (4.0%)       61.71      (3.9%)    1.4% (  -6% -    9%)
{noformat}

Net/net I think this is noise ... but Rob pointed out that the Fuzzy1/2
tasks here don't use a prefix, so I'll fix up luceneutil to support that
and test it too.
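
One way to read the "noise" conclusion from the table: in every row the
bracketed Pct diff range runs from a negative value to a positive one, so
no task shows a change that clearly clears its own run-to-run variation.
The sketch below checks that for three rows copied from the table.
Treating "range spans zero" as "no measurable change" is an assumption
made for illustration, not a rule stated in the comment, and the class
and method names are hypothetical (they are not part of luceneutil).

```java
final class NoiseCheckSketch {
    // {low, high} ends of the "Pct diff" range, copied from three table rows.
    static final int[][] SAMPLE_ROWS = {
        {-15, 13}, // HighSloppyPhrase (largest slowdown, -2.3%)
        {-1, 1},   // AndHighHigh (one of the tightest ranges)
        {-6, 9},   // Fuzzy1 (largest speedup, 1.4%)
    };

    // True when the range includes both slowdowns and speedups.
    static boolean spansZero(int low, int high) {
        return low < 0 && high > 0;
    }

    // True when every row's range spans zero.
    static boolean allSpanZero(int[][] rows) {
        for (int[] row : rows) {
            if (!spansZero(row[0], row[1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allSpanZero(SAMPLE_ROWS));
    }
}
```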


> Explore light weight Automaton replacement
> ------------------------------------------
>
>                 Key: LUCENE-5752
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5752
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0
>
>         Attachments: LUCENE-5752.patch
>
>
> This effort started with the patch on LUCENE-4556, to create a "light
> weight" replacement for the current object-heavy Automaton class
> (which creates separate State and Transition objects).
> I took that initial patch much further, and cut over most places in
> Lucene that use Automaton to LightAutomaton.  Tests pass.
> The core idea of LightAutomaton is that all states are ints, and you
> build up the automaton under the restriction that you add all outgoing
> transitions one state at a time.  This worked well for most
> operations, but for some (e.g. UTF32ToUTF8!!) it was harder, so I also
> added a separate builder that accepts transitions in any order and, at
> the end, sorts them and adds them to the real automaton.
> If this is successful I think we should just replace the current
> Automaton with LightAutomaton; right now they both exist in my current
> patch...
> This is very much a work in progress, and I'm not sure the
> restrictions the API imposes are "reasonable" (some algos got uglier).
> But I think it's at least worth exploring/iterating... I'll make a branch and
> commit my current state.
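
The two ideas in the description (int states whose outgoing transitions
are added one state at a time, plus a separate builder that accepts any
order and sorts at the end) can be sketched roughly as below. This is not
the code from LUCENE-5752.patch: IntAutomatonSketch, AnyOrderBuilderSketch
and all of their methods are hypothetical names, and the flat int[] layout
is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: states are ints, transitions live in one flat int[]. */
final class IntAutomatonSketch {
    private int[] transitions = new int[12];    // 3 ints each: dest, min, max
    private int transitionCount = 0;
    private int[] firstTransition = new int[4]; // per state, -1 = none yet
    private int[] numTransitions = new int[4];  // per state
    private int stateCount = 0;
    private int currentState = -1;              // state now receiving transitions

    int createState() {
        if (stateCount == firstTransition.length) {
            firstTransition = Arrays.copyOf(firstTransition, stateCount * 2);
            numTransitions = Arrays.copyOf(numTransitions, stateCount * 2);
        }
        firstTransition[stateCount] = -1;
        numTransitions[stateCount] = 0;
        return stateCount++;
    }

    /** All transitions leaving one state must be added before moving on. */
    void addTransition(int source, int dest, int min, int max) {
        if (source != currentState) {
            if (firstTransition[source] != -1) {
                throw new IllegalStateException("state " + source + " is already finished");
            }
            firstTransition[source] = transitionCount;
            currentState = source;
        }
        if (3 * (transitionCount + 1) > transitions.length) {
            transitions = Arrays.copyOf(transitions, transitions.length * 2);
        }
        int base = 3 * transitionCount++;
        transitions[base] = dest;
        transitions[base + 1] = min;
        transitions[base + 2] = max;
        numTransitions[source]++;
    }

    /** Returns the destination state for label, or -1 if nothing matches. */
    int step(int state, int label) {
        int first = firstTransition[state];
        for (int i = 0; i < numTransitions[state]; i++) {
            int base = 3 * (first + i);
            if (transitions[base + 1] <= label && label <= transitions[base + 2]) {
                return transitions[base];
            }
        }
        return -1;
    }
}

/** Hypothetical sketch: buffers transitions in any order, sorts on finish(). */
final class AnyOrderBuilderSketch {
    private final List<int[]> pending = new ArrayList<>(); // {source, dest, min, max}
    private int stateCount = 0;

    int createState() {
        return stateCount++;
    }

    void addTransition(int source, int dest, int min, int max) {
        pending.add(new int[] {source, dest, min, max});
    }

    IntAutomatonSketch finish() {
        // Sorting by source groups each state's transitions together, which
        // is exactly the order the restricted automaton requires.
        pending.sort(Comparator.comparingInt((int[] t) -> t[0])
                               .thenComparingInt(t -> t[2]));
        IntAutomatonSketch automaton = new IntAutomatonSketch();
        for (int i = 0; i < stateCount; i++) {
            automaton.createState();
        }
        for (int[] t : pending) {
            automaton.addTransition(t[0], t[1], t[2], t[3]);
        }
        return automaton;
    }
}
```

In this sketch, returning to a state after moving on to another one throws,
which is the restriction the description mentions. An algorithm that
naturally emits transitions out of order would go through the builder
instead and pay for one sort at the end.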



--
This message was sent by Atlassian JIRA
(v6.2#6252)

