[jira] [Comment Edited] (LUCENE-4600) Explore facets aggregation during documents collection

Michael McCandless (JIRA) Mon, 21 Jan 2013 07:08:14 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558819#comment-13558819
 ]


Michael McCandless edited comment on LUCENE-4600 at 1/21/13 3:06 PM:
---------------------------------------------------------------------

base = ALL_PARENTS, comp = NO_PARENTS:
{noformat}

                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
             MedSpanNear      125.77      (2.0%)       79.31      (0.8%)  -36.9% ( -38% -  -34%)
             LowSpanNear      124.86      (2.7%)       79.23      (0.5%)  -36.5% ( -38% -  -34%)
            HighSpanNear      124.23      (2.3%)       79.44      (0.8%)  -36.1% ( -38% -  -33%)
              AndHighLow      107.24      (1.4%)       72.70      (0.7%)  -32.2% ( -33% -  -30%)
               MedPhrase       55.98      (0.6%)       44.89      (1.4%)  -19.8% ( -21% -  -17%)
              AndHighMed       52.06      (0.7%)       43.20      (0.0%)  -17.0% ( -17% -  -16%)
                  Fuzzy2       35.71      (0.6%)       30.42      (1.6%)  -14.8% ( -16% -  -12%)
               LowPhrase       17.27      (0.3%)       15.21      (3.2%)  -11.9% ( -15% -   -8%)
              HighPhrase       15.20      (6.2%)       13.50      (4.7%)  -11.2% ( -20% -    0%)
                 LowTerm       41.68      (0.4%)       37.49      (0.4%)  -10.1% ( -10% -   -9%)
         LowSloppyPhrase       17.31      (2.9%)       15.75      (0.9%)   -9.0% ( -12% -   -5%)
                  Fuzzy1       28.11      (0.3%)       25.63      (0.0%)   -8.8% (  -9% -   -8%)
         MedSloppyPhrase       18.42      (1.5%)       17.25      (0.1%)   -6.3% (  -7% -   -4%)
                 Respell       56.32      (0.3%)       54.41      (2.2%)   -3.4% (  -5% -    0%)
        HighSloppyPhrase        0.83      (6.8%)        0.81      (1.0%)   -2.3% (  -9% -    5%)
                Wildcard       63.43      (1.9%)       61.96      (0.3%)   -2.3% (  -4% -    0%)
                 Prefix3       45.60      (0.5%)       45.70      (0.7%)    0.2% (  -1% -    1%)
                  IntNRQ       17.54      (0.6%)       17.60      (1.4%)    0.3% (  -1% -    2%)
                PKLookup      205.89      (0.5%)      210.73      (0.7%)    2.4% (   1% -    3%)
             AndHighHigh       11.89      (0.2%)       12.48      (0.3%)    5.0% (   4% -    5%)
                HighTerm        7.00      (0.2%)        8.09      (0.1%)   15.6% (  15% -   16%)
              OrHighHigh        3.77      (0.6%)        4.36      (0.3%)   15.6% (  14% -   16%)
               OrHighLow        6.65      (0.1%)        7.69      (1.5%)   15.6% (  14% -   17%)
               OrHighMed        6.61      (0.4%)        7.66      (0.2%)   15.8% (  15% -   16%)
                 MedTerm       18.86      (0.4%)       22.13      (0.4%)   17.3% (  16% -   18%)
{noformat}

I think the slowdowns are because this test has 2.5M ords, so the cost of 
"rolling up" at the end is non-trivial.
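
To illustrate why a roll-up can cost time proportional to the taxonomy size 
rather than to the number of hits, here is a minimal sketch. It is not the facet 
module's actual code. It assumes a hypothetical parents array in which every 
ordinal's parent has a smaller ordinal and the root's parent is -1; the names 
RollupSketch and rollUp are made up for this example.

```java
import java.util.Arrays;

final class RollupSketch {
    /**
     * Adds each ordinal's count into its parent's count.
     * Assumption: parents[ord] < ord for every non-root ordinal, and
     * parents[0] == -1 for the root, so one pass from the highest
     * ordinal down to 1 propagates counts through every level.
     */
    static void rollUp(int[] counts, int[] parents) {
        for (int ord = counts.length - 1; ord > 0; ord--) {
            int parent = parents[ord];
            if (parent >= 0) {
                counts[parent] += counts[ord];
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy:
        // 0=root, 1=Author, 2=Author/Bob, 3=Author/Lisa, 4=Date, 5=Date/2013
        int[] parents = {-1, 0, 1, 1, 0, 4};
        // Only leaf ordinals were counted during collection.
        int[] counts = {0, 0, 3, 2, 0, 4};
        rollUp(counts, parents);
        System.out.println(Arrays.toString(counts));
    }
}
```

In this sketch the loop visits every ordinal once, so with 2.5M ords the same 
fixed cost is paid even by a query that matches only a handful of documents.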
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).
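
The difference between the two approaches in the description can be sketched as 
follows. This is only an illustration under stated assumptions, not the facet 
module's API: CollectSketch, its methods, and the docOrds array (a hypothetical 
in-memory list of facet ordinals per document) are invented for the example, and 
hits is assumed to contain each matching doc ID once.

```java
import java.util.BitSet;

final class CollectSketch {
    /** Two-pass: remember the hits in a bit set, then aggregate afterwards. */
    static int[] countAfterCollection(int[] hits, int[][] docOrds, int numOrds) {
        BitSet matched = new BitSet(docOrds.length);
        for (int doc : hits) {
            matched.set(doc);            // pass 1: collection only records the hit
        }
        int[] counts = new int[numOrds];
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : docOrds[doc]) {
                counts[ord]++;           // pass 2: the actual aggregation
            }
        }
        return counts;
    }

    /** Single pass: aggregate as each hit is collected, no bit set needed. */
    static int[] countDuringCollection(int[] hits, int[][] docOrds, int numOrds) {
        int[] counts = new int[numOrds];
        for (int doc : hits) {
            for (int ord : docOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }
}
```

Both methods produce the same counts; the second simply avoids holding the 
per-hit state (and, if scores were aggregated, a float[] of scores) until the 
end of the search.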

