[ 
https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4748:
---------------------------------------

    Attachment: LUCENE-4748.patch

New patch, fixing various bugs, beefing up the tests and resolving all
nocommits.  I think it's ready!

I also fixed a consistency issue with the facets API: if you request
faceting for a non-existent category, it now returns an empty
FacetResult instead of skipping it.

I tested on a wider variety of drill down / sideways queries.  base =
old patch and comp = this patch:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
          LowTermHardDD2       24.43      (2.0%)       24.43      (2.2%)    0.0% (  -4% -    4%)
         HighTermEasyDD2       18.91      (1.6%)       20.59      (4.3%)    8.9% (   2% -   15%)
          LowTermHardDD1       31.38      (2.0%)       36.21      (1.7%)   15.4% (  11% -   19%)
         LowTermMixedDD2       44.09      (2.1%)       53.93      (0.9%)   22.3% (  18% -   25%)
        LowTermHardOrDD1       25.85      (2.3%)       33.80      (2.0%)   30.7% (  25% -   35%)
          MedTermHardDD2        5.78      (1.4%)        7.71      (5.3%)   33.4% (  26% -   40%)
          LowTermEasyDD2      129.51      (1.7%)      176.27      (3.9%)   36.1% (  30% -   42%)
          MedTermEasyDD2       42.88      (1.8%)       60.03      (3.5%)   40.0% (  34% -   46%)
         MedTermMixedDD2       12.52      (1.4%)       17.59      (4.2%)   40.5% (  34% -   46%)
        LowTermHardOrDD2       18.57      (2.8%)       26.45      (1.3%)   42.4% (  37% -   47%)
          LowTermEasyDD1       71.73      (1.8%)      102.77      (1.8%)   43.3% (  38% -   47%)
        LowTermEasyOrDD2       61.01      (2.7%)       98.57      (6.7%)   61.6% (  50% -   73%)
         HighTermHardDD2        1.22      (1.8%)        1.97      (6.8%)   61.7% (  52% -   71%)
          MedTermHardDD1        8.77      (2.6%)       14.47      (5.1%)   65.1% (  55% -   74%)
        HighTermMixedDD2        2.69      (1.6%)        4.50      (6.8%)   67.4% (  58% -   76%)
          MedTermEasyDD1       18.61      (2.6%)       32.34      (6.1%)   73.8% (  63% -   84%)
        LowTermEasyOrDD1       51.31      (2.2%)       91.48      (2.1%)   78.3% (  72% -   84%)
       HighTermEasyOrDD2        8.96      (3.1%)       16.17      (5.4%)   80.5% (  69% -   91%)
       HighTermEasyOrDD1        3.47      (4.1%)        6.40      (7.5%)   84.8% (  70% -  100%)
        MedTermHardOrDD2        4.31      (3.3%)        8.03      (6.4%)   86.6% (  74% -   99%)
         HighTermEasyDD1        3.16      (3.0%)        5.89      (7.7%)   86.6% (  73% -  100%)
        MedTermEasyOrDD1       15.63      (3.4%)       30.05      (6.5%)   92.2% (  79% -  105%)
         HighTermHardDD1        1.61      (3.1%)        3.13      (7.6%)   94.3% (  81% -  108%)
        MedTermHardOrDD1        6.75      (3.5%)       13.76      (6.0%)  103.9% (  91% -  117%)
       HighTermHardOrDD2        1.14      (4.2%)        2.41      (9.2%)  111.6% (  94% -  130%)
        MedTermEasyOrDD2       19.92      (3.0%)       45.44      (6.3%)  128.1% ( 115% -  141%)
       HighTermHardOrDD1        0.96      (3.5%)        2.54     (10.4%)  163.6% ( 144% -  183%)
{noformat}

DD2 means drill down on 2 dims, DD1 means drill down on 1 dim.  Hard
means the 1 or 2 dims have high count, Easy means they have low count,
and Mixed means one high and one low.  OrDD1 / OrDD2 means I OR two
values per dim.

The speedup from the new patch is largest for the OR case (i.e., when
you drill down on more than one value in a single dim), I think because
it handles that case directly instead of recursing into another BQ.

                
> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>
>                 Key: LUCENE-4748
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4748
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch, 
> LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, 
> LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch
>
>
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR": http://markmail.org/thread/jmnq6z2x7ayzci5k
> The basic idea is to count "near misses" during collection, ie
> documents that matched the main query and also all except one of the
> drill down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in.  Eg maybe you do a
> search for "cameras", and you see facets for the manufacturer, so you
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, eg maybe I next drill
> down into "Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all other manufacturers have, and what other resolutions
> Nikon cameras offer.
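
The "near miss" counting described above can be illustrated with a small
standalone sketch.  This is not the code in the patch and it does not use any
Lucene API: the class DrillSidewaysSketch, the Doc type and the
sidewaysCounts method are hypothetical names, documents are plain in-memory
objects, and the "main query" is just a substring test.  It only shows the
counting rule: a document that matches the main query and every drill-down
is a hit, and a document that fails exactly one drill-down dim is a near
miss that is counted only under that dim.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model of drill-sideways counting; no Lucene classes are used.
class DrillSidewaysSketch {

    /** Toy document: free text plus one value per facet dimension. */
    static final class Doc {
        final String text;
        final Map<String, String> dims;

        Doc(String text, Map<String, String> dims) {
            this.text = text;
            this.dims = dims;
        }
    }

    /**
     * Returns, for each drilled-down dim, the count of each value among documents
     * that match the main query and pass every OTHER drill-down (hits + near misses).
     */
    static Map<String, Map<String, Integer>> sidewaysCounts(
            List<Doc> docs, String queryTerm, Map<String, Set<String>> drillDowns) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String dim : drillDowns.keySet()) {
            counts.put(dim, new TreeMap<>());
        }
        for (Doc doc : docs) {
            if (!doc.text.contains(queryTerm)) {
                continue; // the main query must always match
            }
            int failures = 0;
            String failedDim = null;
            for (Map.Entry<String, Set<String>> e : drillDowns.entrySet()) {
                String value = doc.dims.get(e.getKey());
                // Several selected values in one dim are ORed: any one of them passes.
                if (value == null || !e.getValue().contains(value)) {
                    failures++;
                    failedDim = e.getKey();
                }
            }
            if (failures == 0) {
                // True hit: contributes to the sideways counts of every drilled dim.
                for (String dim : drillDowns.keySet()) {
                    counts.get(dim).merge(doc.dims.get(dim), 1, Integer::sum);
                }
            } else if (failures == 1) {
                // Near miss: contributes only to the single dim it failed.
                String value = doc.dims.get(failedDim);
                if (value != null) {
                    counts.get(failedDim).merge(value, 1, Integer::sum);
                }
            }
            // Two or more failed dims: the document is ignored entirely.
        }
        return counts;
    }

    /** Made-up sample data following the camera example from the issue description. */
    static Map<String, Map<String, Integer>> demo() {
        List<Doc> docs = List.of(
            new Doc("nikon d800 camera", Map.of("brand", "Nikon", "resolution", "36MP")),
            new Doc("nikon d3200 camera", Map.of("brand", "Nikon", "resolution", "10MP")),
            new Doc("canon eos camera", Map.of("brand", "Canon", "resolution", "10MP")),
            new Doc("sony alpha camera", Map.of("brand", "Sony", "resolution", "24MP")),
            new Doc("canon lens", Map.of("brand", "Canon", "resolution", "10MP")));
        Map<String, Set<String>> drillDowns = Map.of(
            "brand", Set.of("Nikon"),
            "resolution", Set.of("10MP"));
        return sidewaysCounts(docs, "camera", drillDowns);
    }

    public static void main(String[] args) {
        // prints {brand={Canon=1, Nikon=1}, resolution={10MP=1, 36MP=1}}
        System.out.println(demo());
    }
}
```

In this made-up data the Sony camera fails both drill-downs, so it shows up
under neither dim, while the Canon 10MP camera is a near miss on brand and is
what lets the UI still show a Canon count after drilling into Nikon.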
