[ 
https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4748:
---------------------------------------

    Attachment: LUCENE-4748.patch

I made a new patch, with a custom scorer to find the exact & near-miss
hits and tally accordingly.  I also added a random test which seems to
be passing ... I think the new scorer is working (but there are still
tons of nocommits).
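
As a rough illustration of what "exact & near-miss" tallying means (a minimal sketch, not the scorer in the patch): assume each hit already matched the main query, and that we know which drill-down dims it passed. The class and field names below are hypothetical.

```java
/** Hypothetical sketch: classify hits that already matched the main query. */
final class NearMissTallySketch {
    /** Hits that passed every drill-down filter. */
    int exactHits;
    /** nearMissByDim[i] counts hits that failed only drill-down filter i. */
    final int[] nearMissByDim;

    NearMissTallySketch(int numDims) {
        nearMissByDim = new int[numDims];
    }

    /** matched[i] is true if the hit passed drill-down filter i. */
    void tally(boolean[] matched) {
        int missCount = 0;
        int missedDim = -1;
        for (int i = 0; i < matched.length; i++) {
            if (!matched[i]) {
                missCount++;
                missedDim = i;
            }
        }
        if (missCount == 0) {
            exactHits++;                 // exact hit: counts toward the drill-down results
        } else if (missCount == 1) {
            nearMissByDim[missedDim]++;  // near miss: counts sideways for the one failed dim
        }
        // Hits that fail two or more filters are ignored.
    }
}
```

With two dims, a hit that passes both is an exact hit, a hit that passes only the first is a near miss for the second, and a hit that passes neither is dropped.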

It improves performance vs the last patch for MedTerm and HighTerm,
but not for LowTerm (base = last patch, comp = new patch), on full
wikibig (6.6M docs), 7 dims.  Each TermQuery does a drill down on
Date/2012 and imageCount/1:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 LowTerm       33.53      (1.2%)       26.27      (4.2%)  -21.7% ( -26% -  -16%)
                 MedTerm       14.20      (0.9%)       16.29      (4.8%)   14.7% (   8% -   20%)
                HighTerm        6.47      (1.2%)        9.43      (4.8%)   45.7% (  39% -   52%)
{noformat}

I think LowTerm got slower because the new scorer has a highish init
cost: it works like BS1 (BooleanScorer), allocating arrays[CHUNK] up
front.
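
To make the init cost concrete, here is a minimal sketch of windowed tallying with arrays allocated once up front. It is not the patch's scorer: the class name, the CHUNK value of 2048, the bitmask layout and the sorted doc ID arrays standing in for scorers are all assumptions, and for simplicity it visits every window instead of skipping ahead to the next hit.

```java
import java.util.Arrays;

/** Hypothetical sketch of chunked tallying; assumes at most 30 drill-down dims. */
final class ChunkedTallySketch {
    static final int CHUNK = 2048; // assumed window size, not taken from the patch

    // Allocated once per query. This fixed cost is paid even when there are few hits.
    private final boolean[] baseHit = new boolean[CHUNK];
    private final int[] dimMask = new int[CHUNK];

    int exactHits;
    final int[] nearMissByDim;

    ChunkedTallySketch(int numDims) {
        nearMissByDim = new int[numDims];
    }

    /** baseDocs and each dimDocs[d] are sorted doc IDs in [0, maxDoc). */
    void run(int[] baseDocs, int[][] dimDocs, int maxDoc) {
        int numDims = dimDocs.length;
        int full = (1 << numDims) - 1;   // bit d set means dim d matched
        int basePos = 0;
        int[] dimPos = new int[numDims];
        for (int start = 0; start < maxDoc; start += CHUNK) {
            int end = Math.min(start + CHUNK, maxDoc);
            Arrays.fill(baseHit, false);
            Arrays.fill(dimMask, 0);
            // Mark main-query hits that fall in this window.
            while (basePos < baseDocs.length && baseDocs[basePos] < end) {
                baseHit[baseDocs[basePos++] - start] = true;
            }
            // Record which drill-down dims matched each slot.
            for (int d = 0; d < numDims; d++) {
                while (dimPos[d] < dimDocs[d].length && dimDocs[d][dimPos[d]] < end) {
                    dimMask[dimDocs[d][dimPos[d]++] - start] |= 1 << d;
                }
            }
            // Tally exact hits and near misses for this window.
            for (int slot = 0; slot < end - start; slot++) {
                if (!baseHit[slot]) {
                    continue;
                }
                int missing = full & ~dimMask[slot];
                if (missing == 0) {
                    exactHits++;
                } else if (Integer.bitCount(missing) == 1) {
                    nearMissByDim[Integer.numberOfTrailingZeros(missing)]++;
                }
            }
        }
    }
}
```

The two CHUNK-sized arrays are created before any document is visited, so a query with only a handful of hits pays the same setup cost as one with millions.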

For comparison ... this is the same set of queries, but doing
only drill-down.  base and comp are the same here (so the diffs are
noise), so you have to compare the absolute QPS against the table
above to get the drill-sideways penalty (~2 - 2.4X slower):

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 MedTerm       32.83      (1.0%)       32.94      (0.6%)    0.3% (  -1% -    2%)
                HighTerm       22.26      (0.8%)       22.37      (0.5%)    0.5% (   0% -    1%)
                 LowTerm       58.47      (1.4%)       58.91      (0.9%)    0.8% (  -1% -    3%)
{noformat}
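
The ~2 - 2.4X figure can be reproduced by dividing the drill-down-only QPS by the drill-sideways QPS for the same task. A small sketch of that arithmetic, taking "QPS base" from the drill-down-only table and "QPS comp" (new patch) from the first table; the class and method names are hypothetical:

```java
/** Hypothetical helper for comparing absolute QPS across the two tables. */
final class DrillSidewaysPenaltySketch {
    /** How many times slower drill-sideways is than plain drill-down. */
    static double slowdown(double drillDownQps, double drillSidewaysQps) {
        return drillDownQps / drillSidewaysQps;
    }
}
```

For LowTerm this is slowdown(58.47, 26.27), about 2.23; for MedTerm slowdown(32.83, 16.29), about 2.02; for HighTerm slowdown(22.26, 9.43), about 2.36.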

                
> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>
>                 Key: LUCENE-4748
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4748
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch, 
> LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, 
> LUCENE-4748.patch, LUCENE-4748.patch
>
>
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR": http://markmail.org/thread/jmnq6z2x7ayzci5k
> The basic idea is to count "near misses" during collection, ie
> documents that matched the main query and also all except one of the
> drill down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in.  Eg maybe you do a
> search for "cameras", and you see facets for the manufacturer, so you
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, eg maybe I next drill
> down into "Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all other manufacturers offer, and what other resolutions
> Nikon cameras offer.

