[
https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-4748:
---------------------------------------
Attachment: LUCENE-4748.patch
New patch, fixing various bugs, beefing up the tests and resolving all
nocommits. I think it's ready!
I also fixed a consistency issue with the facets API: if you request
faceting for a non-existent category, it now returns an empty
FacetResult instead of skipping it.
I tested on a wider variety of drill down / sideways queries. base =
old patch and comp = this patch:
{noformat}
Task                QPS base   StdDev  QPS comp   StdDev  Pct diff
LowTermHardDD2         24.43   (2.0%)     24.43   (2.2%)      0.0% (  -4% -    4%)
HighTermEasyDD2        18.91   (1.6%)     20.59   (4.3%)      8.9% (   2% -   15%)
LowTermHardDD1         31.38   (2.0%)     36.21   (1.7%)     15.4% (  11% -   19%)
LowTermMixedDD2        44.09   (2.1%)     53.93   (0.9%)     22.3% (  18% -   25%)
LowTermHardOrDD1       25.85   (2.3%)     33.80   (2.0%)     30.7% (  25% -   35%)
MedTermHardDD2          5.78   (1.4%)      7.71   (5.3%)     33.4% (  26% -   40%)
LowTermEasyDD2        129.51   (1.7%)    176.27   (3.9%)     36.1% (  30% -   42%)
MedTermEasyDD2         42.88   (1.8%)     60.03   (3.5%)     40.0% (  34% -   46%)
MedTermMixedDD2        12.52   (1.4%)     17.59   (4.2%)     40.5% (  34% -   46%)
LowTermHardOrDD2       18.57   (2.8%)     26.45   (1.3%)     42.4% (  37% -   47%)
LowTermEasyDD1         71.73   (1.8%)    102.77   (1.8%)     43.3% (  38% -   47%)
LowTermEasyOrDD2       61.01   (2.7%)     98.57   (6.7%)     61.6% (  50% -   73%)
HighTermHardDD2         1.22   (1.8%)      1.97   (6.8%)     61.7% (  52% -   71%)
MedTermHardDD1          8.77   (2.6%)     14.47   (5.1%)     65.1% (  55% -   74%)
HighTermMixedDD2        2.69   (1.6%)      4.50   (6.8%)     67.4% (  58% -   76%)
MedTermEasyDD1         18.61   (2.6%)     32.34   (6.1%)     73.8% (  63% -   84%)
LowTermEasyOrDD1       51.31   (2.2%)     91.48   (2.1%)     78.3% (  72% -   84%)
HighTermEasyOrDD2       8.96   (3.1%)     16.17   (5.4%)     80.5% (  69% -   91%)
HighTermEasyOrDD1       3.47   (4.1%)      6.40   (7.5%)     84.8% (  70% -  100%)
MedTermHardOrDD2        4.31   (3.3%)      8.03   (6.4%)     86.6% (  74% -   99%)
HighTermEasyDD1         3.16   (3.0%)      5.89   (7.7%)     86.6% (  73% -  100%)
MedTermEasyOrDD1       15.63   (3.4%)     30.05   (6.5%)     92.2% (  79% -  105%)
HighTermHardDD1         1.61   (3.1%)      3.13   (7.6%)     94.3% (  81% -  108%)
MedTermHardOrDD1        6.75   (3.5%)     13.76   (6.0%)    103.9% (  91% -  117%)
HighTermHardOrDD2       1.14   (4.2%)      2.41   (9.2%)    111.6% (  94% -  130%)
MedTermEasyOrDD2       19.92   (3.0%)     45.44   (6.3%)    128.1% ( 115% -  141%)
HighTermHardOrDD1       0.96   (3.5%)      2.54  (10.4%)    163.6% ( 144% -  183%)
{noformat}
DD2 means drill down on two dims; DD1 means drill down on one dim. Hard
means the one or two dims have high counts, Easy means they have low
counts, and Mixed means one high and one low. OrDDX means I OR two
values per dim.
The new patch is especially fast for the OR case (i.e., when you drill
down on more than one value in a single dim), I think because it
handles that case directly instead of recursing into another
BooleanQuery (BQ).
> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>
> Key: LUCENE-4748
> URL: https://issues.apache.org/jira/browse/LUCENE-4748
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch,
> LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch,
> LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch
>
>
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR": http://markmail.org/thread/jmnq6z2x7ayzci5k
> The basic idea is to count "near misses" during collection, ie
> documents that matched the main query and also all except one of the
> drill down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in. Eg maybe you do a
> search for "cameras", and you see facets for the manufacturer, so you
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, eg maybe I next drill
> down into "Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all other manufacturers offer, and what other resolutions
> Nikon cameras offer.
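
For readers new to the idea, here is a rough, Lucene-free sketch of the
"near miss" counting described in the issue. It is not the code in the
patch: the class DrillSidewaysSketch, its Result type and the count()
method are hypothetical names made up for illustration, documents are
plain dim -> value maps that are assumed to already match the main query,
and it assumes Java 9+ for List.of/Map.of. A doc that passes every
drill-down is a hit and is counted under every dim; a doc that fails
exactly one drill-down is counted only under the dim it failed. Several
accepted values in one dim are treated as an OR.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of near-miss counting; not the patch's implementation.
class DrillSidewaysSketch {

    /** Number of full hits plus, per drill-down dim, value -> sideways count. */
    static final class Result {
        int hits;
        final Map<String, Map<String, Integer>> sideways = new TreeMap<>();
    }

    /**
     * docs: each doc maps dim -> value and is assumed to match the main query.
     * drillDowns: dim -> accepted values (multiple values in one dim are ORed).
     */
    static Result count(List<Map<String, String>> docs,
                        Map<String, Set<String>> drillDowns) {
        Result result = new Result();
        for (String dim : drillDowns.keySet()) {
            result.sideways.put(dim, new TreeMap<>());
        }
        for (Map<String, String> doc : docs) {
            int failures = 0;
            String failedDim = null;
            for (Map.Entry<String, Set<String>> dd : drillDowns.entrySet()) {
                String value = doc.get(dd.getKey());
                if (value == null || !dd.getValue().contains(value)) {
                    failures++;
                    failedDim = dd.getKey();
                }
            }
            if (failures == 0) {
                // Full hit: contributes to the sideways counts of every dim.
                result.hits++;
                for (String dim : drillDowns.keySet()) {
                    bump(result.sideways.get(dim), doc.get(dim));
                }
            } else if (failures == 1) {
                // Near miss: contributes only to the one dim it failed.
                String value = doc.get(failedDim);
                if (value != null) {
                    bump(result.sideways.get(failedDim), value);
                }
            }
            // Two or more failed drill-downs: the doc is ignored.
        }
        return result;
    }

    private static void bump(Map<String, Integer> counts, String value) {
        counts.merge(value, 1, Integer::sum);
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("Manufacturer", "Nikon", "Resolution", "10"),
            Map.of("Manufacturer", "Nikon", "Resolution", "12"),
            Map.of("Manufacturer", "Canon", "Resolution", "10"),
            Map.of("Manufacturer", "Canon", "Resolution", "12"));
        Result r = count(docs, Map.of(
            "Manufacturer", Set.of("Nikon"),
            "Resolution", Set.of("10")));
        System.out.println(r.hits + " hit(s), sideways counts: " + r.sideways);
    }
}
```

In the small example in main(), only the Nikon 10 megapixel doc is a hit,
but the Canon 10 megapixel doc still shows up in the Manufacturer counts
and the Nikon 12 megapixel doc still shows up in the Resolution counts,
which is what keeps the other facet values visible after drilling down.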