[ https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4748:
---------------------------------------
Attachment: LUCENE-4748.patch
I made a new patch, with a custom scorer to find the exact & near-miss
hits and tally accordingly. I also added a random test which seems to
be passing ... I think the new scorer is working (but there are still
tons of nocommits).
It improves performance vs the last patch (base = last patch, comp =
new patch), on full wikibig (6.6M docs), 7 dims. Each TermQuery does
a drill down on Date/2012 and imageCount/1:
{noformat}
Task        QPS base  StdDev   QPS comp  StdDev   Pct diff
LowTerm        33.53  (1.2%)      26.27  (4.2%)   -21.7% ( -26% - -16%)
MedTerm        14.20  (0.9%)      16.29  (4.8%)    14.7% (   8% -  20%)
HighTerm        6.47  (1.2%)       9.43  (4.8%)    45.7% (  39% -  52%)
{noformat}
I think LowTerm got slower because the new scorer has a fairly high
init cost: like BS1 (BooleanScorer), it allocates CHUNK-sized arrays
up front.
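To make the init cost concrete, here is a minimal sketch of a windowed
tally in the BS1 style. It is not the code in the patch: the class name,
the CHUNK value, and the use of plain sorted int[] doc ID lists in place
of real Lucene scorers are all assumptions made for illustration. The
point is that the three CHUNK-sized arrays are allocated before a single
doc is visited, so a query with very few hits still pays for them.

```java
import java.util.Arrays;

public class DrillSidewaysSketch {
    // Hypothetical window size; the value used by the patch is not stated.
    static final int CHUNK = 2048;

    /** Exact hits plus near-miss counts, indexed by the dimension that was missed. */
    static final class Tally {
        int hits;
        final int[] nearMissByDim;
        Tally(int numDims) { nearMissByDim = new int[numDims]; }
    }

    /**
     * baseDocs: sorted doc IDs matching the main query.
     * dimDocs[d]: sorted doc IDs matching drill-down dimension d.
     * All doc IDs are assumed to be in [0, maxDoc).
     */
    static Tally tally(int maxDoc, int[] baseDocs, int[][] dimDocs) {
        int numDims = dimDocs.length;
        int dimIndexSum = numDims * (numDims - 1) / 2; // 0 + 1 + ... + (numDims - 1)
        Tally result = new Tally(numDims);

        // Up-front allocation: this cost is paid no matter how few docs match.
        boolean[] matchesBase = new boolean[CHUNK];
        int[] matchedCount = new int[CHUNK];
        int[] matchedDimSum = new int[CHUNK];

        int basePos = 0;
        int[] dimPos = new int[numDims];

        for (int start = 0; start < maxDoc; start += CHUNK) {
            int end = Math.min(maxDoc, start + CHUNK);
            Arrays.fill(matchesBase, false);
            Arrays.fill(matchedCount, 0);
            Arrays.fill(matchedDimSum, 0);

            // Mark docs in this window that match the main query.
            while (basePos < baseDocs.length && baseDocs[basePos] < end) {
                matchesBase[baseDocs[basePos++] - start] = true;
            }
            // Record which drill-down dimensions each doc in the window matches.
            for (int d = 0; d < numDims; d++) {
                int[] docs = dimDocs[d];
                while (dimPos[d] < docs.length && docs[dimPos[d]] < end) {
                    int slot = docs[dimPos[d]++] - start;
                    matchedCount[slot]++;
                    matchedDimSum[slot] += d;
                }
            }
            // Exact hit: all dims matched. Near miss: all dims except one matched.
            for (int slot = 0; slot < end - start; slot++) {
                if (!matchesBase[slot]) {
                    continue;
                }
                if (matchedCount[slot] == numDims) {
                    result.hits++;
                } else if (matchedCount[slot] == numDims - 1) {
                    // The missed dimension is the one index absent from the sum.
                    result.nearMissByDim[dimIndexSum - matchedDimSum[slot]]++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Tally t = tally(10, new int[] {0, 1, 2, 3, 5},
                        new int[][] {{0, 1, 5}, {0, 2, 5, 7}});
        System.out.println("hits=" + t.hits
            + " nearMissByDim=" + Arrays.toString(t.nearMissByDim));
    }
}
```

In this sketch a doc that fails two or more drill-down dimensions is
dropped, and a doc that fails exactly one is counted against the
dimension it failed.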
For comparison, here is the same set of queries doing only
drill-down. base and comp are the same here (so the diffs are
noise), so you have to compare the absolute QPS against the table
above to get the drill-sideways penalty (~2 - 2.4X slower):
{noformat}
Task        QPS base  StdDev   QPS comp  StdDev   Pct diff
MedTerm        32.83  (1.0%)      32.94  (0.6%)     0.3% (  -1% -   2%)
HighTerm       22.26  (0.8%)      22.37  (0.5%)     0.5% (   0% -   1%)
LowTerm        58.47  (1.4%)      58.91  (0.9%)     0.8% (  -1% -   3%)
{noformat}
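The penalty can be worked out from the two tables by dividing the
drill-down-only QPS by the drill-sideways QPS for each task. The
snippet below is only a sketch of that arithmetic; it assumes the
"QPS base" column of the drill-down-only table and the "QPS comp"
column (new patch) of the drill-sideways table are the right pair to
compare, and the class and method names are made up for illustration.

```java
public class DrillSidewaysPenalty {
    /** How many times slower drill-sideways is than plain drill-down. */
    static double slowdown(double drillDownQps, double drillSidewaysQps) {
        return drillDownQps / drillSidewaysQps;
    }

    public static void main(String[] args) {
        String[] tasks = {"LowTerm", "MedTerm", "HighTerm"};
        // "QPS base" from the drill-down-only table.
        double[] drillDownQps = {58.47, 32.83, 22.26};
        // "QPS comp" (new patch) from the drill-sideways table.
        double[] drillSidewaysQps = {26.27, 16.29, 9.43};

        for (int i = 0; i < tasks.length; i++) {
            System.out.printf("%-8s %.2fX slower%n",
                tasks[i], slowdown(drillDownQps[i], drillSidewaysQps[i]));
        }
    }
}
```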
> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>
> Key: LUCENE-4748
> URL: https://issues.apache.org/jira/browse/LUCENE-4748
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch,
> LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch,
> LUCENE-4748.patch, LUCENE-4748.patch
>
>
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR": http://markmail.org/thread/jmnq6z2x7ayzci5k
> The basic idea is to count "near misses" during collection, ie
> documents that matched the main query and also all except one of the
> drill down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in. Eg maybe you do a
> search for "cameras", and you see facets for the manufacturer, so you
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, eg maybe I next drill
> down into "Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all other manufacturers have, and what other resolutions
> Nikon cameras offer.