[ https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4748:
---------------------------------------

    Attachment: LUCENE-4748.patch

New patch, fixing various bugs, beefing up the tests and resolving all nocommits. I think it's ready!

I also fixed a consistency issue in the facets API: if you request faceting for a non-existent category, it now returns an empty FacetResult instead of skipping it.

I tested on a wider variety of drill-down / drill-sideways queries. base = old patch and comp = this patch:

{noformat}
Task                QPS base  StdDev   QPS comp  StdDev  Pct diff
LowTermHardDD2         24.43  (2.0%)      24.43  (2.2%)      0.0% ( -4% - 4%)
HighTermEasyDD2        18.91  (1.6%)      20.59  (4.3%)      8.9% ( 2% - 15%)
LowTermHardDD1         31.38  (2.0%)      36.21  (1.7%)     15.4% ( 11% - 19%)
LowTermMixedDD2        44.09  (2.1%)      53.93  (0.9%)     22.3% ( 18% - 25%)
LowTermHardOrDD1       25.85  (2.3%)      33.80  (2.0%)     30.7% ( 25% - 35%)
MedTermHardDD2          5.78  (1.4%)       7.71  (5.3%)     33.4% ( 26% - 40%)
LowTermEasyDD2        129.51  (1.7%)     176.27  (3.9%)     36.1% ( 30% - 42%)
MedTermEasyDD2         42.88  (1.8%)      60.03  (3.5%)     40.0% ( 34% - 46%)
MedTermMixedDD2        12.52  (1.4%)      17.59  (4.2%)     40.5% ( 34% - 46%)
LowTermHardOrDD2       18.57  (2.8%)      26.45  (1.3%)     42.4% ( 37% - 47%)
LowTermEasyDD1         71.73  (1.8%)     102.77  (1.8%)     43.3% ( 38% - 47%)
LowTermEasyOrDD2       61.01  (2.7%)      98.57  (6.7%)     61.6% ( 50% - 73%)
HighTermHardDD2         1.22  (1.8%)       1.97  (6.8%)     61.7% ( 52% - 71%)
MedTermHardDD1          8.77  (2.6%)      14.47  (5.1%)     65.1% ( 55% - 74%)
HighTermMixedDD2        2.69  (1.6%)       4.50  (6.8%)     67.4% ( 58% - 76%)
MedTermEasyDD1         18.61  (2.6%)      32.34  (6.1%)     73.8% ( 63% - 84%)
LowTermEasyOrDD1       51.31  (2.2%)      91.48  (2.1%)     78.3% ( 72% - 84%)
HighTermEasyOrDD2       8.96  (3.1%)      16.17  (5.4%)     80.5% ( 69% - 91%)
HighTermEasyOrDD1       3.47  (4.1%)       6.40  (7.5%)     84.8% ( 70% - 100%)
MedTermHardOrDD2        4.31  (3.3%)       8.03  (6.4%)     86.6% ( 74% - 99%)
HighTermEasyDD1         3.16  (3.0%)       5.89  (7.7%)     86.6% ( 73% - 100%)
MedTermEasyOrDD1       15.63  (3.4%)      30.05  (6.5%)     92.2% ( 79% - 105%)
HighTermHardDD1         1.61  (3.1%)       3.13  (7.6%)     94.3% ( 81% - 108%)
MedTermHardOrDD1        6.75  (3.5%)      13.76  (6.0%)    103.9% ( 91% - 117%)
HighTermHardOrDD2       1.14  (4.2%)       2.41  (9.2%)    111.6% ( 94% - 130%)
MedTermEasyOrDD2       19.92  (3.0%)      45.44  (6.3%)    128.1% ( 115% - 141%)
HighTermHardOrDD1       0.96  (3.5%)       2.54  (10.4%)   163.6% ( 144% - 183%)
{noformat}

DD2 means drilling down on 2 dims; DD1 means drilling down on 1 dim. Hard means the drilled-down dim(s) have high counts, Easy means they have low counts, and Mixed means one dim has a high count and the other a low count. OrDDX means I OR two values within each dim.

The new patch gains the most in the OR case (i.e., when you drill down on more than one value in a single dim), I think because it now handles that case directly instead of recursing into another BooleanQuery (BQ).

> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>
>                 Key: LUCENE-4748
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4748
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch,
> LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch,
> LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch
>
>
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR": http://markmail.org/thread/jmnq6z2x7ayzci5k
> The basic idea is to count "near misses" during collection, i.e.,
> documents that matched the main query and also all except one of the
> drill-down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in. E.g., maybe you do a
> search for "cameras", see facets for the manufacturer, and then
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, e.g., maybe I next drill
> down into "Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all the other manufacturers have, and what other resolutions
> Nikon cameras offer.
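To pin down exactly which documents count toward which dimension under the "near miss" idea in the description, here is a brute-force sketch in plain Java (16+, for records). It is not Lucene's DrillSideways API and says nothing about how the patch collects hits at the scorer level; the Doc record, sidewaysCounts method and sample camera data are hypothetical. Each dim maps to a set of accepted values, so a dim with more than one value models the OR drill-down case from the benchmarks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical brute-force model of drill-sideways counting semantics.
// NOT the Lucene DrillSideways API; it only shows which docs feed which dim.
class DrillSidewaysSketch {

  // Hypothetical doc: one facet value per dimension name.
  record Doc(Map<String, String> facets) {}

  // For each drill-down dim, counts that dim's values over docs that matched
  // the main query and every OTHER drill-down filter. Values listed for one
  // dim are OR'd together.
  static Map<String, Map<String, Integer>> sidewaysCounts(
      List<Doc> mainQueryHits, Map<String, Set<String>> drillDowns) {
    Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
    for (String dim : drillDowns.keySet()) {
      counts.put(dim, new TreeMap<>()); // sorted by value for stable output
    }
    for (Doc doc : mainQueryHits) {
      // Record which drill-down dims this doc fails.
      List<String> failed = new ArrayList<>();
      for (Map.Entry<String, Set<String>> dd : drillDowns.entrySet()) {
        String value = doc.facets().get(dd.getKey());
        if (value == null || !dd.getValue().contains(value)) {
          failed.add(dd.getKey());
        }
      }
      if (failed.isEmpty()) {
        // Full hit: contributes to the counts of every dim.
        for (String dim : drillDowns.keySet()) {
          increment(counts.get(dim), doc.facets().get(dim));
        }
      } else if (failed.size() == 1) {
        // Near miss: contributes only to the single dim it failed.
        String dim = failed.get(0);
        increment(counts.get(dim), doc.facets().get(dim));
      }
      // Docs failing two or more dims contribute nothing.
    }
    return counts;
  }

  private static void increment(Map<String, Integer> m, String value) {
    if (value != null) {
      m.merge(value, 1, Integer::sum);
    }
  }

  // Hypothetical sample: pretend these all matched the query "cameras".
  static List<Doc> sampleCameras() {
    return List.of(
        new Doc(Map.of("Manufacturer", "Nikon", "Resolution", "10MP")),
        new Doc(Map.of("Manufacturer", "Nikon", "Resolution", "12MP")),
        new Doc(Map.of("Manufacturer", "Canon", "Resolution", "10MP")),
        new Doc(Map.of("Manufacturer", "Canon", "Resolution", "12MP")),
        new Doc(Map.of("Manufacturer", "Sony", "Resolution", "10MP")));
  }

  public static void main(String[] args) {
    Map<String, Set<String>> drillDowns = new LinkedHashMap<>();
    drillDowns.put("Manufacturer", Set.of("Nikon"));
    drillDowns.put("Resolution", Set.of("10MP"));
    System.out.println(sidewaysCounts(sampleCameras(), drillDowns));
    // prints {Manufacturer={Canon=1, Nikon=1, Sony=1}, Resolution={10MP=1, 12MP=1}}
  }
}
```

With Manufacturer=Nikon and Resolution=10MP drilled down, the Manufacturer counts are the 10MP cameras per brand and the Resolution counts are the Nikon cameras per resolution, mirroring the camera example above; the Canon/12MP doc fails both filters and is not counted anywhere. Passing Set.of("Nikon", "Canon") for Manufacturer instead exercises the OR-within-a-dim case.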