[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558819#comment-13558819 ]
Michael McCandless edited comment on LUCENE-4600 at 1/21/13 3:06 PM:
---------------------------------------------------------------------
base = ALL_PARENTS, comp = NO_PARENTS:
{noformat}
Task              QPS base  StdDev  QPS comp  StdDev  Pct diff
MedSpanNear         125.77  (2.0%)     79.31  (0.8%)  -36.9% ( -38% - -34%)
LowSpanNear         124.86  (2.7%)     79.23  (0.5%)  -36.5% ( -38% - -34%)
HighSpanNear        124.23  (2.3%)     79.44  (0.8%)  -36.1% ( -38% - -33%)
AndHighLow          107.24  (1.4%)     72.70  (0.7%)  -32.2% ( -33% - -30%)
MedPhrase            55.98  (0.6%)     44.89  (1.4%)  -19.8% ( -21% - -17%)
AndHighMed           52.06  (0.7%)     43.20  (0.0%)  -17.0% ( -17% - -16%)
Fuzzy2               35.71  (0.6%)     30.42  (1.6%)  -14.8% ( -16% - -12%)
LowPhrase            17.27  (0.3%)     15.21  (3.2%)  -11.9% ( -15% - -8%)
HighPhrase           15.20  (6.2%)     13.50  (4.7%)  -11.2% ( -20% - 0%)
LowTerm              41.68  (0.4%)     37.49  (0.4%)  -10.1% ( -10% - -9%)
LowSloppyPhrase      17.31  (2.9%)     15.75  (0.9%)   -9.0% ( -12% - -5%)
Fuzzy1               28.11  (0.3%)     25.63  (0.0%)   -8.8% ( -9% - -8%)
MedSloppyPhrase      18.42  (1.5%)     17.25  (0.1%)   -6.3% ( -7% - -4%)
Respell              56.32  (0.3%)     54.41  (2.2%)   -3.4% ( -5% - 0%)
HighSloppyPhrase      0.83  (6.8%)      0.81  (1.0%)   -2.3% ( -9% - 5%)
Wildcard             63.43  (1.9%)     61.96  (0.3%)   -2.3% ( -4% - 0%)
Prefix3              45.60  (0.5%)     45.70  (0.7%)    0.2% ( -1% - 1%)
IntNRQ               17.54  (0.6%)     17.60  (1.4%)    0.3% ( -1% - 2%)
PKLookup            205.89  (0.5%)    210.73  (0.7%)    2.4% ( 1% - 3%)
AndHighHigh          11.89  (0.2%)     12.48  (0.3%)    5.0% ( 4% - 5%)
HighTerm              7.00  (0.2%)      8.09  (0.1%)   15.6% ( 15% - 16%)
OrHighHigh            3.77  (0.6%)      4.36  (0.3%)   15.6% ( 14% - 16%)
OrHighLow             6.65  (0.1%)      7.69  (1.5%)   15.6% ( 14% - 17%)
OrHighMed             6.61  (0.4%)      7.66  (0.2%)   15.8% ( 15% - 16%)
MedTerm              18.86  (0.4%)     22.13  (0.4%)   17.3% ( 16% - 18%)
{noformat}
I think this is because the test has 2.5M ords, so the cost of "rolling up"
the counts at the end is non-trivial ...
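
To make the "rolling up" cost concrete, here is a minimal sketch of what such a step might look like. It is not the facet module's code: the parents[] array, the class and method names, and the tiny taxonomy are all hypothetical, and the sketch assumes every parent ordinal is smaller than its children's ordinals so a single backwards pass suffices. The point is only that the loop visits every ordinal once per query, so its cost grows with the taxonomy size (2.5M ords here) rather than with the number of hits.

```java
import java.util.Arrays;

// Illustrative sketch only; names and data layout are assumptions.
final class RollupSketch {

    /**
     * Adds each ordinal's count into its parent's count.
     * Assumes parents[ord] < ord for every non-root ordinal, so walking from
     * the highest ordinal down sees every child before its parent.
     */
    static int[] rollUp(int[] leafCounts, int[] parents) {
        int[] counts = Arrays.copyOf(leafCounts, leafCounts.length);
        for (int ord = counts.length - 1; ord > 0; ord--) {
            int parent = parents[ord];
            if (parent >= 0) {
                counts[parent] += counts[ord];
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy: 0=root, 1=Author, 2=Author/Bob, 3=Author/Lisa
        int[] parents = {-1, 0, 1, 1};
        // Only the leaves were counted during collection (NO_PARENTS style).
        int[] leafCounts = {0, 0, 3, 2};
        // prints [5, 5, 3, 2]
        System.out.println(Arrays.toString(rollUp(leafCounts, parents)));
    }
}
```

With ALL_PARENTS the parent counts would already be present after collection and no such pass would be needed, which is one plausible reading of why some tasks above get slower with NO_PARENTS.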
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
> Key: LUCENE-4600
> URL: https://issues.apache.org/jira/browse/LUCENE-4600
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Assignee: Shai Erera
> Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
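
The difference between the two approaches in the description can be sketched without any Lucene classes. Everything below is a hypothetical stand-in: DOC_ORDS plays the role of per-document facet ordinals, the hits array plays the role of the collector's callbacks, and the method names are invented for illustration. Scores (the float[] case) are left out.

```java
import java.util.Arrays;
import java.util.BitSet;

// Illustrative sketch only; not the facet module's API.
final class FacetCollectSketch {

    // Hypothetical per-document facet ordinals (doc id -> ordinals).
    static final int[][] DOC_ORDS = { {1, 3}, {2}, {1}, {2, 3} };
    static final int NUM_ORDS = 4;

    /** Two passes: remember the hits in a bitset, then aggregate afterwards. */
    static int[] twoPass(int[] hits) {
        BitSet matched = new BitSet(DOC_ORDS.length);
        for (int doc : hits) {
            matched.set(doc);                       // pass 1: collection
        }
        int[] counts = new int[NUM_ORDS];
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : DOC_ORDS[doc]) {
                counts[ord]++;                      // pass 2: aggregation
            }
        }
        return counts;
    }

    /** One pass: aggregate as each hit is collected; no bitset is kept. */
    static int[] aggregateWhileCollecting(int[] hits) {
        int[] counts = new int[NUM_ORDS];
        for (int doc : hits) {
            for (int ord : DOC_ORDS[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] hits = {0, 1, 3};                     // each doc is collected once
        System.out.println(Arrays.toString(twoPass(hits)));
        System.out.println(Arrays.toString(aggregateWhileCollecting(hits)));
    }
}
```

Both methods produce the same counts when each document is collected once; the single-pass version simply avoids holding the per-query bitset (and, in the scoring case, the float[]) until the end.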