[ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4764:
---------------------------------------
Attachment: LUCENE-4764.patch
Initial dirty patch (lots of nocommits still):
I added a FacetDocValuesFormat, which goes back to the 4.0 format
(more RAM-consuming, but faster for facets), and also hacked
the FastCountingFacetsAggregator to decode directly from the full
byte[], saving the overhead of a method call and of filling a BytesRef.
It gets faster results than the default (Lucene42) DVFormat.
Results on wikibig (all 6.6M docs), 7 facet dims:
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
LowTerm             110.44   (2.0%)    104.86   (1.0%)   -5.1% ( -7% -  -2%)
Fuzzy1               46.50   (2.6%)     44.83   (1.3%)   -3.6% ( -7% -   0%)
MedSpanNear          28.61   (2.9%)     27.91   (1.8%)   -2.5% ( -6% -   2%)
Respell              45.56   (4.0%)     44.71   (3.1%)   -1.9% ( -8% -   5%)
Fuzzy2               52.44   (3.6%)     51.69   (2.2%)   -1.4% ( -6% -   4%)
LowPhrase            21.30   (6.3%)     21.01   (6.0%)   -1.4% (-12% -  11%)
LowSpanNear           8.37   (2.4%)      8.26   (3.3%)   -1.3% ( -6% -   4%)
MedSloppyPhrase      25.88   (2.4%)     25.73   (2.3%)   -0.6% ( -5% -   4%)
AndHighMed          105.02   (1.4%)    105.78   (1.0%)    0.7% ( -1% -   3%)
LowSloppyPhrase      20.32   (3.2%)     20.55   (3.5%)    1.1% ( -5% -   8%)
HighSpanNear          3.51   (2.4%)      3.56   (1.7%)    1.2% ( -2% -   5%)
HighPhrase           17.32  (10.1%)     17.56  (10.2%)    1.4% (-17% -  24%)
AndHighLow          575.37   (3.9%)    583.69   (3.7%)    1.4% ( -5% -   9%)
HighSloppyPhrase      0.92   (6.2%)      0.95   (6.8%)    2.4% ( -9% -  16%)
AndHighHigh          23.25   (1.4%)     24.54   (0.9%)    5.5% (  3% -   7%)
MedPhrase           110.00   (5.3%)    117.78   (6.1%)    7.1% ( -4% -  19%)
Wildcard             27.31   (2.1%)     32.28   (1.6%)   18.2% ( 14% -  22%)
MedTerm              46.99   (2.7%)     57.33   (1.8%)   22.0% ( 17% -  27%)
OrHighMed            16.38   (3.6%)     21.44   (3.2%)   30.9% ( 23% -  39%)
OrHighHigh            8.63   (3.7%)     11.33   (3.6%)   31.3% ( 23% -  39%)
OrHighLow            16.88   (3.5%)     22.21   (3.3%)   31.6% ( 23% -  39%)
Prefix3              12.91   (2.9%)     17.29   (2.0%)   33.9% ( 28% -  39%)
HighTerm             18.99   (2.8%)     25.99   (2.5%)   36.9% ( 30% -  43%)
IntNRQ                3.54   (3.2%)      4.96   (2.2%)   40.0% ( 33% -  46%)
{noformat}
But it's also more Disk/RAM-consuming: trunk facet DVs take 61.2 MB
while the patch takes 80.3 MB (31% more).
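
To make the direct-decode idea above concrete, here is a minimal Java sketch. It is not the code from the patch. It assumes each document's facet ordinals are stored as delta-coded VInts (most significant 7-bit group first, high bit set on every byte except the last) in one shared byte[], with per-document start offsets in an int[]. The class and method names (DirectFacetDecodeSketch, writeVInt, encode, countDoc) are hypothetical.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of counting facet ordinals straight from a shared byte[]. */
public class DirectFacetDecodeSketch {

  /** Writes a non-negative int as a VInt: 7 bits per byte, high bit set on all but the last byte. */
  public static void writeVInt(ByteArrayOutputStream out, int value) {
    // Emit the higher 7-bit groups first, flagged as continuation bytes.
    for (int shift = 28; shift > 0; shift -= 7) {
      if ((value >>> shift) != 0) {
        out.write(0x80 | ((value >>> shift) & 0x7F));
      }
    }
    out.write(value & 0x7F); // final byte has the high bit clear
  }

  /**
   * Encodes each document's ascending ordinals as deltas. addresses must have
   * ordsPerDoc.length + 1 slots; addresses[doc] is the start offset of that doc.
   */
  public static byte[] encode(int[][] ordsPerDoc, int[] addresses) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int doc = 0; doc < ordsPerDoc.length; doc++) {
      addresses[doc] = out.size();
      int prev = 0;
      for (int ord : ordsPerDoc[doc]) {
        writeVInt(out, ord - prev);
        prev = ord;
      }
    }
    addresses[ordsPerDoc.length] = out.size();
    return out.toByteArray();
  }

  /** Decodes one document in place and increments counts[ord], with no intermediate buffer object. */
  public static void countDoc(byte[] bytes, int[] addresses, int docID, int[] counts) {
    int offset = addresses[docID];
    int end = addresses[docID + 1];
    int ord = 0;
    int prev = 0;
    while (offset < end) {
      byte b = bytes[offset++];
      if (b >= 0) {
        // Last byte of this VInt: finish the delta and add the previous ordinal.
        ord = ((ord << 7) | b) + prev;
        counts[ord]++;
        prev = ord;
        ord = 0;
      } else {
        // Continuation byte: accumulate 7 more bits.
        ord = (ord << 7) | (b & 0x7F);
      }
    }
  }

  public static void main(String[] args) {
    int[][] ordsPerDoc = { {1, 5, 300}, {5}, {}, {2, 300} };
    int[] addresses = new int[ordsPerDoc.length + 1];
    byte[] bytes = encode(ordsPerDoc, addresses);
    int[] counts = new int[301];
    for (int doc = 0; doc < ordsPerDoc.length; doc++) {
      countDoc(bytes, addresses, doc, counts);
    }
    System.out.println("ord 5: " + counts[5] + ", ord 300: " + counts[300]);
  }
}
```

The inner loop touches only the byte[], the offsets, and the counts array, which is where the saving over a per-document method call plus BytesRef fill would come from.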
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
> Key: LUCENE-4764
> URL: https://issues.apache.org/jira/browse/LUCENE-4764
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
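
The RAM-versus-decode-speed tradeoff in the description can be illustrated with a rough Java sketch. It packs per-document addresses into a long[] at a fixed bit width and reads them back with shifts and masks, where a plain int[] would need only one array load. This fixed-width packing is a stand-in chosen for illustration, not the actual Lucene42 encoding, and PackedAddressSketch is a hypothetical name.

```java
/** Hypothetical sketch: fixed-bit-width packing of non-negative per-document addresses. */
public class PackedAddressSketch {
  private final long[] blocks;
  private final int bitsPerValue;
  private final long mask;

  public PackedAddressSketch(int[] addresses) {
    int max = 0;
    for (int a : addresses) {
      max = Math.max(max, a);
    }
    // Smallest width that can hold the largest address (at least 1 bit).
    bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
    mask = (1L << bitsPerValue) - 1;
    long totalBits = (long) addresses.length * bitsPerValue;
    blocks = new long[(int) ((totalBits + 63) / 64)];
    for (int i = 0; i < addresses.length; i++) {
      long bitPos = (long) i * bitsPerValue;
      int block = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      blocks[block] |= ((long) addresses[i]) << shift;
      if (shift + bitsPerValue > 64) {
        // The value straddles two longs: store its high bits in the next block.
        blocks[block + 1] |= ((long) addresses[i]) >>> (64 - shift);
      }
    }
  }

  /** Reads one address; this costs extra shifts and masks compared with an int[] lookup. */
  public int get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long value = blocks[block] >>> shift;
    if (shift + bitsPerValue > 64) {
      value |= blocks[block + 1] << (64 - shift);
    }
    return (int) (value & mask);
  }

  /** Bytes used by the packed blocks (ignoring object headers). */
  public long ramBytes() {
    return blocks.length * 8L;
  }

  public static void main(String[] args) {
    int[] addresses = {0, 3, 3, 10, 1000, 70000};
    PackedAddressSketch packed = new PackedAddressSketch(addresses);
    System.out.println("packed bytes: " + packed.ramBytes()
        + ", int[] bytes: " + addresses.length * 4L);
  }
}
```

Because facet counting looks up an address for every collected docID, the extra work in get() is paid once per hit, which is the cost the description refers to.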