[ https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4602:
---------------------------------------
Attachment: LUCENE-4602.patch
Patch with another prototype DV-backed collector. This one only works
for single-valued fields (in "dimension" terminology, I think this
means the document can have multiple dimensions as long as each
dimension has only one category path under it).

It stores only a single ord per (doc, field) pair in a
PackedLongDocValuesField. The collector then aggregates per-segment
during collection, counting only the leaf ords, and at the end it
walks up the hierarchy, summing counts into the parent ords.
It's the fastest impl so far:
{noformat}
    Task    QPS base   StdDev    QPS comp   StdDev                Pct diff
HighTerm        0.71   (0.5%)        1.60   (2.3%)   124.1% ( 120% - 127%)
 MedTerm        4.43   (0.5%)       31.39   (6.6%)   609.1% ( 599% - 618%)
 LowTerm       10.28   (0.4%)       75.37   (3.8%)   633.0% ( 626% - 639%)
{noformat}
But it's somewhat hacked up (I'm storing the DV field directly
myself)... Shai explained that it's now possible to have facets store
only the leaf ord, so once we get DV cleanly integrated we should try
that and re-test.
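
For readers who want to see the shape of the leaf-only counting and
roll-up described above, here is a minimal sketch. It is not the code
from the patch and uses no Lucene APIs: the class name, method names,
and the plain int[] arrays standing in for the per-doc DV ords and the
taxonomy parent array are all hypothetical. It also assumes that every
parent ord is smaller than its children's ords, so a single backward
pass is enough to push counts up the hierarchy.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and data layout are assumptions,
// not taken from LUCENE-4602.patch.
class LeafOrdRollup {

    /** Counts only the leaf ord stored for each matching doc (one ord per doc). */
    static int[] countLeaves(int[] docOrds, int numOrds) {
        int[] counts = new int[numOrds];
        for (int ord : docOrds) {
            if (ord >= 0) {          // assumption: -1 marks a doc with no value
                counts[ord]++;
            }
        }
        return counts;
    }

    /**
     * Adds each ord's count to its parent. Assumes parents[ord] < ord for
     * every non-root ord, so walking from the highest ord downward means a
     * node's count is complete before it is added to its parent.
     */
    static void rollUp(int[] counts, int[] parents) {
        for (int ord = counts.length - 1; ord > 0; ord--) {
            int parent = parents[ord];
            if (parent >= 0) {
                counts[parent] += counts[ord];
            }
        }
    }

    public static void main(String[] args) {
        // Toy taxonomy: 0 = root, 1 = Author, 2 = Author/Bob, 3 = Author/Lisa
        int[] parents = {-1, 0, 1, 1};
        // Leaf ord of each matching doc, as it would be read from the DV field
        int[] docOrds = {2, 3, 3, 2, 3};

        int[] counts = countLeaves(docOrds, parents.length);
        rollUp(counts, parents);
        System.out.println(Arrays.toString(counts)); // [5, 5, 2, 3]
    }
}
```

The sketch counts all hits in one array for simplicity; the collector
in the patch aggregates per-segment. The point is only that the hot
loop touches one ord per doc, and the parent counts are filled in once
at the end.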
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
> Key: LUCENE-4602
> URL: https://issues.apache.org/jira/browse/LUCENE-4602
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Attachments: LUCENE-4602.patch
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads. I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
>     Task    QPS base   StdDev    QPS comp   StdDev                Pct diff
> HighTerm        0.53   (0.9%)        1.00   (2.5%)    87.3% (  83% -  91%)
>  LowTerm        7.59   (0.6%)       26.75  (12.9%)   252.6% ( 237% - 267%)
>  MedTerm        3.35   (0.7%)       12.71   (9.0%)   279.8% ( 268% - 291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on par with payloads (but hopefully faster) and if so ... we should
> cut facets over to using DV.