[
https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554163#comment-13554163
]
Shai Erera commented on LUCENE-4602:
------------------------------------
Indeed nice results, though they are still far from the uber-specialized
Collectors you wrote :).
I think that we should see some more gains after we fix LUCENE-4620 (I'm going
to do that tomorrow!).
Also, I want to write a special DGapVIntEncoder/Decoder - that should hopefully
somewhat improve decoding time too. I opened LUCENE-4686.
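
As a rough illustration of the d-gap plus VInt idea mentioned above (not the actual LUCENE-4686 code, which may differ): sorted ordinals are stored as differences from the previous ordinal, and each difference is written as a variable-length int, 7 bits per byte, low-order bits first, with the high bit as a continuation flag. The class and method names below are hypothetical, and the sketch assumes the ordinals are sorted and non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of d-gap + VInt encoding; not the Lucene facet module's classes.
public class DGapVIntSketch {

    /** Encodes sorted, non-negative ordinals as VInt-encoded gaps (assumed input contract). */
    public static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            int gap = ord - prev; // small for dense ordinal lists
            prev = ord;
            // Emit 7 bits at a time, low-order first; high bit set means "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Decodes the byte[] produced by encode back into the original ordinals. */
    public static int[] decode(byte[] bytes) {
        int[] ords = new int[bytes.length]; // each ordinal takes at least one byte
        int count = 0, prev = 0, pos = 0;
        while (pos < bytes.length) {
            int gap = 0, shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap; // undo the d-gap step
            ords[count++] = prev;
        }
        return Arrays.copyOf(ords, count);
    }

    public static void main(String[] args) {
        int[] ords = {3, 7, 8, 300};
        byte[] encoded = encode(ords);
        // Gaps are 3, 4, 1, 292; only 292 needs two bytes.
        System.out.println(encoded.length + " bytes: " + Arrays.toString(decode(encoded)));
    }
}
```

Running main prints "5 bytes: [3, 7, 8, 300]". The sketch does no validation, so a truncated input that ends on a continuation byte would fail with an ArrayIndexOutOfBoundsException.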
Also, I'm not sure whether you used PackedInts to encode/decode the ordinals in
your previous patches. If you did, then maybe LUCENE-4609 will bring some more
improvements.
Overall, I think this is good progress. With this patch, facets are now on
DocValues, supporting all features. There are more optimizations /
specializations to do - we should do them separately.
Mike, I remember in one of our chats we discussed the effectiveness of
CategoryListCache. I seem to remember you said it had high RAM consumption, and
also it didn't perform well. Do you perhaps have these results? I wonder if you
can compare this patch (with in-mem DV) to a CategoryListCache CLI - if the
results are not good, we should just nuke it.
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
> Key: LUCENE-4602
> URL: https://issues.apache.org/jira/browse/LUCENE-4602
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: FacetsPayloadMigrationReader.java, LUCENE-4602.patch,
> LUCENE-4602.patch, LUCENE-4602.patch, LUCENE-4602.patch,
> TestFacetsPayloadMigrationReader.java
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads. I made a hacked-up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were surprisingly large:
> {noformat}
> Task       QPS base  StdDev  QPS comp  StdDev  Pct diff
> HighTerm       0.53  (0.9%)      1.00  (2.5%)   87.3% (  83% -  91%)
> LowTerm        7.59  (0.6%)     26.75 (12.9%)  252.6% ( 237% - 267%)
> MedTerm        3.35  (0.7%)     12.71  (9.0%)  279.8% ( 268% - 291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on-par with payloads (but hopefully faster) and if so ... we should
> cutover facets to using DV.