[
https://issues.apache.org/jira/browse/LUCENE-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554261#comment-13554261
]
Shai Erera commented on LUCENE-4602:
------------------------------------
I found your comments about CategoryListCache:
bq. I had separately previously tested the existing int[][][] cache
(CategoryListCache) but it had smaller gains than this (73% for MedTerm), and
it required more RAM (1.9 GB vs 377 RAM for this patch).
That was with 3 ordinals per document, and it already consumed a very large
amount of RAM. Also, the gains are not considerable compared to the DocValues
and PackedBytes versions that you had. I assume that testing it with 25
ordinals per document would be even more costly.
In addition, CategoryListCache is not per-segment, so if we want to keep it,
we'd need to do some major rewriting there. I suggest that we just nuke it for
now and come back to it later. The problem with keeping it is that we need to
maintain it, and if it's not that much better, I prefer to nuke it.
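
To make the RAM concern above concrete, here is a rough back-of-the-envelope
sketch. It is not CategoryListCache itself: it simplifies the cache to one
int[] per document, and the header, reference and alignment sizes are assumed
values for a typical 64-bit HotSpot JVM with compressed oops. The class name
and the document count in main are hypothetical.

```java
/** Hypothetical estimator, not part of Lucene. */
final class OrdCacheEstimate {
    // Assumed JVM layout constants; real values depend on the JVM and its flags.
    static final long ARRAY_HEADER_BYTES = 16;
    static final long REFERENCE_BYTES = 4;
    static final long ALIGNMENT = 8;

    /** Rounds an object size up to the assumed allocation alignment. */
    static long alignUp(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Bytes for one int[] per document plus the reference that points to it. */
    static long intArrayCacheBytes(long docs, int ordsPerDoc) {
        long perDocArray = alignUp(ARRAY_HEADER_BYTES + 4L * ordsPerDoc);
        return docs * (perDocArray + REFERENCE_BYTES);
    }

    public static void main(String[] args) {
        long docs = 10_000_000L; // hypothetical index size
        System.out.println("3 ords/doc:  " + intArrayCacheBytes(docs, 3) + " bytes");
        System.out.println("25 ords/doc: " + intArrayCacheBytes(docs, 25) + " bytes");
    }
}
```

Under these assumptions the fixed per-array overhead dominates at 3 ordinals
per document, and the total still grows several-fold at 25 ordinals per
document, which is the direction of the cost argument made above.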
> Use DocValues to store per-doc facet ord
> ----------------------------------------
>
> Key: LUCENE-4602
> URL: https://issues.apache.org/jira/browse/LUCENE-4602
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: FacetsPayloadMigrationReader.java, LUCENE-4602.patch,
> LUCENE-4602.patch, LUCENE-4602.patch, LUCENE-4602.patch,
> TestFacetsPayloadMigrationReader.java
>
>
> Spinoff from LUCENE-4600
> DocValues can be used to hold the byte[] encoding all facet ords for
> the document, instead of payloads. I made a hacked up approximation
> of in-RAM DV (see CachedCountingFacetsCollector in the patch) and the
> gains were somewhat surprisingly large:
> {noformat}
> Task        QPS base   StdDev   QPS comp   StdDev   Pct diff
> HighTerm        0.53   (0.9%)       1.00   (2.5%)    87.3% ( 83% -  91%)
> LowTerm         7.59   (0.6%)      26.75  (12.9%)   252.6% (237% - 267%)
> MedTerm         3.35   (0.7%)      12.71   (9.0%)   279.8% (268% - 291%)
> {noformat}
> I didn't think payloads were THAT slow; I think it must be the advance
> implementation?
> We need to separately test on-disk DV to make sure it's at least
> on-par with payloads (but hopefully faster) and if so ... we should
> cutover facets to using DV.
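
The issue description above refers to a byte[] that encodes all facet ords
for a document. As an illustration only, one plausible way to build such a
byte[] is to sort the ordinals, take the gaps between neighbours, and write
each gap as a variable-length int. This is a sketch under that assumption and
not necessarily the encoding the patch uses; FacetOrdCodec is a hypothetical
name, and ordinals are assumed to be non-negative.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene. */
final class FacetOrdCodec {

    /** Encodes ordinals as sorted gaps, each written 7 bits per byte, low bits first. */
    static byte[] encode(int[] ords) {
        int[] sorted = ords.clone();
        Arrays.sort(sorted);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sorted) {
            int delta = ord - prev;
            // A set high bit means that more bytes follow for this value.
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
            prev = ord;
        }
        return out.toByteArray();
    }

    /** Decodes the byte[] produced by encode back into sorted ordinals. */
    static int[] decode(byte[] bytes) {
        int[] ords = new int[bytes.length]; // every ordinal takes at least one byte
        int count = 0;
        int prev = 0;
        int i = 0;
        while (i < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[i++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            ords[count++] = prev;
        }
        return Arrays.copyOf(ords, count);
    }
}
```

The resulting byte[] is the kind of per-document value that could be stored
in a binary DocValues field and decoded at collection time.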