[
https://issues.apache.org/jira/browse/LUCENE-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422976#comment-17422976
]
Marc D'Mello commented on LUCENE-10080:
---------------------------------------
Hi [~gsmiller], I looked into it some more, and I think it would be a good idea
to start with IntTaxonomyFacets and StringValueFacetCounts, as those would be
the easiest to implement. I also think it might be useful to improve the
faceting benchmarks so they test my change better. I know you opened an issue
about that, but I opened one that is more specific to this change
[here|https://github.com/mikemccand/luceneutil/issues/141].
> Use a bit set to count long-tail of singleton FacetLabels?
> ----------------------------------------------------------
>
> Key: LUCENE-10080
> URL: https://issues.apache.org/jira/browse/LUCENE-10080
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Priority: Major
>
> I was talking with [~rcmuir] about LUCENE-9969, and he had a neat
> idea for more efficient facet counting.
> Today we accumulate counts directly in an HPPC native int/int map, or a
> non-sparse {{int[]}} (if enough hits match the query).
> But it is likely that many of these facet counts are singletons (occur only
> once in each query). To be more space efficient, we could wrap a bit set
> around the map or {{int[]}}. The first time we see an ordinal, we set its
> bit. The second and subsequent times, we increment the count as we do today.
> If we use a non-sparse bitset (e.g. {{FixedBitSet}}) that will add some
> non-sparse heap cost O(maxDoc) for each segment, but if there are enough
> ordinals to count, that can be a win over just the HPPC native int map for
> some cases?
> Maybe this could be an intermediate implementation, since we already cover
> the "very low hit count" (use HPPC int/int map) and "very high hit count"
> (using {{int[]}}) today?
> Also, this bit set would be able to quickly iterate over the sorted ordinals,
> which might be helpful if we move the three big {{int[]}} into numeric doc
> values?
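
The counting scheme described in the issue can be sketched roughly as follows. This is only a minimal illustration, not the Lucene implementation: the class name SingletonAwareCounter and its methods are hypothetical, java.util.BitSet stands in for something like FixedBitSet, and a java.util.HashMap stands in for the HPPC int/int map. The first sighting of an ordinal only sets a bit; the map is touched from the second sighting onward.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the idea in this issue; not a Lucene class. */
final class SingletonAwareCounter {
  // One bit per ordinal, set the first time that ordinal is seen.
  // Assumption: java.util.BitSet stands in for Lucene's FixedBitSet.
  private final BitSet seen;

  // Total counts, stored only for ordinals seen at least twice.
  // Assumption: HashMap stands in for the HPPC native int/int map.
  private final Map<Integer, Integer> repeats = new HashMap<>();

  SingletonAwareCounter(int maxOrdinal) {
    this.seen = new BitSet(maxOrdinal);
  }

  /** Records one occurrence of the given ordinal. */
  void increment(int ordinal) {
    if (!seen.get(ordinal)) {
      // First occurrence: a singleton costs one bit and no map entry.
      seen.set(ordinal);
    } else {
      // Second occurrence stores 2; later occurrences add 1 to the stored total.
      repeats.merge(ordinal, 2, (old, unused) -> old + 1);
    }
  }

  /** Returns the total count for the ordinal (0 if it was never seen). */
  int count(int ordinal) {
    if (!seen.get(ordinal)) {
      return 0;
    }
    return repeats.getOrDefault(ordinal, 1);
  }

  /** Returns every seen ordinal in ascending order, read straight off the bit set. */
  int[] sortedOrdinals() {
    return seen.stream().toArray();
  }
}
```

With this layout, an ordinal that occurs once never allocates a map entry, and the bit set yields the seen ordinals already sorted. Whether that beats the plain map or the dense {{int[]}} in practice would depend on how many ordinals are singletons, which is what the benchmarks would need to show.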