gautamworah96 commented on a change in pull request #304:
URL: https://github.com/apache/lucene/pull/304#discussion_r718007188
##########
File path:
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
##########
@@ -43,7 +47,7 @@ protected IntTaxonomyFacets(
super(indexFieldName, taxoReader, config);
if (useHashTable(fc, taxoReader)) {
- sparseValues = new IntIntHashMap();
+ sparseValues = new IntIntHashMapWithFixedBitSet(taxoReader.getSize());
Review comment:
I think we can definitely experiment with the conditions under which
this logic is triggered. For example, I can imagine cases where there are a lot
of matches (say > 10% of the index, which triggers the code to use the `values`
array) and which has a lot of sparse ordinals i.e, ordinals with a count of
just 0.
In that case we could benefit from an optimization that creates a large
bitset with most values set to 0 instead of a large `values[int]` array with
most values set to 0.
Another idea might be to remove both these `hashtable` and `values[]` array
ideas and keep just the bitset backed by a hashtable idea (this PR).
Maybe you could have a `wikimedium10m` run with a small print statement that
records whether the class used a sparse or a dense data-structure (and then get
the counts). Based on those results we could then decide on where to add this
logic
##########
File path:
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
##########
@@ -253,4 +257,92 @@ public FacetResult getTopChildren(int topN, String dim,
String... path) throws I
return new FacetResult(dim, path, totValue, labelValues, childCount);
}
+
+ /**
+ * Class that uses FixedBitSet to store counts for all ordinals with 1 count
and IntIntHashMap for
+ * all other counts
+ */
+ protected static class IntIntHashMapWithFixedBitSet implements
Iterable<IntIntCursor> {
Review comment:
+1 there is a similar class called `FloatTaxonomyFacets` but it does not
currently use this sparse counting mechanism. If we see some performance gains
from this approach, we could definitively extend it to that class as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]