[GitHub] [lucene] gautamworah96 commented on a change in pull request #304: LUCENE 10080 Added FixedBitSet for one counts when counting taxonomy facet labels

GitBox Tue, 28 Sep 2021 16:19:16 -0700


gautamworah96 commented on a change in pull request #304:
URL: https://github.com/apache/lucene/pull/304#discussion_r718007188




##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java
##########
@@ -43,7 +47,7 @@ protected IntTaxonomyFacets(
     super(indexFieldName, taxoReader, config);
 
     if (useHashTable(fc, taxoReader)) {
-      sparseValues = new IntIntHashMap();
+      sparseValues = new IntIntHashMapWithFixedBitSet(taxoReader.getSize());

Review comment:
       I think we can definitely experiment with the conditions under which 
this logic is triggered. For example, I can imagine cases where there are a lot 
of matches (say > 10% of the index, which triggers the code to use the `values` 
array) and which has a lot of sparse ordinals i.e, ordinals with a count of 
just 0.
   
   In that case we could benefit from an optimization that creates a large 
bitset with most values set to 0 instead of a large `values[int]` array with 
most values set to 0.
   
    Another idea might be to remove both these `hashtable` and `values[]` array 
ideas and keep just the bitset backed by a hashtable idea (this PR). 
   
   Maybe you could have a `wikimedium10m` run with a small print statement that 
records whether the class used a sparse or a dense data-structure (and then get 
the counts from the stdout file).  Based on those results we could then decide 
on where to add this logic. As a side effect, this experiment also verifies 
whether your new code is being used or no :)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [lucene] gautamworah96 commented on a change in pull request #304: LUCENE 10080 Added FixedBitSet for one counts when counting taxonomy facet labels

Reply via email to