bruno-roustant commented on a change in pull request #1395: SOLR-14365: 
CollapsingQParser - Avoiding always allocate int[] and float[] with size equals 
to number of unique values (WIP)
URL: https://github.com/apache/lucene-solr/pull/1395#discussion_r402825437
 
 

 ##########
 File path: 
solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
 ##########
 @@ -524,6 +533,7 @@ public int docID() {
 
     public OrdScoreCollector(int maxDoc,
                              int segments,
+                             PrimitiveMapFactory mapFactory,
 
 Review comment:
   If preliminary perf tests show that the array version is sometimes 
clearly faster, then I agree we should invest in automated switch logic. But 
in that case the interfaces presented in this PR do not fit such switch logic: 
we would need to be able to change the underlying implementation of the map 
seamlessly. This is more work.
   
   I would like to understand better the simple benchmark test you did, because 
the reason for the speed/memory difference is still not obvious to me.
   > 95% percent of queries are sparse ones so we only doing collapsing on 1% 
of docs.
   Does that mean that only 1% of the 300K docs (about 3K entries) are put in 
the map?
   If so, on the map implementation side we have an internal capacity of 2^12 
(4K) or 2^13 (8K), depending on the load factor, for 2 arrays.
   On the array implementation side we have 1 array of 300K.
   For memory, the map is more efficient. For speed, it will depend on the 
actual load factor and the map implementation.
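   
   To make the sizing argument concrete, here is a rough sketch of the 
arithmetic. The class and method names are hypothetical and not part of this 
PR, and it assumes a power-of-two, open-addressing map with a load factor of 
0.75 or 0.5; the actual map implementation may size itself differently.

```java
// Hypothetical helper for illustration only; not part of the PR.
final class CollapseSizingSketch {

    /** Smallest power-of-two capacity that keeps `entries` within `loadFactor`. */
    static int hashCapacity(int entries, double loadFactor) {
        int needed = (int) Math.ceil(entries / loadFactor);
        int capacity = 1;
        while (capacity < needed) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        int uniqueValues = 300_000;          // size of the array-based version
        int collapsed = uniqueValues / 100;  // 1% of docs => 3000 entries

        int capHigh = hashCapacity(collapsed, 0.75); // 4096 (2^12)
        int capLow = hashCapacity(collapsed, 0.5);   // 8192 (2^13)

        // Assumption: the map keeps 2 parallel arrays (keys and values).
        System.out.println("map slots at 0.75: " + 2 * capHigh);
        System.out.println("map slots at 0.50: " + 2 * capLow);
        System.out.println("array slots:       " + uniqueValues);
    }
}
```

   Under these assumptions the map holds far fewer slots than the 300K array, 
which matches the memory observation; it says nothing about speed.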
   
   > So start with a hash based map with a substantial initial capacity (not 
4!), then upgrade to an array based map impl
   Or maybe the reverse? This depends on the map load factor.
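
   As an illustration of what "changing the underlying implementation 
seamlessly" could look like, here is a minimal sketch. It is not the 
`PrimitiveMapFactory` API from this PR: the class name, the threshold 
parameter, and the use of a boxed `HashMap` (chosen only for brevity; a real 
version would use a primitive map) are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: starts hash-based, upgrades to an array when it fills up.
final class AdaptiveIntFloatMap {
    private final int maxOrd;         // number of unique values (dense array size)
    private final int denseThreshold; // switch once more than this many keys are stored
    private final float missing;      // value returned for absent keys

    private Map<Integer, Float> sparse = new HashMap<>(); // boxed, for brevity only
    private float[] dense;                                // non-null after the switch

    AdaptiveIntFloatMap(int maxOrd, int denseThreshold, float missing) {
        this.maxOrd = maxOrd;
        this.denseThreshold = denseThreshold;
        this.missing = missing;
    }

    void put(int ord, float value) {
        if (dense != null) {
            dense[ord] = value;
            return;
        }
        sparse.put(ord, value);
        if (sparse.size() > denseThreshold) {
            upgrade();
        }
    }

    float get(int ord) {
        if (dense != null) {
            return dense[ord];
        }
        Float v = sparse.get(ord);
        return v == null ? missing : v;
    }

    boolean isDense() {
        return dense != null;
    }

    // Copies all entries into a full-size array; callers never see the change.
    private void upgrade() {
        dense = new float[maxOrd];
        Arrays.fill(dense, missing);
        for (Map.Entry<Integer, Float> e : sparse.entrySet()) {
            dense[e.getKey()] = e.getValue();
        }
        sparse = null;
    }
}
```

   The reverse direction (array first, then hash) would need the same kind of 
wrapper, and the right threshold would have to come from benchmarks.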

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

