bruno-roustant commented on a change in pull request #1395: SOLR-14365: CollapsingQParser - Avoiding always allocate int[] and float[] with size equals to number of unique values (WIP) URL: https://github.com/apache/lucene-solr/pull/1395#discussion_r402825437
##########
File path: solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
##########
@@ -524,6 +533,7 @@ public int docID() {
     public OrdScoreCollector(int maxDoc,
                              int segments,
+                            PrimitiveMapFactory mapFactory,

Review comment:

If some preliminary perf tests show that the array version is sometimes clearly faster, then I agree we should invest in automated switch logic. But in that case the interfaces presented in this PR do not fit such switch logic: we should be able to change the underlying implementation of the map seamlessly. This is more work.

I would like to better understand the simple benchmark test you did, because the reason for the speed/memory difference is still not obvious to me.

> 95% percent of queries are sparse ones so we only doing collapsing on 1% of docs.

Does that mean that only 1% of the 300K docs are put in the map? If so, on the map implementation side we have an internal capacity of 2^12 (4K) or 2^13 (8K) for its 2 arrays. On the array implementation side we have 1 array of 300K. For memory, the map is more efficient. For speed, it will depend on the actual load factor and the map implementation.

> So start with a hash based map with a substantial initial capacity (not 4!), then upgrade to an array based map impl

Or maybe the reverse? This depends on the map load factor.
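
To make the capacity estimate in the review comment concrete, here is a rough back-of-the-envelope sketch in Java. It is not code from the PR. The class and method names are hypothetical, the load factors 0.75 and 0.5 are assumed values chosen only because they reproduce the 2^12 and 2^13 capacities mentioned above, and the map is assumed to be an open-addressing int-to-int map backed by one keys array and one values array of 4-byte slots.

```java
// Hypothetical helper, not part of the PR: estimates the memory of a
// primitive hash map versus a plain array for the numbers in the comment.
class CollapseMemoryEstimate {

    /** Smallest power of two that is >= n. */
    static int nextPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    /** Slots needed to hold `entries` without exceeding `loadFactor` (assumed power-of-two sizing). */
    static int mapCapacity(int entries, double loadFactor) {
        return nextPowerOfTwo((int) Math.ceil(entries / loadFactor));
    }

    /** Bytes for a keys array plus a values array, 4 bytes per slot each (assumed layout). */
    static long mapBytes(int capacity) {
        return 2L * capacity * Integer.BYTES;
    }

    /** Bytes for one array with a 4-byte slot per unique value. */
    static long arrayBytes(int uniqueValues) {
        return (long) uniqueValues * Integer.BYTES;
    }

    public static void main(String[] args) {
        int uniqueValues = 300_000;       // size of the array version
        int entries = uniqueValues / 100; // 1% of docs end up in the map: 3000

        int cap075 = mapCapacity(entries, 0.75); // 4096 = 2^12
        int cap050 = mapCapacity(entries, 0.5);  // 8192 = 2^13

        System.out.println("map capacity at load factor 0.75: " + cap075);
        System.out.println("map capacity at load factor 0.5:  " + cap050);
        System.out.println("map bytes (2^12 slots):  " + mapBytes(cap075));         // 32768
        System.out.println("map bytes (2^13 slots):  " + mapBytes(cap050));         // 65536
        System.out.println("array bytes (300K slots): " + arrayBytes(uniqueValues)); // 1200000
    }
}
```

Under these assumptions the map needs roughly 32 KB to 64 KB against about 1.2 MB for the array, which matches the memory argument. The sketch says nothing about speed, which still depends on the actual load factor and the map implementation.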