CaoManhDat commented on a change in pull request #1395: SOLR-14365: CollapsingQParser - Avoiding always allocate int[] and float[] with size equals to number of unique values (WIP)
URL: https://github.com/apache/lucene-solr/pull/1395#discussion_r402691881
##########
File path: solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
##########
@@ -524,6 +533,7 @@ public int docID() {
     public OrdScoreCollector(int maxDoc,
                              int segments,
+                             PrimitiveMapFactory mapFactory,

Review comment:
That is what I thought at first, and the HPPC primitive map did give a very good result in our performance test on a large collection (1.2 million unique values in the collapsing field): around 5x better QPS.

But I then wrote a simple single-threaded benchmark (I can add it to this PR so you can try tuning it) with this setup:
- The number of docs is 300k.
- 95% of queries are sparse, so we only collapse 1% of docs.

There the array approach actually does better than the map (around 1.5x better). The reasons are:
- We don't know how large the map needs to be, so we pay a lot for resizing it.
- The map needs two arrays to store keys and values separately, which, combined with the point above, leads to more memory usage.

I just don't want to make a commit that could potentially slow users down, although switching to the HPPC primitive map is much easier and would make things simpler for me.

Any ideas on this, @dsmiley @shalinmangar @joel-bernstein?
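
To make the two costs listed above easier to discuss, here is a minimal plain-Java sketch. It is not the code in this PR and not HPPC's implementation: `CollapseScoreStores`, `DenseScores`, and `SparseScores` are hypothetical names, the map is a simple linear-probing table written for illustration, and the sizes (300,000 ordinals, 3,000 of them touched) are assumptions picked to resemble the sparse case.

```java
import java.util.Arrays;

public class CollapseScoreStores {

    /** Dense baseline: one float slot per ordinal, allocated up front. */
    static final class DenseScores {
        private final float[] scores;

        DenseScores(int valueCount) {
            scores = new float[valueCount];
            Arrays.fill(scores, Float.NEGATIVE_INFINITY);
        }

        void keepMax(int ord, float score) {
            if (score > scores[ord]) scores[ord] = score;
        }

        float get(int ord) { return scores[ord]; }
        int allocatedSlots() { return scores.length; }
    }

    /** Hypothetical open-addressing int-to-float map with linear probing. */
    static final class SparseScores {
        private static final int EMPTY = -1; // ordinals are assumed to be >= 0
        private int[] keys;      // first array: the ordinals
        private float[] values;  // second array: best score per ordinal
        private int size;
        private int resizes;

        SparseScores(int initialCapacity) {
            // Round up to a power of two so probing can use a bit mask.
            int capacity = Integer.highestOneBit(Math.max(4, initialCapacity) - 1) << 1;
            keys = new int[capacity];
            Arrays.fill(keys, EMPTY);
            values = new float[capacity];
        }

        void keepMax(int ord, float score) {
            int slot = findSlot(keys, ord);
            if (keys[slot] == ord) {
                if (score > values[slot]) values[slot] = score;
                return;
            }
            // New key: grow first if the load factor would exceed 0.75.
            if ((size + 1) * 4 > keys.length * 3) {
                grow();
                slot = findSlot(keys, ord);
            }
            keys[slot] = ord;
            values[slot] = score;
            size++;
        }

        float get(int ord) {
            int slot = findSlot(keys, ord);
            return keys[slot] == ord ? values[slot] : Float.NEGATIVE_INFINITY;
        }

        int allocatedSlots() { return keys.length + values.length; }
        int resizes() { return resizes; }

        private static int findSlot(int[] keys, int ord) {
            int mask = keys.length - 1;
            int h = ord * 0x9E3779B9; // cheap integer mixing
            int slot = (h ^ (h >>> 16)) & mask;
            while (keys[slot] != EMPTY && keys[slot] != ord) {
                slot = (slot + 1) & mask;
            }
            return slot;
        }

        /** Doubles both arrays and re-inserts every key: the resizing cost. */
        private void grow() {
            int[] oldKeys = keys;
            float[] oldValues = values;
            keys = new int[oldKeys.length * 2];
            Arrays.fill(keys, EMPTY);
            values = new float[keys.length];
            for (int i = 0; i < oldKeys.length; i++) {
                if (oldKeys[i] != EMPTY) {
                    int slot = findSlot(keys, oldKeys[i]);
                    keys[slot] = oldKeys[i];
                    values[slot] = oldValues[i];
                }
            }
            resizes++;
        }
    }

    public static void main(String[] args) {
        int valueCount = 300_000, touched = 3_000; // assumed sizes
        DenseScores dense = new DenseScores(valueCount);
        SparseScores sparse = new SparseScores(16);
        for (int i = 0; i < touched; i++) {
            int ord = i * (valueCount / touched);
            float score = (i % 7) + 0.5f;
            dense.keepMax(ord, score);
            sparse.keepMax(ord, score);
        }
        System.out.println("dense slots:  " + dense.allocatedSlots());
        System.out.println("sparse slots: " + sparse.allocatedSlots()
            + " after " + sparse.resizes() + " resizes");
    }
}
```

In this sketch every stored entry occupies a key slot and a value slot plus spare capacity, and each growth step re-inserts all existing keys. A well-chosen initial capacity avoids the regrowth, but that requires knowing roughly how many ordinals a query will touch, which is the tuning problem described above.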