[GitHub] [lucene-solr] CaoManhDat commented on a change in pull request #1395: SOLR-14365: CollapsingQParser - Avoiding always allocate int[] and float[] with size equals to number of unique values (WIP)

GitBox Thu, 02 Apr 2020 18:47:18 -0700

CaoManhDat commented on a change in pull request #1395: SOLR-14365: 
CollapsingQParser - Avoiding always allocate int[] and float[] with size equals 
to number of unique values (WIP)
URL: https://github.com/apache/lucene-solr/pull/1395#discussion_r402691881


 ##########
 File path: solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
 ##########
 @@ -524,6 +533,7 @@ public int docID() {
 
     public OrdScoreCollector(int maxDoc,
                              int segments,
+                             PrimitiveMapFactory mapFactory,
 
 Review comment:
   That is what I thought at first, and the HPPC primitive map did give a 
very good result in our performance test on a large collection (the number of 
unique values in the collapsing field is 1.2 million): around 5x better QPS.
   
   But then I wrote a simple single-threaded benchmark (I can add it to this 
PR so you can try to tune it) with the following setup:
   - The number of docs is 300k.
   - 95% of queries are sparse, so we only collapse on 1% of docs.

   There, the array approach actually does better than the map (around 1.5x 
better). The reasons are:
   - We don't know how large the map needs to be, so we pay a lot for 
resizing it.
   - The map needs two arrays to store keys and values separately; combined 
with the point above, this leads to more memory usage.
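
   To make the two points above concrete, here is a minimal sketch of an 
open-addressing int-to-float map. It is an illustration only: the class name 
`IntFloatMapSketch`, its methods, and the 0.75 load factor are assumptions 
made for this example. It is not HPPC's implementation and not the code in 
this PR. It keeps keys and values in two parallel arrays and counts how often 
it has to rehash when the final size is not known up front.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not HPPC and not the code in this PR. */
public class IntFloatMapSketch {
    private static final int EMPTY = -1; // assumes keys (ords) are non-negative

    private int[] keys;      // first backing array
    private float[] values;  // second backing array, parallel to keys
    private int size;
    private int resizeCount;

    public IntFloatMapSketch(int initialCapacity) {
        // Round up to a power of two so a bit mask can replace the modulo.
        int cap = Integer.highestOneBit(Math.max(2, initialCapacity) - 1) << 1;
        keys = new int[cap];
        Arrays.fill(keys, EMPTY);
        values = new float[cap];
    }

    public void put(int key, float value) {
        int slot = findSlot(keys, key);
        if (keys[slot] == key) {          // existing key: overwrite in place
            values[slot] = value;
            return;
        }
        if ((size + 1) * 4 > keys.length * 3) { // keep load factor <= 0.75
            resize();
            slot = findSlot(keys, key);
        }
        keys[slot] = key;
        values[slot] = value;
        size++;
    }

    public float get(int key, float defaultValue) {
        int slot = findSlot(keys, key);
        return keys[slot] == key ? values[slot] : defaultValue;
    }

    public int size() { return size; }
    public int capacity() { return keys.length; }
    public int resizeCount() { return resizeCount; }

    /** Doubles both arrays and re-inserts every entry: the cost discussed above. */
    private void resize() {
        int[] oldKeys = keys;
        float[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        values = new float[keys.length];
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != EMPTY) {
                int slot = findSlot(keys, oldKeys[i]);
                keys[slot] = oldKeys[i];
                values[slot] = oldValues[i];
            }
        }
        resizeCount++;
    }

    /** Linear probing: returns the slot holding key, or the first empty slot. */
    private static int findSlot(int[] keys, int key) {
        int mask = keys.length - 1;
        int slot = mix(key) & mask;
        while (keys[slot] != EMPTY && keys[slot] != key) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private static int mix(int k) {
        int h = k * 0x9E3779B9; // multiplicative scramble of the key bits
        return h ^ (h >>> 16);
    }
}
```

   In this sketch, inserting 3,000 distinct ords (1% of 300k docs) into a map 
that starts at capacity 16 doubles and rehashes both arrays several times 
before it settles, while a map pre-sized for the final count never rehashes. 
A plain `float[]` indexed by ord needs no hashing or probing at all, at the 
price of allocating one slot per unique value up front.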
   
   I just don't want to make a commit that could potentially slow users down, 
although switching to the HPPC primitive map is much easier and makes things 
simpler for me.
   Any ideas on this, @dsmiley @shalinmangar @joel-bernstein?
