Varun Thacker created SOLR-9978:
-----------------------------------
Summary: Reduce collapse query memory usage
Key: SOLR-9978
URL: https://issues.apache.org/jira/browse/SOLR-9978
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Varun Thacker
Assignee: Varun Thacker
- Single shard test with one replica
- 10M documents, 9M of which have a unique collapse value. The test collapsed on a string field.
- The collapse query parser creates two arrays:
- an int array for unique documents (9M in this case)
- a float array for the corresponding scores (9M in this case)
- It goes through all documents and stores a document in the array if its score
is better than the previously stored score for that collapse value.
- As a result, collapse creates a lot of garbage when the total number of documents
is high and there are very few duplicates.
- Even for a query like {{q={!cache=false}*:*&fq={!collapse field=collapseField_s cache=false}&sort=id desc}},
which has a top-level sort, the collapse query parser creates the score
array and scores every document.
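To make the cost described above concrete, here is a minimal, hypothetical Java sketch of score-based collapsing as this issue describes it; it is not the actual Solr collapse code. It assumes each document is identified by its index, `ords[doc]` holds the ordinal of the document's collapse value, and `scores[doc]` holds its score; the class and method names are illustrative only. `arrayBytes` gives the raw size of the two per-value arrays: for 9M unique values that is 9,000,000 × (4 + 4) = 72,000,000 bytes (about 72 MB), ignoring JVM array headers.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of score-based collapsing (not the Solr implementation).
 * ords[doc] is the ordinal of the doc's collapse value; scores[doc] is its score.
 */
class ScoreCollapseSketch {

  /** Returns, for each ordinal, the highest-scoring doc (-1 if no doc has it). */
  static int[] collapse(int[] ords, float[] scores, int numOrds) {
    int[] bestDoc = new int[numOrds];       // one int per unique value
    float[] bestScore = new float[numOrds]; // one float per unique value
    Arrays.fill(bestDoc, -1);
    Arrays.fill(bestScore, Float.NEGATIVE_INFINITY);
    for (int doc = 0; doc < ords.length; doc++) {
      int ord = ords[doc];
      // Every document is scored and compared, even if the query has a sort.
      if (scores[doc] > bestScore[ord]) {
        bestScore[ord] = scores[doc];
        bestDoc[ord] = doc;
      }
    }
    return bestDoc;
  }

  /** Bytes used by the int and float arrays, excluding array headers. */
  static long arrayBytes(long numOrds) {
    return numOrds * (Integer.BYTES + Float.BYTES);
  }

  public static void main(String[] args) {
    int[] ords = {0, 1, 1, 2};
    float[] scores = {1.0f, 0.5f, 2.0f, 0.3f};
    System.out.println(Arrays.toString(collapse(ords, scores, 3))); // [0, 2, 3]
    System.out.println(arrayBytes(9_000_000L)); // 72000000
  }
}
```

Both arrays grow with the number of unique values, so when almost every value is unique (as in this test) they approach the size of the index itself.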
Indexing script used to generate dummy data:
{code}
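import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

// Index 10M documents; every 10th document (except doc 0) reuses the previous
// document's collapse value, leaving ~9M unique values.
// Assumes `client` is a SolrClient for a node hosting the "ct" collection, e.g.
// SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
List<SolrInputDocument> docs = new ArrayList<>(1000);
for (int i = 0; i < 1000 * 1000 * 10; i++) {
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", i);
  if (i % 10 == 0 && i != 0) {
    doc.addField("collapseField_s", i - 1); // duplicate of the previous doc's value
  } else {
    doc.addField("collapseField_s", i);
  }
  docs.add(doc);
  if (docs.size() == 1000) { // send in batches of 1000
    client.add("ct", docs);
    docs.clear();
  }
}
if (!docs.isEmpty()) { // flush a partial final batch, if any
  client.add("ct", docs);
}
client.commit("ct");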
{code}
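As a quick check of the "9M unique" figure, the sketch below replays the script's duplicate rule over the document ids and counts the distinct collapse values with a `java.util.BitSet`; the class name `UniqueValueCount` is hypothetical. Every multiple of 10 from 10 to 9,999,990 reuses the preceding document's value, so 999,999 of the 10M values are duplicates, leaving 9,000,001 unique values.

```java
import java.util.BitSet;

/** Hypothetical helper: counts distinct collapseField_s values from the script's rule. */
class UniqueValueCount {

  static int countUnique(int numDocs) {
    BitSet values = new BitSet(numDocs); // values always fall in [0, numDocs)
    for (int i = 0; i < numDocs; i++) {
      // Same rule as the indexing script: every 10th doc (except doc 0) reuses i - 1.
      int value = (i % 10 == 0 && i != 0) ? i - 1 : i;
      values.set(value);
    }
    return values.cardinality();
  }

  public static void main(String[] args) {
    System.out.println(countUnique(10_000_000)); // 9000001
  }
}
```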
Query:
{{q={!cache=false}*:*&fq={!collapse field=collapseField_s cache=false}&sort=id desc}}
Improvements:
- We currently default to the SCORE implementation if no min|max|sort param is
provided in the collapse query. Instead, check whether a global sort is provided
and, if so, skip scoring and pick the first occurrence of each unique value.
- Instead of creating an array for unique documents, use a bitset.
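The two proposed improvements can be sketched together in Java. This is a minimal, hypothetical illustration rather than a patch: it assumes, as in the earlier sketches, that `ords[doc]` is the ordinal of each document's collapse value, and it treats "first occurrence" as the lowest document index. No scores are read, and the per-value int and float arrays are replaced by two bitsets, one over collapse values and one over documents.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch: first-occurrence collapsing without scores,
 * recording results in bitsets instead of int/float arrays.
 */
class FirstOccurrenceCollapseSketch {

  /** Returns the set of docs kept: the first doc seen for each ordinal. */
  static BitSet collapse(int[] ords, int numOrds) {
    BitSet seenOrds = new BitSet(numOrds);      // one bit per unique value
    BitSet collapsed = new BitSet(ords.length); // one bit per document
    for (int doc = 0; doc < ords.length; doc++) {
      int ord = ords[doc];
      if (!seenOrds.get(ord)) { // first doc for this value wins; no scoring
        seenOrds.set(ord);
        collapsed.set(doc);
      }
    }
    return collapsed;
  }

  public static void main(String[] args) {
    int[] ords = {0, 1, 1, 2, 0};
    System.out.println(collapse(ords, 3)); // {0, 1, 3}
  }
}
```

For the test above (10M documents, about 9M unique values), the two bitsets in this sketch take roughly 10,000,000 / 8 + 9,000,000 / 8 = 2,375,000 bytes, compared with about 72 MB for the int and float arrays.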