[ 
https://issues.apache.org/jira/browse/SOLR-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424739#comment-17424739
 ] 

Michael Kosten commented on SOLR-6910:
--------------------------------------

I submitted the PR. My company has been running a version of this patch for a 
few years, starting with 6.5 and currently under 8.8. We have an unusual use 
case where the route field value is somewhat volatile, so when indexing we 
delete any existing version of the document to avoid duplicates. Under 4.x, we 
issued a single delete-by-query for all the documents in the batch. When we 
moved to 6.5, our farms began to fall over when indexing. Specifically, the 
failures were in communication between shard leaders and replicas, with the 
replicas being placed into recovery. This patch addressed the problem because 
we could use delete-by-ids instead, and the replicas stopped failing.

At Eric Pugh's suggestion, I created a mini-cluster benchmark test that 
compares the two strategies. It indexes 100K documents, of which only 1000 are 
unique, and it issues either a deleteByIds or a deleteByQuery for all 
documents in each batch. It failed many times, I believe because there is still 
an issue with delete-by-query under a high volume of indexing when there is 
more than a single replica. But here are the results from a successful run:


|*Benchmark*|*(batchSize)*|*(docCount)*|*(nodeCount)*|*(numReplicas)*|*(numShards)*|*(uniqueDocCount)*|*Mode*|*Score*|*Units*|
|DistributedDelete.deleteByIdsAndIndexBatch|100|100000|9|3|3|1000|thrpt|8.562|ops/s|
|DistributedDelete.deleteByQueryAndIndexBatch|100|100000|9|3|3|1000|thrpt|2.045|ops/s|
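
For readers unfamiliar with the two strategies in the table: deleteByIds sends 
the batch's IDs as a plain list, while deleteByQuery has to express the same 
batch as one query string. The following is only a minimal sketch of how such 
a query might be assembled, not the benchmark's actual code. The class and 
method names are hypothetical, and the unique key field name ("id") is an 
assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr or of the PR.
final class BatchDeleteQuery {
    private BatchDeleteQuery() {}

    /** Wraps one ID in double quotes, escaping backslashes and double quotes. */
    static String quote(String id) {
        return "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a single query such as id:("a" OR "b") covering every ID in the batch. */
    static String forIds(String keyField, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("ids must not be empty");
        }
        return ids.stream()
                .map(BatchDeleteQuery::quote)
                .collect(Collectors.joining(" OR ", keyField + ":(", ")"));
    }
}
```

With a batch size of 100, the resulting query has 100 OR clauses per request, 
whereas the delete-by-ids form sends the same 100 values as a list.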


The PR works by detecting that the necessary route field is missing and 
redirecting the DeleteById command to DeleteByQuery (kludgey). I just revised 
it to move this check a bit earlier in the process. I also tried broadcasting 
the DeleteById command to all shard leaders instead. At one time, I think this 
may have worked, because DeleteById with a missing route used to work if it was 
sent to, or happened to hit, the correct shard. This seems to have changed, and 
now DeleteById never works when a required route field is missing.
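
The redirect described above can be illustrated roughly as follows. This is 
only a sketch of the decision, not the code in the PR: the real change lives 
inside Solr's update handling, and the class name, method name, and parameters 
here are all hypothetical.

```java
import java.util.Optional;

// Hypothetical illustration of the fallback decision; not Solr code.
final class DeleteRedirect {
    private DeleteRedirect() {}

    /**
     * Returns a delete-by-query string when a delete-by-id cannot be routed
     * because the collection requires a route value and none was supplied.
     * Returns empty when the delete can proceed as an ordinary delete-by-id.
     */
    static Optional<String> fallbackQuery(String keyField, String id,
                                          String route, boolean routeRequired) {
        boolean routeMissing = (route == null || route.isEmpty());
        if (!routeRequired || !routeMissing) {
            return Optional.empty(); // routable: keep it as a delete-by-id
        }
        // Escape backslashes and double quotes so the ID is matched literally.
        String escaped = id.replace("\\", "\\\\").replace("\"", "\\\"");
        return Optional.of(keyField + ":\"" + escaped + "\"");
    }
}
```

A query, unlike an ID without a route, does not need to target one specific 
shard, which is why the fallback can succeed where the unrouted delete-by-id 
cannot.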

> deleteById without _route_ param for implicit router could be broadcast
> -----------------------------------------------------------------------
>
>                 Key: SOLR-6910
>                 URL: https://issues.apache.org/jira/browse/SOLR-6910
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A deleteById request without _route_ param for implicit router could be 
> sent to all shard leaders (Shalin). Alternatively, a new router (or a strict 
> mode for implicit router) that requires _route_ to be set for all 
> adds/deletes (Yonik).
> Some discussion here in SOLR-5890:
> https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14236722&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236722
> https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14264525&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14264525



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
