[
https://issues.apache.org/jira/browse/SOLR-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424739#comment-17424739
]
Michael Kosten commented on SOLR-6910:
--------------------------------------
I submitted the PR. My company has been running a version of this patch for a
few years, starting with 6.5 and currently under 8.8. We have an unusual use
case where the route field value is somewhat volatile, so when indexing we
delete any existing version of the document to avoid duplicates. Under 4.x, we
issued a single delete-by-query for all the documents in the batch. When we
moved to 6.5, our farms began to fall over when indexing. Specifically, the
failures were in communication between shard leaders and replicas, with the
replicas being placed into recovery. This patch addressed the problem, because we could use
delete-by-ids instead and the replicas stopped failing.
At Eric Pugh's suggestion, I created a mini-cluster benchmark test that
compares the two strategies. It indexes 100K documents, of which only 1000 are
unique, and it issues either a deleteByIds or a deleteByQuery for all
documents in each batch. It failed many times, I believe because there is still
an issue with delete-by-query under a high volume of indexing when there is
more than a single replica. But here are the results from a successful run:
|*Benchmark*|*(batchSize)*|*(docCount)*|*(nodeCount)*|*(numReplicas)*|*(numShards)*|*(uniqueDocCount)*|*Mode*|*Score*|*Units*|
|DistributedDelete.deleteByIdsAndIndexBatch|100|100000|9|3|3|1000|thrpt|8.562|ops/s|
|DistributedDelete.deleteByQueryAndIndexBatch|100|100000|9|3|3|1000|thrpt|2.045|ops/s|
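To make the shape of that workload concrete, here is a rough sketch of how
100K documents with only 1000 unique ids could be cut into batches of 100. It
is not the benchmark's code: the class name, the "doc-" id prefix and the
modulo scheme for reusing ids are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the workload shape; NOT the benchmark's code. */
final class WorkloadSketch {

    /**
     * Splits docCount documents into batches of batchSize ids.
     * Ids cycle through uniqueDocCount values (an assumed scheme), so later
     * batches re-send ids that were already indexed and must be deleted first.
     */
    static List<List<String>> batches(int docCount, int uniqueDocCount, int batchSize) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>(batchSize);
        for (int i = 0; i < docCount; i++) {
            current.add("doc-" + (i % uniqueDocCount));
            if (current.size() == batchSize) {
                result.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // trailing partial batch, if any
        }
        return result;
    }
}
```

With the parameters from the table (docCount=100000, uniqueDocCount=1000,
batchSize=100) this yields 1000 batches, and under the assumed scheme every id
comes around again after 10 batches, so each batch after the first 10 deletes
documents that already exist.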
The PR works by detecting that the necessary route field is missing and
redirecting the DeleteById command to DeleteByQuery (kludgey). I just revised
it to move this a bit earlier in the process. I also tried to instead blast the
DeleteById command to all shard leaders. At one time, I think this may have
worked, because DeleteById with a missing route used to work if it was sent to
the correct shard or happened to hit the correct shard. This seems to have
changed and now DeleteById never works when missing a required route field.
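The redirect described above can be modelled in a few lines. This is a toy
sketch only, not the code in the PR: the class, the Plan type and the choice of
a {!term} query on the unique key field are assumptions made for illustration,
and it does not touch any Solr classes.

```java
/** Toy model of the fallback described above; NOT the code from the PR. */
final class DeleteFallbackSketch {

    /** Kind of command that ends up being distributed. */
    enum Kind { DELETE_BY_ID, DELETE_BY_QUERY }

    /** A planned delete: a routed id, or a query with no route. */
    static final class Plan {
        final Kind kind;
        final String payload; // the document id, or the query string
        final String route;   // null when no route is known

        Plan(Kind kind, String payload, String route) {
            this.kind = kind;
            this.payload = payload;
            this.route = route;
        }
    }

    /**
     * Keeps a delete-by-id when a route is supplied; otherwise rewrites it
     * into a delete-by-query on the unique key field.
     */
    static Plan plan(String uniqueKeyField, String id, String route) {
        if (route != null && !route.isEmpty()) {
            return new Plan(Kind.DELETE_BY_ID, id, route);
        }
        // Assumption: a {!term} query on the unique key field selects the
        // same document the id delete would have removed.
        String query = "{!term f=" + uniqueKeyField + "}" + id;
        return new Plan(Kind.DELETE_BY_QUERY, query, null);
    }
}
```

For example, plan("id", "doc-1", "shardA") stays a delete-by-id routed to
shardA, while plan("id", "doc-1", null) becomes a delete-by-query with the
payload {!term f=id}doc-1 and no route.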
> deleteById without _route_ param for implicit router could be broadcast
> -----------------------------------------------------------------------
>
> Key: SOLR-6910
> URL: https://issues.apache.org/jira/browse/SOLR-6910
> Project: Solr
> Issue Type: Improvement
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> A deleteById request without _route_ param for implicit router could be
> sent to all shard leaders (Shalin). Alternatively, a new router (or a strict
> mode for implicit router) that requires _route_ to be set for all
> adds/deletes (Yonik).
> Some discussion here in SOLR-5890:
> https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14236722&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14236722
> https://issues.apache.org/jira/browse/SOLR-5890?focusedCommentId=14264525&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14264525