[
https://issues.apache.org/jira/browse/SOLR-11459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ishan Chattopadhyaya reassigned SOLR-11459:
-------------------------------------------
Assignee: Ishan Chattopadhyaya
> AddUpdateCommand#prevVersion is not cleared which may lead to problem for
> in-place updates of non existed documents
> -------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-11459
> URL: https://issues.apache.org/jira/browse/SOLR-11459
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 7.0
> Reporter: Andrey Kudryavtsev
> Assignee: Ishan Chattopadhyaya
> Priority: Minor
>
> I have a 1_shard / *m*_replicas SolrCloud cluster with Solr 6.6.0 and run
> batches of 5 - 10k in-place updates from time to time.
> Once I noticed that the job "hangs" - it started and couldn't finish for a
> while.
> Logs were full of messages like:
> {code} Missing update, on which current in-place update depends on, hasn't
> arrived. id=__, looking for version=___, last found version=0 {code}
> {code}
> Tried to fetch document ___ from the leader, but the leader says document has
> been deleted. Deleting the document here and skipping this update: Last found
> version: 0, was looking for: ___
> {code}
> Further analysis shows that:
> * There are 100-500 updates for nonexistent documents among the other updates
> (something that I have to deal with)
> * The leader receives a bunch of updates and executes these updates one by one.
> {{JavabinLoader}}, which is used for processing documents, reuses the same
> instance of {{AddUpdateCommand}} for every update and just [clears its state at the
> end|https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java#L99].
> Field [AddUpdateCommand#prevVersion|
> https://github.com/apache/lucene-solr/blob/6396cb759f8c799f381b0730636fa412761030ce/solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java#L76]
> is not cleared.
> * If an update is an in-place update but the specified document does not
> exist, the update is processed as a regular atomic update (i.e. a new doc is
> created), but {{prevVersion}} is still sent as the {{distrib.inplace.prevversion}}
> parameter in the sequential calls to every slave in DistributedUpdateProcessor.
> {{prevVersion}} wasn't cleared, so it may contain the version from a
> previously processed update.
> * Each slave checks its own version of the document, which is 0 (because the
> doc does not exist), concludes that some updates were missed, and spends 5 seconds in
> [DistributedUpdateProcessor#waitForDependentUpdates|https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java#L99]
> waiting for the missed updates (no luck). It also tries to get the "correct"
> version from the leader (no luck as well)
> * So each update for a nonexistent document costs *m* * 5 sec
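
The sequence described in the list above can be illustrated with a small, self-contained sketch. It does not use Solr code: `ReusedCommand`, `StalePrevVersionDemo`, and the simplified replica check are hypothetical stand-ins that only mimic the reported behavior (one command object reused across a batch, with a `clear()` that leaves `prevVersion` untouched).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable update command; NOT Solr's AddUpdateCommand.
class ReusedCommand {
    String id;
    long version;
    long prevVersion = -1; // -1 means "no dependency on a previous version"

    // Mirrors the behavior reported in the issue: prevVersion is not reset.
    void clear() {
        id = null;
        version = 0;
    }
}

class StalePrevVersionDemo {
    // Returns the prevVersion that would be forwarded to replicas for each update.
    // existingVersions[i] > 0: the doc exists on the leader with that version.
    // existingVersions[i] == 0: the doc does not exist.
    static List<Long> forwardedPrevVersions(String[] ids, long[] existingVersions) {
        ReusedCommand cmd = new ReusedCommand(); // one instance for the whole batch
        List<Long> forwarded = new ArrayList<>();
        long nextVersion = 1000;
        for (int i = 0; i < ids.length; i++) {
            cmd.id = ids[i];
            cmd.version = nextVersion++;
            if (existingVersions[i] > 0) {
                // Genuine in-place update: it depends on the doc's current version.
                cmd.prevVersion = existingVersions[i];
            }
            // Otherwise the doc is missing and the update is handled as a regular
            // add, so prevVersion is never assigned for this update.
            forwarded.add(cmd.prevVersion);
            cmd.clear();
        }
        return forwarded;
    }

    // Simplified replica-side check (an assumption, not Solr's actual logic):
    // wait if the update claims a dependency newer than the locally stored version.
    static boolean replicaWouldWait(long forwardedPrevVersion, long localVersion) {
        return forwardedPrevVersion > localVersion;
    }

    public static void main(String[] args) {
        List<Long> forwarded = forwardedPrevVersions(
                new String[] {"doc-1", "missing-doc"}, new long[] {100L, 0L});
        System.out.println("forwarded prevVersions: " + forwarded);
        System.out.println("replica waits on missing-doc: "
                + replicaWouldWait(forwarded.get(1), 0L));
    }
}
```

In this sketch the update for `missing-doc` inherits the `prevVersion` of `doc-1`, so a replica that holds version 0 for it would decide to wait.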
> I worked around this with an explicit check for document existence, but it
> should probably be fixed.
> The obvious first guess is that prevVersion should be reset in
> {{AddUpdateCommand#clear}}, but I have no clue how to test it.
> {code}
> +++ solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java (revision )
> @@ -78,6 +78,7 @@
>       updateTerm = null;
>       isLastDocInBatch = false;
>       version = 0;
> +     prevVersion = -1;
>     }
> {code}
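
One possible shape for a check of the proposed change, sketched against a hypothetical stand-in class rather than Solr's real `AddUpdateCommand`: set the fields as an in-place update would, call `clear()`, and confirm that no previous-version state survives. The names `PatchedCommand`, `PatchedCommandCheck`, and `NO_PREV_VERSION` are assumptions made for illustration only.

```java
// Hypothetical stand-in with the proposed reset applied; NOT Solr's AddUpdateCommand.
class PatchedCommand {
    static final long NO_PREV_VERSION = -1;

    long version;
    long prevVersion = NO_PREV_VERSION;
    boolean isLastDocInBatch;

    void clear() {
        isLastDocInBatch = false;
        version = 0;
        prevVersion = NO_PREV_VERSION; // the line the patch adds
    }
}

class PatchedCommandCheck {
    // Returns true if clear() leaves no in-place update state behind.
    static boolean clearResetsPrevVersion() {
        PatchedCommand cmd = new PatchedCommand();
        cmd.version = 1001;    // as if an update had been versioned
        cmd.prevVersion = 100; // as if it had been an in-place update
        cmd.isLastDocInBatch = true;
        cmd.clear();
        return cmd.prevVersion == PatchedCommand.NO_PREV_VERSION
                && cmd.version == 0
                && !cmd.isLastDocInBatch;
    }

    public static void main(String[] args) {
        System.out.println("clear() resets prevVersion: " + clearResetsPrevVersion());
    }
}
```

A real regression test would need to exercise the actual class and, ideally, a batch that mixes in-place updates with updates for missing documents; this sketch only shows the assertion that such a test would make.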
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)