[
https://issues.apache.org/jira/browse/SOLR-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Khludnev updated SOLR-12291:
------------------------------------
Summary: Async prematurely reports completed state that causes severe shard
loss (was: OverseerCollectionMessageHandler sliceCmd assumes only one replica
exists on each node)
> Async prematurely reports completed state that causes severe shard loss
> -----------------------------------------------------------------------
>
> Key: SOLR-12291
> URL: https://issues.apache.org/jira/browse/SOLR-12291
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Backup/Restore, SolrCloud
> Reporter: Varun Thacker
> Priority: Major
> Attachments: SOLR-12291.patch, SOLR-12291.patch, SOLR-122911.patch
>
>
> The OverseerCollectionMessageHandler sliceCmd assumes that only one replica
> exists on each node.
> When multiple replicas of a slice are on the same node, we only track one
> replica's async request. This happens because the async requestMap's key is
> "node_name".
> I discovered this when [~alabax] shared some logs of a restore issue, where
> the second replica got added before the first replica had completed its
> restorecore action.
> While looking at the logs I noticed that the overseer never called
> REQUESTSTATUS for the restorecore action, almost as if it had missed
> tracking that particular async request.
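
The description above can be illustrated with a minimal, self-contained Java sketch. It is not the Solr source: the class name, node name, core names and request ids are all hypothetical, and the second method shows only one possible alternative keying, not necessarily what the attached patches do. It shows how a map keyed by node name keeps a single async request when two replicas live on the same node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; all names and values here are hypothetical.
public class RequestMapSketch {
    // Hypothetical placement: {nodeName, coreName, asyncRequestId}.
    // Both replicas of the slice sit on the same node.
    static final String[][] REPLICAS = {
        {"node1:8983_solr", "coll_shard1_replica_n1", "req-1"},
        {"node1:8983_solr", "coll_shard1_replica_n2", "req-2"},
    };

    // Keyed by node name: the second put replaces the first entry,
    // so req-1 is no longer tracked.
    static Map<String, String> trackByNode() {
        Map<String, String> requestMap = new LinkedHashMap<>();
        for (String[] replica : REPLICAS) {
            requestMap.put(replica[0], replica[2]);
        }
        return requestMap;
    }

    // One possible alternative (an assumption): key by the async
    // request id, which is unique per replica request.
    static Map<String, String> trackByRequestId() {
        Map<String, String> requestMap = new LinkedHashMap<>();
        for (String[] replica : REPLICAS) {
            requestMap.put(replica[2], replica[0]);
        }
        return requestMap;
    }

    public static void main(String[] args) {
        System.out.println("keyed by node: " + trackByNode());
        System.out.println("keyed by request id: " + trackByRequestId());
    }
}
```

With the node-name key, only the last request per node would ever be polled for status, which matches the symptom of a restorecore request that is never checked.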