[
https://issues.apache.org/jira/browse/SOLR-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller reassigned SOLR-10889:
----------------------------------
Assignee: Mark Miller
> Stale ZooKeeper information is used during failover check
> ---------------------------------------------------------
>
> Key: SOLR-10889
> URL: https://issues.apache.org/jira/browse/SOLR-10889
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: master (7.0)
> Reporter: Mihaly Toth
> Assignee: Mark Miller
> Attachments: SOLR-10889.patch
>
>
> {{OverseerAutoReplicaFailoverThread}} iterates over every replica to check
> whether it needs to be reloaded on a new node. In each such round it reads
> the cluster state only once, at the beginning. Especially in big clusters,
> the cluster state may change while the replicas are being iterated. As a
> result, wrong decisions may be made: restarting a healthy core, or not
> handling a bad node.
> The code fragment in question:
> {code}
> for (Slice slice : slices) {
>   if (slice.getState() == Slice.State.ACTIVE) {
>     final Collection<DownReplica> downReplicas = new ArrayList<DownReplica>();
>     int goodReplicas = findDownReplicasInSlice(clusterState, docCollection, slice, downReplicas);
> {code}
> The solution seems rather straightforward, reading the state every time:
> {code}
> int goodReplicas = findDownReplicasInSlice(zkStateReader.getClusterState(), docCollection, slice, downReplicas);
> {code}
> The only counterargument that comes to mind is that the cluster state would
> be read too frequently. We can enhance this naive solution so that the state
> is re-read only if a bad node is found. But I am not sure whether such a read
> optimization is necessary.
> I have written some unit tests around this class, mocking out even the time
> factor. They run in a second. I am interested in getting feedback about such
> an approach. I will upload a patch with this shortly.
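
The read optimization mentioned in the description (re-reading the state only when a bad node is found) could look roughly like the sketch below. It is not taken from the attached patch and does not use Solr's classes: `FailoverCheckSketch`, `findReplicasToFailOver`, and the modelling of cluster state as a plain set of live replica names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for the failover check; none of these names are Solr APIs.
 * "Cluster state" is modelled as the set of replica names that are currently live.
 */
final class FailoverCheckSketch {

    /** Returns the replicas that still look down after a confirming re-read. */
    static List<String> findReplicasToFailOver(
            Map<String, List<String>> replicasBySlice,
            Supplier<Set<String>> liveReplicasReader) {
        // One read at the start of the round, as in the original loop.
        Set<String> snapshot = liveReplicasReader.get();
        List<String> toFailOver = new ArrayList<>();
        for (List<String> replicas : replicasBySlice.values()) {
            Set<String> fresh = null; // re-read lazily, at most once per slice
            for (String replica : replicas) {
                if (snapshot.contains(replica)) {
                    continue; // looked healthy in the snapshot: no extra read
                }
                if (fresh == null) {
                    fresh = liveReplicasReader.get();
                }
                // Act only if the replica is still down in the fresh state.
                if (!fresh.contains(replica)) {
                    toFailOver.add(replica);
                }
            }
        }
        return toFailOver;
    }
}
```

Under these assumptions the extra reads happen only for slices where the initial snapshot shows a down replica. The trade-off is that this variant guards only against restarting a core that has recovered in the meantime; a node that fails after the snapshot was taken is still missed until the next round, which the unconditional per-slice read would catch.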