Alexey Serbin created KUDU-2800:
-----------------------------------
Summary: Avoid 'unintended' re-replication of long-bootstrapping
tablet replicas
Key: KUDU-2800
URL: https://issues.apache.org/jira/browse/KUDU-2800
Project: Kudu
Issue Type: Improvement
Components: consensus, tserver
Affects Versions: 1.7.1, 1.8.0, 1.7.0, 1.9.0, 1.9.1, 1.10.0
Reporter: Alexey Serbin
As implemented in
https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576 ,
the logic for tracking the 'health' of tablet replicas cannot differentiate
between bootstrapping and failed replicas.
As a result, if a tablet replica takes longer to bootstrap than the interval
specified by the {{--follower_unavailable_considered_failed_sec}} run-time
flag, the system can start re-replicating the tablet replica elsewhere, even
though the original replica has not actually failed.
One option might be for a bootstrapping replica to send a special
{{PeerStatus}} in its response to a Raft message from the leader replica, and
to update the logic referenced above accordingly. The response might also
include additional information on the current progress of the bootstrap
process. We probably also need to add a separate timeout to track a stalled
bootstrapping replica, so that its health is reported as FAILED once the leader
observes the replica stuck in bootstrapping with no forward progress for longer
than the timeout specified by the new parameter.
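A minimal sketch of the leader-side decision described above, written as
standalone C++ rather than as a patch against consensus_queue.cc. All names
here ({{ReportedStatus}}, {{PeerState}}, {{HealthPolicy}}, {{ClassifyPeer}} and
the {{bootstrap_stalled_considered_failed}} timeout) are hypothetical and only
illustrate the idea; the health values are a simplified stand-in for whatever
the real health report uses. Reporting a still-progressing bootstrap as
"unknown" rather than "healthy" is also an assumption.

```cpp
#include <chrono>

using Seconds = std::chrono::seconds;

// Hypothetical: what the leader last learned about a follower.
enum class ReportedStatus { kOk, kBootstrapping, kNoResponse };

// Simplified health values for illustration only.
enum class Health { kHealthy, kUnknown, kFailed };

// Hypothetical per-peer state tracked by the leader.
struct PeerState {
  ReportedStatus last_status;
  // Time since the last successful Raft exchange with the peer.
  Seconds since_last_successful_exchange;
  // Time since the peer's reported bootstrap progress last advanced.
  Seconds since_bootstrap_progress;
};

struct HealthPolicy {
  // Mirrors --follower_unavailable_considered_failed_sec.
  Seconds follower_unavailable_considered_failed;
  // Hypothetical new timeout for a bootstrap that makes no progress.
  Seconds bootstrap_stalled_considered_failed;
};

Health ClassifyPeer(const PeerState& peer, const HealthPolicy& policy) {
  if (peer.last_status == ReportedStatus::kBootstrapping) {
    // A bootstrapping replica is judged only by its forward progress,
    // not by how long it has been unavailable for Raft operations.
    if (peer.since_bootstrap_progress >
        policy.bootstrap_stalled_considered_failed) {
      return Health::kFailed;
    }
    return Health::kUnknown;  // Not failed: do not trigger re-replication.
  }
  // Any other peer falls under the existing unavailability timeout.
  if (peer.since_last_successful_exchange >
      policy.follower_unavailable_considered_failed) {
    return Health::kFailed;
  }
  return Health::kHealthy;
}
```

With this shape, a replica that has been bootstrapping for longer than
{{--follower_unavailable_considered_failed_sec}} but is still advancing is not
marked FAILED, whereas a replica that reports no bootstrap progress beyond the
new timeout is.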
However, the approach above requires the Raft consensus object for a
bootstrapping replica to be at least partially functional, so it entails
reading at least some information about a replica from the on-disk consensus
metadata prior to proper bootstrapping of a tablet replica by a tablet server.