Alexey Serbin created KUDU-2800:
-----------------------------------

             Summary: Avoid 'unintended' re-replication of long-bootstrapping 
tablet replicas
                 Key: KUDU-2800
                 URL: https://issues.apache.org/jira/browse/KUDU-2800
             Project: Kudu
          Issue Type: Improvement
          Components: consensus, tserver
    Affects Versions: 1.7.1, 1.8.0, 1.7.0, 1.9.0, 1.9.1, 1.10.0
            Reporter: Alexey Serbin


The logic for tracking the 'health' of tablet replicas cannot differentiate 
between bootstrapping and failed replicas. The current implementation is at
https://github.com/apache/kudu/blob/10ea0ce5a636a050a1207f7ab5ecf63d178683f5/src/kudu/consensus/consensus_queue.cc#L576

As a result, if a tablet replica takes longer to bootstrap than the interval 
specified by the {{--follower_unavailable_considered_failed_sec}} run-time 
flag, the system can start re-replicating the tablet replica elsewhere, even 
though the original replica has not actually failed.
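
To make the problem concrete, here is a rough Python sketch of a purely time-based health check. It is not the code from consensus_queue.cc: the {{Peer}} type, the {{peer_health}} function and the 300-second threshold are hypothetical stand-ins used only to illustrate why a slow bootstrap and a real failure look identical to the leader.

```python
from dataclasses import dataclass

# Illustrative value standing in for --follower_unavailable_considered_failed_sec.
UNAVAILABLE_CONSIDERED_FAILED_SEC = 300.0


@dataclass
class Peer:
    # Time (in seconds) of the last successful exchange with the leader.
    last_successful_exchange: float
    # Known only on the follower's side; the leader never sees this field,
    # which is exactly the gap this issue describes.
    is_bootstrapping: bool


def peer_health(peer: Peer, now: float,
                threshold: float = UNAVAILABLE_CONSIDERED_FAILED_SEC) -> str:
    """Hypothetical leader-side check based only on elapsed time."""
    if now - peer.last_successful_exchange > threshold:
        return "FAILED"
    return "HEALTHY"


slow_bootstrap = Peer(last_successful_exchange=0.0, is_bootstrapping=True)
crashed = Peer(last_successful_exchange=0.0, is_bootstrapping=False)
# Both peers get the same verdict once the threshold has passed.
verdicts = (peer_health(slow_bootstrap, now=301.0),
            peer_health(crashed, now=301.0))
```

In this sketch both peers end up FAILED, so the slow-bootstrapping replica becomes a candidate for re-replication.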

One option might be sending a special {{PeerStatus}} for a bootstrapping 
replica with a response to a Raft message sent by a leader replica and updating 
the logic referenced above.  The response might also include additional 
information on the current progress of the bootstrap process.  We probably 
also need to add a separate timeout to detect a stalled bootstrapping replica, 
so that its health is reported as FAILED once the leader observes the replica 
stuck in bootstrapping with no forward progress for longer than the timeout 
specified by the new parameter.
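
The proposal above could be sketched roughly as follows. This is only an illustration under assumptions: the "BOOTSTRAPPING" status string, the integer progress counter, the {{FollowerState}} type and the {{bootstrap_stale_sec}} parameter are all hypothetical names, not existing Kudu identifiers or flags.

```python
from dataclasses import dataclass


@dataclass
class FollowerState:
    """Hypothetical leader-side bookkeeping for one follower."""
    last_contact: float = 0.0          # time of the last response of any kind
    bootstrapping: bool = False        # last response carried the special status
    progress: int = -1                 # last reported bootstrap progress
    last_progress_change: float = 0.0  # when the progress value last advanced


def on_response(state: FollowerState, now: float, status: str,
                progress: int = 0) -> None:
    """Record a follower's response to a Raft message from the leader."""
    state.last_contact = now
    if status == "BOOTSTRAPPING":
        if not state.bootstrapping or progress > state.progress:
            # First bootstrapping report, or forward progress was made.
            state.progress = progress
            state.last_progress_change = now
        state.bootstrapping = True
    else:
        state.bootstrapping = False


def health(state: FollowerState, now: float,
           unavailable_sec: float, bootstrap_stale_sec: float) -> str:
    # No response at all for too long: failed, bootstrapping or not.
    if now - state.last_contact > unavailable_sec:
        return "FAILED"
    if state.bootstrapping:
        # Responding, but stuck with no forward progress for too long.
        if now - state.last_progress_change > bootstrap_stale_sec:
            return "FAILED"
        return "BOOTSTRAPPING"
    return "HEALTHY"
```

With this shape, a replica that keeps reporting increasing progress is never marked FAILED, while one that keeps responding with the same progress value is marked FAILED only after {{bootstrap_stale_sec}} has elapsed since its last advance.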

However, the approach above requires the Raft consensus object for a 
bootstrapping replica to be at least partially functional, so it entails 
reading at least some information about a replica from the on-disk consensus 
metadata prior to proper bootstrapping of a tablet replica by a tablet server.




