Alexey Serbin created KUDU-2367:
-----------------------------------

             Summary: Leader replica sometimes reports follower's health status 
as FAILED instead of FAILED_UNRECOVERABLE
                 Key: KUDU-2367
                 URL: https://issues.apache.org/jira/browse/KUDU-2367
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.7.0, 1.8.0
            Reporter: Alexey Serbin
            Assignee: Alexey Serbin


If a leader tablet replica detects that a follower has fallen behind the WAL 
segment GC threshold after the unavailability interval (defined by the 
{{--follower_unavailable_considered_failed_sec}} flag) has elapsed, it never 
reports the status of the follower as FAILED_UNRECOVERABLE to the catalog 
manager and continues reporting FAILED instead.  In configurations where the 
tablet replication factor equals the total number of tablet servers in the 
cluster, this leads to situations in which the tablet cannot be automatically 
recovered for a long time.  In particular, such situations last until a new 
leader is elected or the corresponding tablet servers are restarted.
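
To make the expected and the observed behavior concrete, here is a minimal 
sketch in Python.  It is only an illustrative model, not Kudu source code: the 
function names, the parameters, and the {{>=}} comparison against the interval 
are assumptions made for this example; only the status names and the flag come 
from the description above.

```python
from enum import Enum


class Health(Enum):
    HEALTHY = "HEALTHY"
    FAILED = "FAILED"
    FAILED_UNRECOVERABLE = "FAILED_UNRECOVERABLE"


def expected_health(secs_since_last_contact, wal_catchup_possible, unavailable_sec):
    """Status the leader would be expected to report for a follower.

    Hypothetical model: 'unavailable_sec' stands in for the value of
    --follower_unavailable_considered_failed_sec.
    """
    # A follower that fell behind WAL segment GC cannot catch up from the
    # leader's log, so this check takes precedence over the time-based one.
    if not wal_catchup_possible:
        return Health.FAILED_UNRECOVERABLE
    if secs_since_last_contact >= unavailable_sec:
        return Health.FAILED
    return Health.HEALTHY


def observed_health(previous, secs_since_last_contact, wal_catchup_possible,
                    unavailable_sec):
    """Models the symptom in this report: FAILED is never upgraded."""
    # Assumption for illustration: once FAILED has been reported, the
    # later "fell behind WAL GC" detection does not change the status.
    if previous is Health.FAILED:
        return Health.FAILED
    return expected_health(secs_since_last_contact, wal_catchup_possible,
                           unavailable_sec)


if __name__ == "__main__":
    # 300 is just an example value for the unavailability interval.
    print(expected_health(400, False, 300))
    print(observed_health(Health.FAILED, 400, False, 300))
```

In this model the two functions disagree exactly in the case described above: 
the follower is already marked FAILED and only later falls behind the WAL 
segment GC threshold.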



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
