Alexey Serbin created KUDU-2367:
-----------------------------------
Summary: Leader replica sometimes reports follower's health status
as FAILED instead of FAILED_UNRECOVERABLE
Key: KUDU-2367
URL: https://issues.apache.org/jira/browse/KUDU-2367
Project: Kudu
Issue Type: Bug
Components: tserver
Affects Versions: 1.7.0, 1.8.0
Reporter: Alexey Serbin
Assignee: Alexey Serbin
If a leader tablet replica detects that a follower has fallen behind the WAL
segment GC threshold after the unavailability interval (defined by the
{{--follower_unavailable_considered_failed_sec}} flag) has already elapsed, it
never reports the follower's status as FAILED_UNRECOVERABLE to the catalog
manager and keeps reporting FAILED instead. In configurations where the tablet
replication factor equals the total number of tablet servers in the cluster,
the tablet then cannot be recovered automatically for a long time: the
condition persists until a new leader is elected or the corresponding tablet
servers are restarted.
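
The expected reporting behavior can be sketched as a small model. This is an
illustration only, not Kudu's actual code: the function name, its parameters,
and the HEALTHY value are assumptions made for this example. Only the FAILED
and FAILED_UNRECOVERABLE statuses and the flag name come from the description
above, and the 300-second value is an arbitrary sample.

```python
from enum import Enum


class HealthStatus(Enum):
    # HEALTHY is assumed here for completeness; the report above only
    # names FAILED and FAILED_UNRECOVERABLE.
    HEALTHY = "HEALTHY"
    FAILED = "FAILED"
    FAILED_UNRECOVERABLE = "FAILED_UNRECOVERABLE"


def expected_follower_health(secs_since_last_contact: float,
                             unavailable_considered_failed_sec: float,
                             behind_wal_gc_threshold: bool) -> HealthStatus:
    """Hypothetical model of the status a leader should report for a follower.

    unavailable_considered_failed_sec stands in for the value of the
    --follower_unavailable_considered_failed_sec flag.
    """
    # Check the WAL GC condition first, so the result does not depend on
    # whether the follower was already considered FAILED beforehand.
    if behind_wal_gc_threshold:
        return HealthStatus.FAILED_UNRECOVERABLE
    # Otherwise the follower is FAILED once it has been unavailable for
    # longer than the configured interval.
    if secs_since_last_contact > unavailable_considered_failed_sec:
        return HealthStatus.FAILED
    return HealthStatus.HEALTHY


# The case from this report: the follower exceeded the interval first and
# only later fell behind the WAL segment GC threshold.
print(expected_follower_health(600, 300, True).value)
```

In this model the last call yields FAILED_UNRECOVERABLE, whereas the behavior
described in this issue corresponds to the leader continuing to report FAILED
for the same inputs.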