[ https://issues.apache.org/jira/browse/KUDU-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-2367:
--------------------------------
    Resolution: Fixed
 Fix Version/s: 1.8.0
        Status: Resolved  (was: In Review)

Fixed with fcb0be6381a47155e171eb50a333af502bbf506f

> Leader replica sometimes reports follower's health status as FAILED instead of FAILED_UNRECOVERABLE
> ---------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2367
>                 URL: https://issues.apache.org/jira/browse/KUDU-2367
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.7.0, 1.8.0
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>             Fix For: 1.8.0
>
>
> If a leader tablet replica detects that its follower has fallen behind the
> WAL segment GC threshold after the unavailability interval (defined by the
> {{--follower_unavailable_considered_failed_sec}} flag) has elapsed, it never
> reports the follower's status as FAILED_UNRECOVERABLE to the catalog manager
> and keeps reporting FAILED instead. In configurations where the tablet
> replication factor equals the total number of tablet servers in the cluster,
> this leaves the tablet unable to recover automatically for a long time.
> Specifically, the situation persists until a new leader is elected or the
> corresponding tablet servers are restarted.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
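The status precedence implied by the description can be sketched as follows. This is a hypothetical model. It is not Kudu's code (the tablet server is written in C++) and it is not the change made in commit fcb0be6381a47155e171eb50a333af502bbf506f. `FollowerState`, `follower_health`, and `follower_health_buggy` are invented names. `wal_catchup_possible` stands in for "the WAL segments the follower still needs have not been GC'd," and `unavailable_failed_sec` plays the role of the `--follower_unavailable_considered_failed_sec` flag. The buggy variant assumes, for illustration only, that the symptom comes from the timeout check shadowing the GC check. The issue does not state the actual root cause.

```python
from dataclasses import dataclass
from enum import Enum


class HealthStatus(Enum):
    # Status names taken from the issue; this enum itself is illustrative.
    HEALTHY = "HEALTHY"
    FAILED = "FAILED"
    FAILED_UNRECOVERABLE = "FAILED_UNRECOVERABLE"


@dataclass
class FollowerState:
    # Hypothetical per-follower view held by the leader.
    seconds_since_last_contact: float
    # False once the WAL segments this follower needs were garbage-collected.
    wal_catchup_possible: bool


def follower_health(state: FollowerState, unavailable_failed_sec: float) -> HealthStatus:
    """Expected behavior: the unrecoverable condition takes precedence."""
    # A follower that can no longer catch up from the leader's WAL is
    # unrecoverable regardless of how long it has been unreachable.
    if not state.wal_catchup_possible:
        return HealthStatus.FAILED_UNRECOVERABLE
    if state.seconds_since_last_contact > unavailable_failed_sec:
        return HealthStatus.FAILED
    return HealthStatus.HEALTHY


def follower_health_buggy(state: FollowerState, unavailable_failed_sec: float) -> HealthStatus:
    """Hypothetical buggy ordering that reproduces the reported symptom."""
    # Once the unavailability timeout has passed, this branch always wins,
    # so FAILED_UNRECOVERABLE is never reached for that follower.
    if state.seconds_since_last_contact > unavailable_failed_sec:
        return HealthStatus.FAILED
    if not state.wal_catchup_possible:
        return HealthStatus.FAILED_UNRECOVERABLE
    return HealthStatus.HEALTHY
```

Consider a follower that has been unreachable for 600 seconds with a 300-second threshold and whose needed WAL segments are gone. In that case `follower_health` returns `FAILED_UNRECOVERABLE`, while `follower_health_buggy` keeps returning `FAILED`. The same stuck `FAILED` report is what the issue describes.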