[ 
https://issues.apache.org/jira/browse/KUDU-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-1516:
-----------------------------------

    Assignee: Will Berkeley

> ksck should check for more raft-related status issues
> -----------------------------------------------------
>
>                 Key: KUDU-1516
>                 URL: https://issues.apache.org/jira/browse/KUDU-1516
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus, ksck, supportability
>    Affects Versions: 0.9.1
>            Reporter: Todd Lipcon
>            Assignee: Will Berkeley
>            Priority: Critical
>
> We currently have a test cluster where one or more tablets have gotten 
> under-replicated (1 replica remaining out of 3) and weren't able to 
> re-replicate in time. 'ksck' still reports that the table is healthy, though, 
> and only reports two down tablet servers. There is a lot of room for 
> improvement:
> - for each tablet, check that at least a majority of its replicas are on live 
> tablet servers, and that those tablet servers consider the replica to be in 
> the RUNNING state
> - some basic tablet "health checks", like asking followers whether they have 
> recently heard from the leader?
> - perhaps a canary request pushed to each tablet? (e.g. an empty write or no-op)
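
For illustration only, the first check in the list above (a majority of each tablet's replicas on live tablet servers and in RUNNING state) could be sketched roughly as follows. This is not ksck's actual code: the Replica type, the function names, and the shape of the inputs are all hypothetical, and the sketch assumes the caller has already collected the set of live tablet servers and the state each one reports for its replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # Hypothetical fields, not taken from Kudu's code base.
    tserver_id: str  # ID of the tablet server hosting this replica
    state: str       # state that server reports for the replica, e.g. "RUNNING"


def has_healthy_majority(replicas, live_tservers):
    """Return True if a strict majority of replicas are RUNNING on live servers."""
    healthy = sum(
        1 for r in replicas
        if r.tserver_id in live_tservers and r.state == "RUNNING"
    )
    # A strict majority of N replicas is floor(N / 2) + 1.
    return healthy >= len(replicas) // 2 + 1


def unhealthy_tablets(tablets, live_tservers):
    """Return the sorted IDs of tablets that lack a healthy majority.

    `tablets` maps a tablet ID to the list of its Replica objects.
    """
    return sorted(
        tablet_id
        for tablet_id, replicas in tablets.items()
        if not has_healthy_majority(replicas, live_tservers)
    )
```

With the scenario described in this issue (3 replicas, two tablet servers down), such a check would flag the tablet, since only 1 of 3 replicas is on a live server and a majority requires 2.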



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
