[ 
https://issues.apache.org/jira/browse/KUDU-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937037#comment-15937037
 ] 

Mike Percy commented on KUDU-1860:
----------------------------------

Related to this, but even worse: we saw a case in the field where a tablet had 
no leader and ksck did not report it (it reported the tablet as healthy) 
because the leader is cached in the master's data.

> ksck doesn't identify tablets that are evicted but still in config
> ------------------------------------------------------------------
>
>                 Key: KUDU-1860
>                 URL: https://issues.apache.org/jira/browse/KUDU-1860
>             Project: Kudu
>          Issue Type: Bug
>          Components: ksck, ops-tooling
>    Affects Versions: 1.2.0
>            Reporter: Jean-Daniel Cryans
>
> As reported by a user on Slack, ksck can give incorrect output such as:
> {noformat}
>   ca199fafca544df2a1b2a01be9d5266d (server1:7250): RUNNING [LEADER]
>   a077957f627c4758ab5a989aca8a1ca8 (server2:7250): RUNNING
>   5c09a555c205482b8131f15b2c249ec6 (server3:7250): bad state
>     State:       NOT_STARTED
>     Data state:  TABLET_DATA_TOMBSTONED
>     Last status: Tablet initializing...
> {noformat}
> The problem is that server2 had already been evicted from the configuration 
> (based on reading the logs), but the eviction wasn't committed in the config 
> (which contains servers 1 and 3) since there's really only 1 server left out 
> of 3. Ideally ksck should ask each server what it thinks the configuration is 
> and check whether that differs from what's in the master. As it is, it looks 
> like we're missing 1 replica, but in reality this is a broken tablet.
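
To make the suggested check concrete, here is a rough sketch in Python of
comparing the master's view of a tablet's config against what each replica
reports. It is not ksck's actual code: ReplicaView, check_tablet and the sample
data are all hypothetical names and values made up for illustration, and it
assumes each server's view has already been fetched somehow. It also flags the
case where no replica claims to be leader, instead of trusting the master's
cached leader.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ReplicaView:
    """What one tablet server reports for a tablet (hypothetical shape)."""
    server: str
    peers: Optional[FrozenSet[str]]  # None if the server reported no config
    is_leader: bool = False


def check_tablet(master_peers: FrozenSet[str],
                 views: List[ReplicaView]) -> List[str]:
    """Return a list of problems; an empty list means the views agree."""
    problems = []
    for view in views:
        if view.peers is None:
            problems.append(f"{view.server}: no config reported")
        elif view.peers != master_peers:
            problems.append(
                f"{view.server}: config {sorted(view.peers)} differs from "
                f"master's {sorted(master_peers)}")
    # Rely on what the replicas say, not on the leader cached in the master.
    if not any(view.is_leader for view in views):
        problems.append("no replica reports itself as leader")
    return problems


# Made-up data loosely modeled on the report above: the master still lists
# three peers, server1 holds a config without server2, server3 reports nothing.
master_peers = frozenset({"server1", "server2", "server3"})
views = [
    ReplicaView("server1", frozenset({"server1", "server3"}), is_leader=True),
    ReplicaView("server2", frozenset({"server1", "server2", "server3"})),
    ReplicaView("server3", None),
]
problems = check_tablet(master_peers, views)
for problem in problems:
    print(problem)
```

With a check along these lines, a tablet like the one above would be flagged as
inconsistent rather than reported as merely missing one replica.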



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
