Jean-Daniel Cryans created KUDU-1860:
----------------------------------------
Summary: ksck doesn't identify tablets that are evicted but still in config
Key: KUDU-1860
URL: https://issues.apache.org/jira/browse/KUDU-1860
Project: Kudu
Issue Type: Bug
Components: util
Affects Versions: 1.2.0
Reporter: Jean-Daniel Cryans
Priority: Critical
As reported by a user on Slack, ksck can produce misleading output such as:
{noformat}
ca199fafca544df2a1b2a01be9d5266d (server1:7250): RUNNING [LEADER]
a077957f627c4758ab5a989aca8a1ca8 (server2:7250): RUNNING
5c09a555c205482b8131f15b2c249ec6 (server3:7250): bad state
State: NOT_STARTED
Data state: TABLET_DATA_TOMBSTONED
Last status: Tablet initializing...
{noformat}
The problem is that server2 had already been evicted from the configuration (based
on the logs), but that config change (to a config containing server1 and server3)
was never committed, because in reality only 1 of the 3 servers is left.
Ideally, ksck should ask each server what it thinks the configuration is and check
whether that differs from what the master reports. As it stands, the output suggests
the tablet is merely missing 1 replica, when in reality the tablet is broken.
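A minimal sketch of the proposed check, for illustration only. It assumes a hypothetical `ReplicaView` record holding each server's committed config and any pending (uncommitted) config change, plus a set of servers known to be healthy. None of these names come from Kudu's ksck, which is written in C++. It also assumes that a pending config can only commit if a majority of its voters are healthy. Applied to the scenario above, it flags server1's pending {server1, server3} config as unable to commit, which is the "broken tablet" signal the current output hides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ReplicaView:
    """Hypothetical: one tablet server's view of a tablet's Raft config."""
    committed: Set[str]                  # committed voter set as seen by this server
    pending: Optional[Set[str]] = None   # uncommitted config change, if any


def find_config_discrepancies(master_config: Set[str],
                              views: Dict[str, ReplicaView],
                              healthy: Set[str]) -> List[str]:
    """Compare each server's reported config with the master's (hypothetical check)."""
    problems: List[str] = []
    for server, view in sorted(views.items()):
        if view.committed != master_config:
            problems.append(f"{server}: committed config {sorted(view.committed)} "
                            f"differs from master {sorted(master_config)}")
        if view.pending is not None:
            # Assumption: a majority of the pending config's voters must be healthy to commit it.
            majority = len(view.pending) // 2 + 1
            live = len(view.pending & healthy)
            status = "can commit" if live >= majority else "cannot commit"
            problems.append(f"{server}: pending config {sorted(view.pending)} "
                            f"{status} ({live} healthy of {majority} needed)")
    return problems


# Scenario from this report: server1 leads with a pending config that evicts server2,
# server3 is tombstoned, so only server1 and server2 are treated as healthy.
master = {"server1", "server2", "server3"}
views = {
    "server1": ReplicaView(committed=set(master), pending={"server1", "server3"}),
    "server2": ReplicaView(committed=set(master)),
}
for line in find_config_discrepancies(master, views, healthy={"server1", "server2"}):
    print(line)
```

Comparing per-server views against the master, rather than only checking replica state, is what lets the check distinguish a missing replica from a tablet whose config change can never complete.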
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)