Todd Lipcon has submitted this change and it was merged.

Change subject: Don't crash TS if consensus metadata is corrupted
......................................................................


Don't crash TS if consensus metadata is corrupted

If the consensus metadata somehow gets corrupted with a too-early term, the TS
should not crash with a CHECK failure. Instead, it should just mark that tablet
as FAILED.
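
A minimal sketch of the idea above, not the actual patch: rather than aborting
the whole process on a bad term, the validation returns an error status that
the caller uses to mark only that tablet as FAILED. Every name here (Status,
TabletState, ValidateTerm, StartTablet, min_expected_term) is a hypothetical
stand-in for illustration, and the exact value the term is compared against is
an assumption.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not Kudu's real classes.
enum class TabletState { RUNNING, FAILED };

struct Status {
  bool ok;
  std::string message;

  static Status OK() { return {true, ""}; }
  static Status Corruption(std::string msg) { return {false, std::move(msg)}; }
};

// Returns an error instead of crashing when the persisted term is older than
// the lowest term the caller considers valid (an assumed comparison).
Status ValidateTerm(int64_t persisted_term, int64_t min_expected_term) {
  if (persisted_term < min_expected_term) {
    return Status::Corruption(
        "consensus metadata term " + std::to_string(persisted_term) +
        " is older than expected term " + std::to_string(min_expected_term));
  }
  return Status::OK();
}

// A failed validation affects only this tablet; the server keeps running and
// can continue to serve its other tablets.
TabletState StartTablet(int64_t persisted_term, int64_t min_expected_term) {
  Status s = ValidateTerm(persisted_term, min_expected_term);
  return s.ok ? TabletState::RUNNING : TabletState::FAILED;
}
```

In this sketch, a tablet whose persisted term is behind the expected term ends
up in the FAILED state while every other tablet on the server starts normally.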

Currently, the leader does not auto-evict a FAILED replica. But, the
administrator can use the CLI tools to delete the bad replica, which should
cause it to get automatically repaired.

This fix is based on an issue encountered in Bruce Song Zhang's cluster. His
cluster had been affected by KUDU-1436, which caused tablets on many servers to
have incorrect consensus metadata. Because of the CHECK that was in place, he
was unable to restart and recover those servers, causing an outage. With this
patch in place, only the tablets with corrupted metadata would have been marked
FAILED, and assuming a majority of replicas were still available, the table
availability would not have been compromised.

Change-Id: If9f85c1ce31a32e89e57c74e9750e66073b9752c
Reviewed-on: http://gerrit.cloudera.org:8080/3006
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
M src/kudu/integration-tests/raft_consensus-itest.cc
4 files changed, 76 insertions(+), 10 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3006
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If9f85c1ce31a32e89e57c74e9750e66073b9752c
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: song bruce zhang <[email protected]>
