Todd Lipcon has submitted this change and it was merged.

Change subject: Don't crash TS if consensus metadata is corrupted
......................................................................

Don't crash TS if consensus metadata is corrupted

If the consensus metadata somehow gets corrupted with a too-early term,
the TS should not crash with a CHECK failure. Instead, it should just
mark that tablet as FAILED.

Currently, the leader does not auto-evict a FAILED replica. However, the
administrator can use the CLI tools to delete the bad replica, which
should cause it to be repaired automatically.

This fix is based on an issue encountered in Bruce Song Zhang's cluster.
His cluster had been affected by KUDU-1436, which caused tablets on many
servers to have incorrect consensus metadata. Because of the CHECK that
was in place, he was unable to restart and recover those servers,
causing an outage. With this patch in place, only the tablets with
corrupted metadata would have been affected, and, assuming a majority of
replicas were still available, table availability would not have been
compromised.

Change-Id: If9f85c1ce31a32e89e57c74e9750e66073b9752c
Reviewed-on: http://gerrit.cloudera.org:8080/3006
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc
M src/kudu/integration-tests/external_mini_cluster_fs_inspector.h
M src/kudu/integration-tests/raft_consensus-itest.cc
4 files changed, 76 insertions(+), 10 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/3006
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: If9f85c1ce31a32e89e57c74e9750e66073b9752c
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: song bruce zhang <[email protected]>
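
For readers unfamiliar with the pattern described in the commit message, the following is a minimal, self-contained sketch of replacing a process-wide CHECK with a per-tablet error. It is not the code from this patch: the `Status` struct, `TabletState` enum, `ValidateTerm`, and `StartTablet` are hypothetical stand-ins, and the rule that a "too-early" term means one earlier than the term of the last logged operation is an assumption made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; NOT Kudu's actual Status class.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Corruption(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical per-tablet state; only the two states needed for the sketch.
enum class TabletState { RUNNING, FAILED };

// Assumed rule for illustration: the persisted term is "too early" if it is
// lower than the term of the last operation found in the log.
Status ValidateTerm(int64_t metadata_term, int64_t last_logged_term) {
  if (metadata_term < last_logged_term) {
    // Previously this condition would have been a CHECK, aborting the
    // whole server process. Returning an error keeps the failure local.
    return Status::Corruption(
        "consensus metadata term " + std::to_string(metadata_term) +
        " is earlier than last logged term " +
        std::to_string(last_logged_term));
  }
  return Status::OK();
}

// Only the tablet with bad metadata is marked FAILED; other tablets hosted
// by the same server are started independently and are unaffected.
TabletState StartTablet(int64_t metadata_term, int64_t last_logged_term) {
  Status s = ValidateTerm(metadata_term, last_logged_term);
  return s.ok ? TabletState::RUNNING : TabletState::FAILED;
}
```

Under these assumptions, `StartTablet(2, 3)` yields `TabletState::FAILED` while the server process keeps running, which is the behavior the commit message describes: the failure is contained to the single replica so that an administrator can delete it and let it be re-replicated.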
