[ https://issues.apache.org/jira/browse/KUDU-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935313#comment-16935313 ]
HeLifu edited comment on KUDU-2943 at 9/23/19 8:51 AM:
-------------------------------------------------------

If we step down a leader tablet, the leader's term will be increased by 1 but not persisted.
https://github.com/apache/kudu/blob/ee22ddcc734ab4947218c670d5cfddd61fe90fbb/src/kudu/consensus/raft_consensus.cc#L570

Then, after a successful election, one of the followers will be the new leader and the term will be increased by 1 too. The term is durable for the new leader, but not for the old one. This is the root cause.
https://github.com/apache/kudu/blob/ee22ddcc734ab4947218c670d5cfddd61fe90fbb/src/kudu/consensus/raft_consensus.cc#L1138

So, the StepDown API is not safe.

// code placeholder
tablet: ac74b319ad54416685f8b9d9506e1d61
f42c56 c2c8be eea10e
| | | | start election | | WON | | leader(1,0) | (1,0) | (1,0) | NO_OP(1,1) | (1,1) | (1,1) | Write some Rows(1,2) | (1,2) | (1,2) | **StepDown(1/2,2)[term 2 is not durable] | start election(1/2,2) | | | | start election(1/2,2) WON | | FAIL leader(2,2)[term 2 is durable] | | (2,2)[term 2 is durable] NO_OP(2,3) | | (2,2)[not receive NO_OP] **StepDown(2/3,3)[term 3 is not durable]"Line 570" | | start election(2/3,2) | WON | leader(3,2)[term 3 is durable] | | | NO_OP(3,3) (3,3)[term 3 is not durable]"Line 1138" | | alter schema(3,4) (3,4)[term 3 is not durable] | | | | [restart masters] | [restart tservers] | **Reboot tablet failed since term is 2 in consensus metadata, opid is (3,4) in WAL


was (Author: helifu):
I think the term 3 for f42c56 is not durable. That means the StepDown API is not safe.
{code:java}
// code placeholder
tablet: ac74b319ad54416685f8b9d9506e1d61
f42c56 c2c8be eea10e
| | | | start election | | | | (1,0) leader(1,0) (1,0) | | | (1,1) NO_OP(1,1) (1,1) | | | (1,2) Write some Rows(1,2) (1,2) | | | | StepDown(2, 2) | start election(2,2) | | | | start election(2,2) WIN | | FAIL leader(2,2)[term is durable] | | | NO_OP(2,3)[no sync] (2,2)[not receive NO_OP] | | ****StepDown(3,3)[term is not durable]"Line 1489" | | start election(3,2) | | | WIN | | | leader(3,2) | | | NO_OP(3,3) | | | alter schema(3,4) (3,4)[term is not durable, op in WAL] | | | restart masters restart tservers
{code}

> TsTabletManagerITest.TestTableStats flaky due to WAL/cmeta term disagreement
> ----------------------------------------------------------------------------
>
>                 Key: KUDU-2943
>                 URL: https://issues.apache.org/jira/browse/KUDU-2943
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, test
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Priority: Critical
>         Attachments: ts_tablet_manager-itest.txt
>
>
> This new test failed in a strange (and worrying) way:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc:753: Failure
> Failed
> Bad status: Corruption: Unable to start RaftConsensus: The last op in the WAL with id 3.4 has a term (3) that is greater than the latest recorded term, which is 2
> {noformat}
> From a brief dig through the code, looks like this means the current term as per the on-disk cmeta file is older than the term in the latest WAL op.
> I can believe that this is somehow due to InternalMiniCluster exercising clean shutdown paths that aren't well tested or robust, but it'd be nice to determine that with certainty.
> I've attached the full test log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
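For readers following the analysis in the comment, here is a minimal C++ sketch of the failure mode it describes for replica f42c56: a term bump on step-down that is never flushed, followed by ops at that term reaching the WAL, and then a restart. This is a toy model and not Kudu code. The names Replica, AdvanceTerm, AppendToWal, Restart and ReplayScenario are hypothetical, and the error text only paraphrases the Corruption message quoted in the issue. It also assumes a WAL append is durable as soon as it happens.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a (term, index) op id; not a Kudu type.
struct OpId {
  int64_t term;
  int64_t index;
};

// Toy replica: one in-memory term, one "on-disk" term (the cmeta role),
// and a WAL whose entries are assumed durable once appended.
struct Replica {
  int64_t current_term = 0;  // in-memory only
  int64_t durable_term = 0;  // what a restart would read back
  std::vector<OpId> wal;

  // flush=false models the step-down path described in the comment:
  // the term moves forward in memory but is not persisted.
  void AdvanceTerm(int64_t new_term, bool flush) {
    current_term = new_term;
    if (flush) durable_term = new_term;
  }

  // Append an op, first dropping any conflicting suffix at the same or a
  // later index (as when a new leader replaces an unreplicated op).
  void AppendToWal(OpId op) {
    while (!wal.empty() && wal.back().index >= op.index) wal.pop_back();
    wal.push_back(op);
  }

  // Rebuild state from durable data only. Returns an empty string on
  // success, or a message when the WAL is ahead of the recorded term.
  std::string Restart() {
    current_term = durable_term;
    if (!wal.empty() && wal.back().term > durable_term) {
      const OpId& last = wal.back();
      return "last WAL op " + std::to_string(last.term) + "." +
             std::to_string(last.index) + " has term " +
             std::to_string(last.term) + ", but the recorded term is " +
             std::to_string(durable_term);
    }
    return "";
  }
};

// Replays the sequence from the comment's timeline for one replica.
std::string ReplayScenario(bool flush_on_step_down) {
  Replica r;
  r.AdvanceTerm(1, true);
  r.AppendToWal({1, 1});                 // NO_OP(1,1)
  r.AppendToWal({1, 2});                 // Write some rows (1,2)
  r.AdvanceTerm(2, true);                // wins election: term 2 is durable
  r.AppendToWal({2, 3});                 // NO_OP(2,3), not replicated
  r.AdvanceTerm(3, flush_on_step_down);  // step down: term 3
  r.AppendToWal({3, 3});                 // NO_OP(3,3) from the new leader
  r.AppendToWal({3, 4});                 // alter schema (3,4)
  return r.Restart();
}
```

Calling ReplayScenario(false) reproduces the disagreement (recorded term 2, last WAL op 3.4), while ReplayScenario(true) restarts cleanly. In this model, persisting the term before accepting ops at that term is enough to avoid the mismatch. Whether that is the right fix for Kudu is a separate question.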