[ https://issues.apache.org/jira/browse/KUDU-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935313#comment-16935313 ]

HeLifu edited comment on KUDU-2943 at 9/23/19 8:51 AM:
-------------------------------------------------------

If we step down a leader replica, its term is incremented by 1 but not 
persisted:
https://github.com/apache/kudu/blob/ee22ddcc734ab4947218c670d5cfddd61fe90fbb/src/kudu/consensus/raft_consensus.cc#L570
Then, after a successful election, one of the followers becomes the new leader 
and its term is incremented by 1 as well.
The new term is durable on the new leader, but not on the old one, and it is 
still not persisted when the old leader (now a follower) accepts ops from the 
new leader. This is the root cause: those ops land in the old leader's WAL 
while its consensus metadata still records the previous term.
https://github.com/apache/kudu/blob/ee22ddcc734ab4947218c670d5cfddd61fe90fbb/src/kudu/consensus/raft_consensus.cc#L1138

So the StepDown API is not safe.
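A minimal Python sketch of the difference between the two paths described above, assuming a replica's term state can be reduced to two integers: the term recorded in its consensus metadata (cmeta) on disk and the term it acts on in memory. `ReplicaTerms` and its methods are hypothetical names for illustration, not Kudu's C++ API.

```python
from dataclasses import dataclass


@dataclass
class ReplicaTerms:
    """Hypothetical, simplified model of one replica's term bookkeeping.

    This is not Kudu's RaftConsensus; it only separates the term recorded in
    the on-disk consensus metadata (cmeta) from the term held in memory.
    """
    durable_term: int = 0    # stands in for the term in the cmeta file
    in_memory_term: int = 0  # the term the running replica acts on

    def advance_term_in_memory_only(self, new_term: int) -> None:
        # Models the StepDown path as described: bump the term, do not persist.
        self.in_memory_term = new_term

    def advance_term_and_persist(self, new_term: int) -> None:
        # Models the election path: persist first, then update memory, so the
        # in-memory term is never ahead of the durable one.
        self.durable_term = new_term
        self.in_memory_term = new_term

    def restart(self) -> None:
        # Anything that was only in memory is lost across a restart.
        self.in_memory_term = self.durable_term


old_leader = ReplicaTerms()
old_leader.advance_term_and_persist(2)     # won the election for term 2
old_leader.advance_term_in_memory_only(3)  # StepDown: term 2 -> 3, not durable
old_leader.restart()
print(old_leader.in_memory_term)  # 2: the bump to term 3 did not survive
```

Persisting before updating memory keeps the in-memory term from ever getting ahead of the durable one, which, per the analysis above, the StepDown path does not guarantee.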

{noformat}
tablet: ac74b319ad54416685f8b9d9506e1d61
           f42c56                c2c8be                eea10e
              |                     |                     |
              |              start election               |
              |                    WON                    |
              |                leader(1,0)                |
            (1,0)                   |                   (1,0)
              |                NO_OP(1,1)                 |
            (1,1)                   |                   (1,1)
              |           Write some Rows(1,2)            |
            (1,2)                   |                   (1,2)
              | **StepDown(1/2,2)[term 2 is not durable]  |
    start election(1/2,2)           |                     |
              |                     |           start election(1/2,2)
             WON                                          |
              |                                         FAIL
leader(2,2)[term 2 is durable]                            |
              |                               (2,2)[term 2 is durable]
         NO_OP(2,3)                                       |
              |                               (2,2)[not receive NO_OP]
**StepDown(2/3,3)[term 3 is not durable]"Line 570"        |
              |                                 start election(2/3,2)
              |                                          WON
              |                            leader(3,2)[term 3 is durable]
              |                                           |
              |                                      NO_OP(3,3)
(3,3)[term 3 is not durable]"Line 1138"                   |
              |                                   alter schema(3,4)
(3,4)[term 3 is not durable]                              |
              |                                           |
                            [restart masters]
                           [restart tservers]
              |
**Reboot tablet failed since term is 2 in consensus metadata, opid is (3,4) in WAL
{noformat}
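As a rough check of the timeline, here is a minimal Python replay of f42c56's column, assuming each event can be reduced to either setting the term (durably or not) or appending an op ID to the WAL. The `Replica` class and its methods are hypothetical, and the durability of term 1 is not shown in the diagram, so it is assumed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OpId = Tuple[int, int]  # (term, index), the notation used in the diagram


@dataclass
class Replica:
    """Hypothetical model of one replica; not Kudu's actual data structures."""
    cmeta_term: int = 0      # term persisted in the consensus metadata file
    in_memory_term: int = 0  # term the running replica acts on
    wal: List[OpId] = field(default_factory=list)  # ops appended to the WAL

    def set_term(self, term: int, durable: bool) -> None:
        self.in_memory_term = term
        if durable:
            self.cmeta_term = term

    def append(self, op: OpId) -> None:
        # In this model the WAL append is always durable, whether or not the
        # op's term has been written to cmeta.
        self.wal.append(op)


# Replay of f42c56's column in the diagram.
f42c56 = Replica()
f42c56.set_term(1, durable=True)   # assumption: term 1 was persisted normally
f42c56.append((1, 1))              # NO_OP(1,1) from leader c2c8be
f42c56.append((1, 2))              # Write some Rows(1,2)
f42c56.set_term(2, durable=True)   # won election: "term 2 is durable"
f42c56.append((2, 3))              # NO_OP(2,3)
f42c56.set_term(3, durable=False)  # StepDown(2/3,3): "term 3 is not durable"
f42c56.append((3, 3))              # NO_OP(3,3) from eea10e ("Line 1138")
f42c56.append((3, 4))              # alter schema(3,4)

last_term, last_index = f42c56.wal[-1]
print(f42c56.cmeta_term, (last_term, last_index))  # 2 (3, 4)
# The WAL now holds an op from a term newer than the one recorded in cmeta.
print(last_term > f42c56.cmeta_term)                # True
```

The end state matches the failure in the report: cmeta says term 2 while the last WAL op is (3,4).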


was (Author: helifu):
I think term 3 on f42c56 is not durable, which means the StepDown API is not 
safe.
{code:java}
tablet: ac74b319ad54416685f8b9d9506e1d61
          f42c56             c2c8be              eea10e
             |                  |                   |
             |            start election            |
             |                  |                   |
           (1,0)          leader(1,0)             (1,0)
             |                  |                   |
           (1,1)           NO_OP(1,1)             (1,1)
             |                  |                   |
           (1,2)       Write some Rows(1,2)       (1,2)
             |                  |                   |
             |            StepDown(2, 2)            |
     start election(2,2)        |                   |
             |                  |           start election(2,2)
            WIN                                     |
             |                                     FAIL
       leader(2,2)[term is durable]                 |
             |                                      |
        NO_OP(2,3)[no sync]                     (2,2)[not receive NO_OP]
             |                                      |
****StepDown(3,3)[term is not durable]"Line 1489"   |
             |                              start election(3,2)
             |                                      |
             |                                     WIN
             |                                      |
             |                                  leader(3,2)
             |                                      |
             |                                   NO_OP(3,3)
             |                                      |
             |                              alter schema(3,4)
           (3,4)[term is not durable, op in WAL]    |
             |                                      |
                      restart masters
                      restart tservers
{code}
 

> TsTabletManagerITest.TestTableStats flaky due to WAL/cmeta term disagreement
> ----------------------------------------------------------------------------
>
>                 Key: KUDU-2943
>                 URL: https://issues.apache.org/jira/browse/KUDU-2943
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, test
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Priority: Critical
>         Attachments: ts_tablet_manager-itest.txt
>
>
> This new test failed in a strange (and worrying) way:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc:753:
>  Failure
> Failed
> Bad status: Corruption: Unable to start RaftConsensus: The last op in the WAL 
> with id 3.4 has a term (3) that is greater than the latest recorded term, 
> which is 2
> {noformat}
> From a brief dig through the code, it looks like this means the current term 
> in the on-disk cmeta file is older than the term of the latest WAL op.
> I can believe that this is somehow due to InternalMiniCluster exercising 
> clean shutdown paths that aren't well tested or robust, but it'd be nice to 
> determine that with certainty.
> I've attached the full test log.
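The check that fails here can be sketched as follows. This is a hypothetical Python function, not Kudu's implementation; it only mirrors the invariant described in the report (the last WAL op's term must not exceed the term recorded in cmeta) and reproduces the error text quoted there.

```python
from typing import Optional, Tuple

OpId = Tuple[int, int]  # (term, index)


def check_wal_against_cmeta(last_wal_op: Optional[OpId],
                            cmeta_term: int) -> Optional[str]:
    """Hypothetical startup check: the last WAL op must not be from a term
    newer than the term recorded in the consensus metadata.

    Returns None when the state is consistent, else an error string modeled
    on the message quoted in the report.
    """
    if last_wal_op is None:
        return None  # empty WAL: nothing to compare
    term, index = last_wal_op
    if term > cmeta_term:
        return ("Corruption: Unable to start RaftConsensus: The last op in the "
                f"WAL with id {term}.{index} has a term ({term}) that is greater "
                f"than the latest recorded term, which is {cmeta_term}")
    return None


print(check_wal_against_cmeta((3, 4), 2))  # the failure seen in the test
print(check_wal_against_cmeta((3, 4), 3))  # None: cmeta caught up with the WAL
```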



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
