[jira] [Comment Edited] (KUDU-2943) TsTabletManagerITest.TestTableStats flaky due to WAL/cmeta term disagreement

yejiabao_h (Jira) Wed, 24 Nov 2021 06:50:09 -0800


    [ https://issues.apache.org/jira/browse/KUDU-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448658#comment-17448658 ]


yejiabao_h edited comment on KUDU-2943 at 11/24/21, 2:49 PM:
-------------------------------------------------------------

I reproduced this problem. With three replicas, it shows up about once in 
every 100 leader step-downs.

The problem occurs when the old leader does not vote in the election of the 
new leader after it steps down. On step-down the old leader increments its 
term by one, but the new term is not written to disk; the on-disk term is 
updated only when the replica votes in the new election.

I think the essence of the problem is that, during leader step-down, the 
member variable cmeta_ of RaftConsensus is used in a way that is inconsistent 
with its defined semantics: it is defined as a persistent consensus metadata 
object, but during leader step-down it effectively becomes an in-memory object.

Therefore, one solution is to flush the term to disk whenever leader step-down 
changes it. But I don't understand why leader step-down increments the term 
without flushing it to disk. Can you help explain the reason for using 
SKIP_FLUSH_TO_DISK mode? [~aserbin]

[https://gerrit.cloudera.org/#/c/11122/]
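
To make the sequence above concrete, here is a minimal toy model in Python. It is 
not Kudu code: the Replica class, its fields, and simulate() are hypothetical 
names, and the rule that a replicated op flushes the term only when it advances 
the in-memory term is an assumption taken from my reading of the failure. The 
restart check mirrors the error message in the issue description.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: in-memory term, persisted (cmeta) term, and WAL op terms."""
    mem_term: int
    disk_term: int
    wal_terms: list = field(default_factory=list)

    def step_down(self, flush: bool) -> None:
        # Leader step-down bumps the term; `flush` models whether it is persisted.
        self.mem_term += 1
        if flush:
            self.disk_term = self.mem_term

    def vote(self, candidate_term: int) -> None:
        # Voting in an election persists the current term.
        self.mem_term = max(self.mem_term, candidate_term)
        self.disk_term = self.mem_term

    def append_op(self, op_term: int) -> None:
        # Assumption: an op triggers a flush only if it advances the in-memory term.
        if op_term > self.mem_term:
            self.mem_term = op_term
            self.disk_term = op_term
        self.wal_terms.append(op_term)

    def restart_ok(self) -> bool:
        # Startup check: the last WAL op's term must not exceed the recorded term.
        return not self.wal_terms or self.wal_terms[-1] <= self.disk_term


def simulate(flush_on_step_down: bool, old_leader_votes: bool) -> bool:
    """Return True if the old leader can restart after the scenario."""
    old_leader = Replica(mem_term=2, disk_term=2)
    old_leader.step_down(flush=flush_on_step_down)  # in-memory term 2 -> 3
    if old_leader_votes:
        old_leader.vote(candidate_term=3)
    old_leader.append_op(op_term=3)  # op replicated by the new term-3 leader
    return old_leader.restart_ok()
```

In this model, simulate(False, False) is the failing case (WAL term 3, recorded 
term 2), while voting in the new election or flushing on step-down both let the 
replica restart.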


> TsTabletManagerITest.TestTableStats flaky due to WAL/cmeta term disagreement
> ----------------------------------------------------------------------------
>
>                 Key: KUDU-2943
>                 URL: https://issues.apache.org/jira/browse/KUDU-2943
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, test
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Priority: Critical
>         Attachments: ts_tablet_manager-itest.txt
>
>
> This new test failed in a strange (and worrying) way:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc:753:
>  Failure
> Failed
> Bad status: Corruption: Unable to start RaftConsensus: The last op in the WAL 
> with id 3.4 has a term (3) that is greater than the latest recorded term, 
> which is 2
> {noformat}
> From a brief dig through the code, looks like this means the current term as 
> per the on-disk cmeta file is older than the term in the latest WAL op.
> I can believe that this is somehow due to InternalMiniCluster exercising 
> clean shutdown paths that aren't well tested or robust, but it'd be nice to 
> determine that with certainty.
> I've attached the full test log.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
