[ https://issues.apache.org/jira/browse/KUDU-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448658#comment-17448658 ]
yejiabao_h edited comment on KUDU-2943 at 11/24/21, 2:49 PM:
-------------------------------------------------------------
I reproduced this problem. With three replicas, it shows up about once in
every 100 leader step-downs.
The problem occurs when the old leader does not vote in the new leader's
election after stepping down. When the leader steps down it increments its
term by one, but the new term is not written to disk; the on-disk term is
updated only when the replica votes in the new election. If the replica then
accepts operations from the new leader and is restarted, the term in its WAL
is ahead of the term in its on-disk cmeta.
I think the essence of the problem is that the member variable cmeta_ of
RaftConsensus is used during leader step-down in a way that is inconsistent
with its defined semantics: it is defined as a persistent consensus metadata
object, but during leader step-down it behaves as an in-memory object.
Therefore, one solution is to flush the term to disk whenever leader
step-down changes it. But I don't understand why leader step-down increments
the term without flushing it to disk. Can you help explain the reason for
using SKIP_FLUSH_TO_DISK mode? [~aserbin]
[https://gerrit.cloudera.org/#/c/11122/]
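To make the suspected sequence concrete, here is a minimal toy model of it.
It is only a sketch, not Kudu code: ToyReplica, FlushMode, AppendFromLeader,
Restart and RunScenario are hypothetical names, and the rule that a replica
flushes when it sees a higher term from the leader is an assumption of the
model. It only shows how an in-memory term bump that skips the flush can
leave the on-disk term behind the last WAL op.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not Kudu's classes.
enum class FlushMode { kFlushToDisk, kSkipFlushToDisk };

struct OpId {
  int64_t term;
  int64_t index;
};

class ToyReplica {
 public:
  explicit ToyReplica(int64_t term) : mem_term_(term), disk_term_(term) {}

  // Update the in-memory term and, depending on the mode, the durable copy.
  void SetTerm(int64_t term, FlushMode mode) {
    mem_term_ = term;
    if (mode == FlushMode::kFlushToDisk) disk_term_ = term;
  }

  // Leader step-down bumps the term by one.
  void StepDown(FlushMode mode) { SetTerm(mem_term_ + 1, mode); }

  // Accept an op from the current leader. Assumption of this model: the
  // term is flushed only if the op's term is newer than the in-memory term.
  bool AppendFromLeader(OpId op) {
    if (op.term < mem_term_) return false;  // stale leader, reject
    if (op.term > mem_term_) SetTerm(op.term, FlushMode::kFlushToDisk);
    wal_.push_back(op);
    return true;
  }

  // Simulate a restart: the in-memory term is lost and reloaded from disk,
  // then the last WAL op is checked against the recorded term.
  std::string Restart() {
    mem_term_ = disk_term_;
    if (!wal_.empty() && wal_.back().term > disk_term_) return "Corruption";
    return "OK";
  }

 private:
  int64_t mem_term_;
  int64_t disk_term_;
  std::vector<OpId> wal_;
};

// Old leader at term 2 with op 2.3 steps down, does not vote, then
// receives op 3.4 from the new leader and restarts.
std::string RunScenario(FlushMode mode) {
  ToyReplica old_leader(/*term=*/2);
  old_leader.AppendFromLeader({2, 3});
  old_leader.StepDown(mode);
  old_leader.AppendFromLeader({3, 4});
  return old_leader.Restart();
}
```

In this model, RunScenario(FlushMode::kSkipFlushToDisk) ends in "Corruption"
because the step-down already raised the in-memory term to 3, so op 3.4 never
triggers a flush. RunScenario(FlushMode::kFlushToDisk) ends in "OK".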
> TsTabletManagerITest.TestTableStats flaky due to WAL/cmeta term disagreement
> ----------------------------------------------------------------------------
>
> Key: KUDU-2943
> URL: https://issues.apache.org/jira/browse/KUDU-2943
> Project: Kudu
> Issue Type: Bug
> Components: consensus, test
> Affects Versions: 1.11.0
> Reporter: Adar Dembo
> Priority: Critical
> Attachments: ts_tablet_manager-itest.txt
>
>
> This new test failed in a strange (and worrying) way:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc:753:
> Failure
> Failed
> Bad status: Corruption: Unable to start RaftConsensus: The last op in the WAL
> with id 3.4 has a term (3) that is greater than the latest recorded term,
> which is 2
> {noformat}
> From a brief dig through the code, looks like this means the current term as
> per the on-disk cmeta file is older than the term in the latest WAL op.
> I can believe that this is somehow due to InternalMiniCluster exercising
> clean shutdown paths that aren't well tested or robust, but it'd be nice to
> determine that with certainty.
> I've attached the full test log.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)