[
https://issues.apache.org/jira/browse/KUDU-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304439#comment-17304439
]
Bankim Bhavsar commented on KUDU-3266:
--------------------------------------
A=eb86b1c4913647fc927d576f744c3d27
B=3978845170a24b80a8903036a6e97382
C=00e14f42918b444bb2a05e9b4f2ac855
Log snippets:
{noformat}
I0317 17:04:14.694448 17546 raft_consensus.cc:479] T
00000000000000000000000000000000 P 00e14f42918b444bb2a05e9b4f2ac855 [term 4
FOLLOWER]: Starting leader election (detected failure of leader
eb86b1c4913647fc927d576f744c3d27
I0317 17:04:18.775841 17549 sys_catalog.cc:437] T
00000000000000000000000000000000 P 3978845170a24b80a8903036a6e97382
[sys.catalog]: This master's current role is: LEADER
I0317 17:04:18.776212 17553 sys_catalog.cc:437] T
00000000000000000000000000000000 P 00e14f42918b444bb2a05e9b4f2ac855
[sys.catalog]: This master's current role is: FOLLOWER
{noformat}
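To read the snippets in terms of the short names A, B and C, the role reports can be pulled out mechanically. The following is a minimal sketch, not a Kudu tool: `role_timeline`, the regular expressions and the alias table are hypothetical helpers written for this comment, and they assume glog-style entries that Jira has wrapped across lines.

```python
import re

# Short names defined at the top of this comment.
ALIASES = {
    "eb86b1c4913647fc927d576f744c3d27": "A",
    "3978845170a24b80a8903036a6e97382": "B",
    "00e14f42918b444bb2a05e9b4f2ac855": "C",
}

# Zero-width match at the start of each glog entry, e.g. "I0317 17:04:18.775841 ".
ENTRY_START = re.compile(r"(?=\b[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} )")

# A sys_catalog entry that reports the master's current role.
ROLE_LINE = re.compile(
    r"^[IWEF]\d{4} (?P<ts>\S+) .* P (?P<uuid>[0-9a-f]{32}) "
    r"\[sys\.catalog\]: This master's current role is: (?P<role>\w+)"
)


def role_timeline(log_text):
    """Return (timestamp, alias, role) tuples in the order they appear."""
    # Jira wraps long log lines, so collapse all whitespace before splitting.
    flat = " ".join(log_text.split())
    timeline = []
    for entry in ENTRY_START.split(flat):
        m = ROLE_LINE.match(entry)
        if m:
            uuid = m["uuid"]
            timeline.append((m["ts"], ALIASES.get(uuid, uuid), m["role"]))
    return timeline
```

Applied to the snippet above, this yields B reporting LEADER at 17:04:18.775841 and C reporting FOLLOWER at 17:04:18.776212; the election entry from raft_consensus.cc is skipped because it is not a role report.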
eb86b1c4913647fc927d576f744c3d27 (A), coming back from the pause, still thinks
it is the leader, while 00e14f42918b444bb2a05e9b4f2ac855 (C) is trying to
become leader.
{noformat}
I0317 17:04:19.643288 14988 tablet_service.cc:1729] Received
RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000"
candidate_uuid: "00e14f42918b444bb2a05e9b4f2ac855" candidate_term: 5
candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader:
false dest_uuid: "eb86b1c4913647fc927d576f744c3d27" is_pre_election: true
I0317 17:04:19.644940 17503 consensus_queue.cc:571] T
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [LEADER]:
Leader has been unable to successfully communicate with peer
3978845170a24b80a8903036a6e97382 for more than 4 seconds (6.459s)
I0317 17:04:19.645022 14987 tablet_service.cc:1729] Received
RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000"
candidate_uuid: "00e14f42918b444bb2a05e9b4f2ac855" candidate_term: 5
candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader:
false dest_uuid: "eb86b1c4913647fc927d576f744c3d27"
I0317 17:04:19.645114 17503 sys_catalog.cc:434] T
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27
[sys.catalog]: SysCatalogTable state changed. Reason: Peer health change.
Latest consensus state: current_term: 4 leader_uuid:
"eb86b1c4913647fc927d576f744c3d27" committed_config { opid_index: 2852
OBSOLETE_local: false peers { permanent_uuid:
"eb86b1c4913647fc927d576f744c3d27" member_type: VOTER last_known_addr { host:
"127.0.92.253" port: 37459 } } peers { permanent_uuid:
"3978845170a24b80a8903036a6e97382" member_type: VOTER last_known_addr { host:
"127.0.92.252" port: 45331 } } peers { permanent_uuid:
"00e14f42918b444bb2a05e9b4f2ac855" member_type: VOTER last_known_addr { host:
"127.0.92.254" port: 43853 } attrs { promote: false } } }
I0317 17:04:19.645200 17503 sys_catalog.cc:437] T
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27
[sys.catalog]: This master's current role is: LEADER
/data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/cluster_verifier.cc:119:
Failure
Failed
Bad status: Not found: Unable to open table: the table does not exist:
table_name: "table-1"
/data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:603:
Failure
Expected: cv.CheckRowCount(table_name, ClusterVerifier::EXACTLY, 0) doesn't
generate new fatal failures in the current thread.
Actual: it does.
I0317 17:04:19.667603 371 external_mini_cluster.cc:1294] Killing
/tmp/dist-test-task6JYMlq/build/debug/bin/kudu with pid 15089
I0317 17:04:19.673735 15061 raft_consensus.cc:1223] T
00000000000000000000000000000000 P 3978845170a24b80a8903036a6e97382 [term 6
LEADER]: Rejecting Update request from peer eb86b1c4913647fc927d576f744c3d27
for earlier term 4. Current term is 6. Ops: []
I0317 17:04:19.676132 14988 tablet_service.cc:1729] Received
RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000"
candidate_uuid: "3978845170a24b80a8903036a6e97382" candidate_term: 6
candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader:
false dest_uuid: "eb86b1c4913647fc927d576f744c3d27"
I0317 17:04:19.676213 14987 tablet_service.cc:1729] Received
RequestConsensusVote() RPC: tablet_id: "00000000000000000000000000000000"
candidate_uuid: "3978845170a24b80a8903036a6e97382" candidate_term: 6
candidate_status { last_received { term: 4 index: 3513 } } ignore_live_leader:
false dest_uuid: "eb86b1c4913647fc927d576f744c3d27" is_pre_election: true
I0317 17:04:19.676512 14986 raft_consensus.cc:3027] T
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [term 4
LEADER]: Stepping down as leader of term 4
I0317 17:04:19.676553 14986 raft_consensus.cc:726] T
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27 [term 4
LEADER]: Becoming Follower/Learner. State: Replica:
eb86b1c4913647fc927d576f744c3d27, State: Running, Role: LEADER
I0317 17:04:19.676688 14986 consensus_queue.cc:257] T
00000000000000000000000000000000 P eb86b1c4913647fc927d576f744c3d27
[NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated index: 0,
Majority replicated index: 3513, Committed index: 3513, Last appended: 4.3513,
Last appended by leader: 3513, Current term: 4, Majority size: -1, State: 0,
Mode: NON_LEADER, active raft config: opid_index: 2852 OBSOLETE_local: false
peers { permanent_uuid: "eb86b1c4913647fc927d576f744c3d27" member_type: VOTER
last_known_addr { host: "127.0.92.253" port: 37459 } } peers { permanent_uuid:
"3978845170a24b80a8903036a6e97382" member_type: VOTER last_known_addr { host:
"127.0.92.252" port: 45331 } } peers { permanent_uuid:
"00e14f42918b444bb2a05e9b4f2ac855" member_type: VOTER last_known_addr { host:
"127.0.92.254" port: 43853 } attrs { promote: false } }
{noformat}
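The last entries in the snippet match the usual Raft term rule: B, already at term 6, rejects A's Update for term 4, and A steps down as leader of term 4 once requests carrying term 6 reach it. The following is a minimal sketch of that rule only; the `Peer` class and its methods are hypothetical and do not mirror the code in raft_consensus.cc.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    uuid: str
    term: int
    role: str  # "LEADER" or "FOLLOWER"

    def observe_term(self, remote_term: int) -> None:
        """Adopt a higher term seen from another peer; a leader steps down."""
        if remote_term > self.term:
            self.term = remote_term
            self.role = "FOLLOWER"

    def handle_update(self, sender_term: int) -> bool:
        """Return False for an Update from an earlier term, True otherwise."""
        if sender_term < self.term:
            return False
        self.observe_term(sender_term)
        return True


# Hypothetical replay of the state seen in the logs.
a = Peer("A", term=4, role="LEADER")  # stale leader back from the pause
b = Peer("B", term=6, role="LEADER")  # leader elected in the meantime

accepted = b.handle_update(a.term)  # B rejects the term-4 Update
a.observe_term(b.term)              # A learns of term 6 and steps down
```

Until A observes the higher term, both A and B report themselves as LEADER, which is the window visible in the log above.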
> Flakiness in dynamic_multi_master_test in VerifyClusterAfterMasterAddition()
> function
> -------------------------------------------------------------------------------------
>
> Key: KUDU-3266
> URL: https://issues.apache.org/jira/browse/KUDU-3266
> Project: Kudu
> Issue Type: Test
> Components: master, test
> Affects Versions: 1.15.0
> Reporter: Bankim Bhavsar
> Assignee: Bankim Bhavsar
> Priority: Major
>
> {noformat}
> ParameterizedRecoverMasterTest.TestRecoverDeadMasterSysCatalogCopy/1:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/cluster_verifier.cc:119:
> Failure
> Failed
> Bad status: Not found: Unable to open table: the table does not exist:
> table_name: "table-1"
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:603:
> Failure
> Expected: cv.CheckRowCount(table_name, ClusterVerifier::EXACTLY, 0) doesn't
> generate new fatal failures in the current thread.
> Actual: it does.
> 2021-03-17T17:04:19Z chronyd exiting
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/dynamic_multi_master-test.cc:1099:
> Failure
> Expected: VerifyClusterAfterMasterAddition(master_hps, orig_num_masters_)
> doesn't generate new fatal failures in the current thread.
> Actual: it does.
> {noformat}
> Although the same verification function is used by other tests that add a
> master, this flakiness started showing up after the introduction of the
> RecoverDeadMaster test.
> https://github.com/apache/kudu/commit/4b4a8c0f2fdfd15524510821b27fc9c3b5d26b6b