daicheng created KUDU-3460:
------------------------------
Summary: RPC error from VoteRequest() call to peer **: Timed out:
RequestConsensusVote RPC to ** timed out after [SENT]
Key: KUDU-3460
URL: https://issues.apache.org/jira/browse/KUDU-3460
Project: Kudu
Issue Type: Bug
Affects Versions: 1.16.0
Reporter: daicheng
Attachments: image-2023-03-17-15-27-45-755.png,
image-2023-03-17-15-28-13-480.png, image-2023-03-17-15-28-40-361.png
We have 3 Kudu masters and 6 Kudu tablet servers. While creating 20,000 (2W)
tables in Kudu, we got errors and can no longer read any data from Kudu; it
throws many errors.
Here are the errors from the client:
{code:java}
Job aborted due to stage failure: Task 0 in stage 35.0 failed 4 times, most
recent failure: Lost task 0.3 in stage 35.0 (TID 9601) (prod-bigdata-mw-159
executor 3): java.lang.RuntimeException:
org.apache.kudu.client.NonRecoverableException: tablet hasn't heard from leader
or there hasn't been a stable leader fo..
2023-03-08 09:59:49,198 INFO org.apache.kudu.client.AsyncKuduClient
[] - Invalidating location master-10.0.2.33:7051(10.0.2.33:7051) for
tablet Kudu Master: Service unavailable: ListTables request on
kudu.master.MasterService from 10.0.3.82:8764 dropped due to backpressure. The
service queue is full; it has 100 items. {code}
I also found many errors like the following in the Kudu tserver log:
{code:java}
W0307 14:36:57.368008 14759 leader_election.cc:334] T fa2a3b405a87466da7a6b1a962f35d99 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1640 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.206s (SENT)
W0307 14:36:57.368801 14759 leader_election.cc:334] T 5f8d377660aa46f29e3f1595a33d086c P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.725s (SENT)
W0307 14:36:57.368917 14759 leader_election.cc:334] T a32af7dd8af44b47b4b26d7a222c2f6b P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 344 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.713s (SENT)
W0307 14:36:57.369045 14759 leader_election.cc:334] T 15e9b550c3274243a5ee923ceda67dc5 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1509 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 3.056s (SENT)
W0307 14:36:57.369563 14759 leader_election.cc:334] T e5e49b443f71478984162a2eb65d3607 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1575 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.553s (SENT)
W0307 14:36:57.371872 14759 leader_election.cc:334] T 2ec17c9dd68e47ceb7f572efb9f18fe3 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1633 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.010s (SENT)
W0307 14:36:57.372673 14759 leader_election.cc:334] T a91cf24cc4c943cbbd041c7e6726d7aa P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1610 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.970s (SENT)
W0307 14:36:57.372789 14759 leader_election.cc:334] T cd667f33abb74afba4b9c510b8f6dfaa P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 3 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.674s (SENT)
W0307 14:36:57.373358 14759 leader_election.cc:334] T 39709b52ffe34f81b08d0562e45a7a13 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 44 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.636s (SENT)
W0307 14:36:57.373525 14759 leader_election.cc:334] T 00da9e2c20814ac88e18f7d7220f01c9 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.524s (SENT) {code}
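To see which peers the vote requests are timing out against, the tserver warnings above can be tallied per peer address. The following is only a rough sketch, not a Kudu tool: the function name `summarize_vote_timeouts` and the regular expression are assumptions based solely on the message format of the warnings quoted in this report.

```python
import re
from collections import Counter

# Matches the leader_election.cc warning quoted above (format assumed from this
# report only). "to ?peer" tolerates the "topeer" artifact that line wrapping
# can leave in a pasted log.
VOTE_TIMEOUT = re.compile(
    r"T (?P<tablet>[0-9a-f]{32}) P (?P<candidate>[0-9a-f]{32}) \[CANDIDATE\]: "
    r"Term (?P<term>\d+) pre-election: RPC error from VoteRequest\(\) call to ?peer "
    r"(?P<peer>[0-9a-f]{32}) \((?P<addr>[\d.]+:\d+)\): Timed out: "
    r"RequestConsensusVote RPC to \S+ timed out after (?P<secs>[\d.]+)s"
)


def summarize_vote_timeouts(log_text):
    """Return (timeouts per peer address, longest timeout in seconds per address)."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    counts = Counter()
    worst = {}
    for m in VOTE_TIMEOUT.finditer(flat):
        addr = m.group("addr")
        counts[addr] += 1
        worst[addr] = max(worst.get(addr, 0.0), float(m.group("secs")))
    return counts, worst
```

In the ten warnings pasted above, six of the timeouts are against 10.0.2.19:7050 and four against 10.0.2.21:7050, all from candidate 5ac35cfccaf84228bf6d589501ec533e.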
The disk where the WAL directory is located is also behaving abnormally:
!image-2023-03-17-15-27-45-755.png|width=314,height=166!!image-2023-03-17-15-28-40-361.png|width=309,height=135!
Here is what the WAL file looks like:
{code:java}
schema_version: 0
compression_codec: LZ4
1.1@6873507535186497536 REPLICATE NO_OP
    id { term: 1 index: 1 } timestamp: 6873507535186497536 op_type: NO_OP noop_request { }
COMMIT 1.1
    op_type: NO_OP commited_op_id { term: 1 index: 1 }
1.2@6873839930165628928 REPLICATE CHANGE_CONFIG_OP
    id { term: 1 index: 2 } timestamp: 6873839930165628928 op_type: CHANGE_CONFIG_OP
    change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6"
      old_config { opid_index: -1 OBSOLETE_local: false
        peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } }
        peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } }
        peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } }
      new_config { opid_index: 2 OBSOLETE_local: false
        peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } }
        peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } }
        peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } }
        peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: true } } } }
COMMIT 1.2
    op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 2 }
1.3@6873841023495979008 REPLICATE CHANGE_CONFIG_OP
    id { term: 1 index: 3 } timestamp: 6873841023495979008 op_type: CHANGE_CONFIG_OP
    change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6"
      old_config { opid_index: 2 OBSOLETE_local: false
        peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } }
        peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } }
        peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } }
        peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: true } } }
      new_config { opid_index: 3 OBSOLETE_local: false
        peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } }
        peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } }
        peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } }
        peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } }
COMMIT 1.3
    op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 3 }
1.4@6873841038243381248 REPLICATE CHANGE_CONFIG_OP
    id { term: 1 index: 4 } timestamp: 6873841038243381248 op_type: CHANGE_CONFIG_OP
    change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6"
      old_config { opid_index: 3 OBSOLETE_local: false
        peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } }
        peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } }
        peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } }
        peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } }
      new_config { opid_index: 4 OBSOLETE_local: false
        peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } }
        peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } }
        peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } }
COMMIT 1.4 {code}
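The membership changes recorded in the WAL dump above are hard to spot by eye, so here is a small sketch that extracts the peers of each `old_config`/`new_config` pair and reports what changed. The helper names `parse_configs` and `membership_changes` are hypothetical, and the parsing relies only on the text layout of the dump pasted in this report, not on any Kudu API.

```python
import re

# Patterns assumed from the text of the dump above, not from a Kudu API.
CONFIG = re.compile(r"(old_config|new_config)\s*\{\s*opid_index: (-?\d+)")
PEER = re.compile(r'permanent_uuid: "([0-9a-f]{32})" member_type: (\w+)')


def parse_configs(dump_text):
    """Return [(kind, opid_index, {peer_uuid: member_type}), ...] in dump order."""
    flat = " ".join(dump_text.split())  # undo line wrapping
    starts = list(CONFIG.finditer(flat))
    configs = []
    for i, m in enumerate(starts):
        # A config's peers run until the next config starts (or the text ends).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(flat)
        peers = dict(PEER.findall(flat[m.end():end]))
        configs.append((m.group(1), int(m.group(2)), peers))
    return configs


def membership_changes(dump_text):
    """Diff each new_config against the old_config that precedes it."""
    changes = []
    old = None
    for kind, opid_index, peers in parse_configs(dump_text):
        if kind == "old_config":
            old = peers
        elif old is not None:
            common = set(peers) & set(old)
            changes.append({
                "opid_index": opid_index,
                "added": sorted(set(peers) - set(old)),
                "removed": sorted(set(old) - set(peers)),
                "role_changed": sorted(u for u in common if peers[u] != old[u]),
            })
            old = None
    return changes
```

Read this way, the dump shows peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19) being added as a NON_VOTER at index 2 and promoted to VOTER at index 3, and peer 5ac35cfccaf84228bf6d589501ec533e (10.0.2.20) being removed from the config at index 4.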
Can anyone explain what happened?