Alexey Serbin resolved KUDU-2900.
---------------------------------
Fix Version/s: n/a
Resolution: Duplicate
> Master crash reported in disk_failure-itest
> -------------------------------------------
>
> Key: KUDU-2900
> URL: https://issues.apache.org/jira/browse/KUDU-2900
> Project: Kudu
> Issue Type: Bug
> Reporter: Andrew Wong
> Priority: Major
> Fix For: n/a
>
> Attachments: disk_failure-itest.txt
>
>
> The master hit a DCHECK while serving GetTableLocations(). The call came
> immediately after a tablet copy, though that timing may be a red herring.
>
> {code:java}
> I0720 00:42:48.285115 234 cluster_verifier.cc:82] Check not successful yet,
> sleeping and retrying: Runtime error: ksck discovered errors
> I0720 00:42:48.667629 2277 raft_consensus.cc:1184] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P d9d5e137df31450caa9dc831972c1f5e [term 2
> FOLLOWER]: Refusing update from remote peer 526eadb7abb3438790a38e0d9973a6a5:
> Log matching property violated. Preceding OpId in replica: term: 1 index: 1.
> Preceding OpId from leader: term: 2 index: 2. (index mismatch)
> I0720 00:42:48.668309 2961 consensus_queue.cc:984] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 [LEADER]:
> Connected to new peer: Peer: permanent_uuid:
> "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host:
> "127.0.58.129" port: 37035 }, Status: LMP_MISMATCH, Last received: 0.0, Next
> index: 2, Last known committed idx: 1, Time since last communication: 0.000s
> W0720 00:42:48.673665 2471 consensus_peers.cc:458] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 -> Peer
> bc957945380a4e119d7ac829148eede6 (127.0.58.130:37359): Couldn't send request
> to peer bc957945380a4e119d7ac829148eede6. Error code: TABLET_FAILED (20).
> Status: Illegal state: Tablet not RUNNING: FAILED: IO error: some tablet data
> is in a failed directory. This is attempt 1: this message will repeat every
> 5th retry.
> I0720 00:42:48.674023 2961 raft_consensus.cc:922] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5:
> Attempting to remove follower bc957945380a4e119d7ac829148eede6 from the Raft
> config. Reason: The tablet replica hosted on peer
> bc957945380a4e119d7ac829148eede6 has failed
> I0720 00:42:48.674721 2961 consensus_queue.cc:206] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 [LEADER]:
> Queue going to LEADER mode. State: All replicated index: 0, Majority
> replicated index: 2, Committed index: 2, Last appended: 2.2, Last appended by
> leader: 1, Current term: 2, Majority size: 2, State: 0, Mode: LEADER, active
> raft config: opid_index: 3 OBSOLETE_local: false peers { permanent_uuid:
> "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host:
> "127.0.58.131" port: 33941 } } peers { permanent_uuid:
> "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host:
> "127.0.58.129" port: 37035 } }
> I0720 00:42:48.675925 2277 raft_consensus.cc:1184] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P d9d5e137df31450caa9dc831972c1f5e [term 2
> FOLLOWER]: Refusing update from remote peer 526eadb7abb3438790a38e0d9973a6a5:
> Log matching property violated. Preceding OpId in replica: term: 2 index: 2.
> Preceding OpId from leader: term: 2 index: 3. (index mismatch)
> I0720 00:42:48.676373 2965 consensus_queue.cc:984] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 [LEADER]:
> Connected to new peer: Peer: permanent_uuid:
> "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host:
> "127.0.58.129" port: 37035 }, Status: LMP_MISMATCH, Last received: 0.0, Next
> index: 3, Last known committed idx: 2, Time since last communication: 0.000s
> I0720 00:42:48.678308 2961 raft_consensus.cc:2792] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P 526eadb7abb3438790a38e0d9973a6a5 [term 2
> LEADER]: Committing config change with OpId 2.3: config changed from index -1
> to 3, VOTER bc957945380a4e119d7ac829148eede6 (127.0.58.130) evicted. New
> config: { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid:
> "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host:
> "127.0.58.131" port: 33941 } } peers { permanent_uuid:
> "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host:
> "127.0.58.129" port: 37035 } } }
> I0720 00:42:48.678822 2277 raft_consensus.cc:2792] T
> f8dcedcbd6bb47ce9de0a37cc84ebca5 P d9d5e137df31450caa9dc831972c1f5e [term 2
> FOLLOWER]: Committing config change with OpId 2.3: config changed from index
> -1 to 3, VOTER bc957945380a4e119d7ac829148eede6 (127.0.58.130) evicted. New
> config: { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid:
> "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host:
> "127.0.58.131" port: 33941 } } peers { permanent_uuid:
> "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host:
> "127.0.58.129" port: 37035 } } }
> I0720 00:42:48.787096 2969 ts_tablet_manager.cc:682] T
> ade4b6ce17504569a6d13d9018288194 P bc957945380a4e119d7ac829148eede6:
> Initiating tablet copy from peer d9d5e137df31450caa9dc831972c1f5e
> (127.0.58.129:37035)
> I0720 00:42:48.787333 2969 tablet_copy_client.cc:204] T
> ade4b6ce17504569a6d13d9018288194 P bc957945380a4e119d7ac829148eede6: tablet
> copy: overwriting existing tombstoned replica with an unknown last-logged opid
> I0720 00:42:48.787505 2969 tablet_copy_client.cc:241] T
> ade4b6ce17504569a6d13d9018288194 P bc957945380a4e119d7ac829148eede6: tablet
> copy: Beginning tablet copy session from remote peer at address
> 127.0.58.129:37035
> I0720 00:42:48.793766 2297 tablet_copy_service.cc:135] P
> d9d5e137df31450caa9dc831972c1f5e: Received BeginTabletCopySession request for
> tablet ade4b6ce17504569a6d13d9018288194 from peer
> bc957945380a4e119d7ac829148eede6 ({username='slave'} at 127.0.58.130:35599)
> I0720 00:42:48.793942 2297 tablet_copy_service.cc:156] P
> d9d5e137df31450caa9dc831972c1f5e: Beginning new tablet copy session on tablet
> ade4b6ce17504569a6d13d9018288194 from peer bc957945380a4e119d7ac829148eede6
> at {username='slave'} at 127.0.58.130:35599: session id =
> bc957945380a4e119d7ac829148eede6-ade4b6ce17504569a6d13d9018288194
> F0720 00:42:48.836412 2162 quorum_util.cc:167] Check failed:
> RaftPeerPB::NON_PARTICIPANT != GetConsensusRole(peer_uuid, cstate) (3 vs. 3)
> Peer bc957945380a4e119d7ac829148eede6 << not a participant in current_term: 1
> leader_uuid: "d9d5e137df31450caa9dc831972c1f5e" committed_config {
> opid_index: 3 OBSOLETE_local: false peers { permanent_uuid:
> "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host:
> "127.0.58.131" port: 33941 } } peers { permanent_uuid:
> "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host:
> "127.0.58.129" port: 37035 } } peers { permanent_uuid:
> "bc957945380a4e119d7ac829148eede6" member_type: VOTER last_known_addr { host:
> "127.0.58.130" port: 37359 } } } pending_config { opid_index: 4
> OBSOLETE_local: false peers { permanent_uuid:
> "526eadb7abb3438790a38e0d9973a6a5" member_type: VOTER last_known_addr { host:
> "127.0.58.131" port: 33941 } } peers { permanent_uuid:
> "d9d5e137df31450caa9dc831972c1f5e" member_type: VOTER last_known_addr { host:
> "127.0.58.129" port: 37035 } } }
> *** Check failure stack trace: ***
> @ 0x7f527aaea62d google::LogMessage::Fail() at ??:0
> @ 0x7f527aaec64c google::LogMessage::SendToLog() at ??:0
> @ 0x7f527aaea189 google::LogMessage::Flush() at ??:0
> @ 0x7f527aaecfdf google::LogMessageFatal::~LogMessageFatal() at ??:0
> @ 0x7f52828cbc3c kudu::consensus::GetParticipantRole() at ??:0
> @ 0x7f528d3f4e93 kudu::master::CatalogManager::BuildLocationsForTablet() at ??:0
> @ 0x7f528d3f9e27 kudu::master::CatalogManager::GetTableLocations() at ??:0
> @ 0x7f528d543fe6 kudu::master::MasterServiceImpl::GetTableLocations() at ??:0
> @ 0x7f5288b49caa std::_Function_handler<>::_M_invoke() at ??:0
> @ 0x7f527f34431c std::function<>::operator()() at ??:0
> @ 0x7f527f342e8b kudu::rpc::GeneratedServiceIf::Handle() at ??:0
> @ 0x7f527f346988 kudu::rpc::ServicePool::RunThread() at ??:0
> @ 0x7f527f34beb3 boost::_bi::bind_t<>::operator()() at ??:0
> @ 0x7f527f2a680c boost::function0<>::operator()() at ??:0
> @ 0x7f527bcf1e0b kudu::Thread::SuperviseThread() at ??:0
> @ 0x7f52841a7184 start_thread at ??:0
> @ 0x7f5277703ffd clone at ??:0
> {code}
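
The fatal line reports a consensus state in which peer bc957945380a4e119d7ac829148eede6 is listed in committed_config but not in pending_config, and the role computed for it is NON_PARTICIPANT (3 vs. 3). A minimal Python sketch of that situation follows. It is a simplified model, not Kudu's C++ code: the names ConsensusState, consensus_role, participant_role and CheckFailed are hypothetical, and the rule that the pending config takes precedence when present is an assumption inferred from the log, not confirmed against the Kudu source.

```python
from dataclasses import dataclass
from typing import List, Optional

# Role names mirror the roles mentioned in the log. Everything else here is a
# simplified, hypothetical model and not Kudu's implementation.
LEADER = "LEADER"
FOLLOWER = "FOLLOWER"
NON_PARTICIPANT = "NON_PARTICIPANT"


class CheckFailed(Exception):
    """Hypothetical stand-in for the fatal check seen in the log."""


@dataclass
class ConsensusState:
    leader_uuid: str
    committed_peers: List[str]
    pending_peers: Optional[List[str]] = None


def consensus_role(peer_uuid: str, cstate: ConsensusState) -> str:
    # Assumption: when a pending config exists, it is the one consulted.
    if cstate.pending_peers is not None:
        peers = cstate.pending_peers
    else:
        peers = cstate.committed_peers
    if peer_uuid not in peers:
        return NON_PARTICIPANT
    return LEADER if peer_uuid == cstate.leader_uuid else FOLLOWER


def participant_role(peer_uuid: str, cstate: ConsensusState) -> str:
    # Models a caller that requires the peer to be a participant.
    role = consensus_role(peer_uuid, cstate)
    if role == NON_PARTICIPANT:
        raise CheckFailed(f"Peer {peer_uuid} not a participant")
    return role


# Peer lists and leader copied from the consensus state in the fatal log line.
cstate = ConsensusState(
    leader_uuid="d9d5e137df31450caa9dc831972c1f5e",
    committed_peers=[
        "526eadb7abb3438790a38e0d9973a6a5",
        "d9d5e137df31450caa9dc831972c1f5e",
        "bc957945380a4e119d7ac829148eede6",
    ],
    pending_peers=[
        "526eadb7abb3438790a38e0d9973a6a5",
        "d9d5e137df31450caa9dc831972c1f5e",
    ],
)

for uuid in cstate.committed_peers:
    try:
        print(uuid, participant_role(uuid, cstate))
    except CheckFailed as e:
        print(uuid, "check failed:", e)
```

Under these assumptions, iterating over the committed peers while resolving roles against the pending config trips the check for the evicted peer, which matches the shape of the reported failure.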