Repository: kudu
Updated Branches:
refs/heads/master 0c3f82db1 -> b2fc399f6
[catalog manager] fixed deadlock on catalog shutdown
Fixed a deadlock on system catalog manager shutdown in the case of a
multi-master Kudu cluster. Prior to the fix, the leader master often
hung in its 'elected-as-a-leader' callback while trying to write into
the system table. It was waiting for the system table operations to
complete, but those were retried indefinitely because the system catalog
table's Raft quorum was not available (the other masters had been shut down).
Prior to the fix, the deadlock happened quite often while running
the MasterReplicationTest.TestCycleThroughAllMasters scenario in
master_replication-itest (DEBUG build). This bug also manifested itself
in other tests where a multi-master Kudu mini-cluster is used. After the
fix, the scenario succeeded 1024 times out of 1024.
The mechanics behind the deadlock are as follows:
* The majority of the system table's peers go down
(e.g. all non-leader masters shut down).
* The ElectedAsLeaderCb task issues an operation to the system table
(e.g. write newly generated TSK).
* The code below calls Shutdown() on the leader election pool. That
call does not return because the underlying Raft indefinitely
retries to get the response for the submitted operations.
The problem manifested itself the following way: after outputting
something like:
I0224 18:25:16.760793 1964126208 raft_consensus.cc:1569] T 00000000000000000000000000000000 P bd5cf976e19f4843b81cd02f14c6c87a [term 1 FOLLOWER]: Raft consensus shutting down.
I0224 18:25:16.760815 1964126208 raft_consensus.cc:1585] T 00000000000000000000000000000000 P bd5cf976e19f4843b81cd02f14c6c87a [term 1 FOLLOWER]: Raft consensus is shut down!
I0224 18:25:16.773479 1964126208 master.cc:214] [email protected]:11011 shutdown complete.
I0224 18:25:16.774673 1964126208 master.cc:210] [email protected]:11012 shutting down...
the test continued to run indefinitely, spitting messages like:
W0224 18:25:21.246805 62234624 consensus_peers.cc:357] T 00000000000000000000000000000000 P 51eb32e67c014327b965ae3e6f4993e1 -> Peer 14cb97657cb4407fab1ce3e097d7a71b (127.0.0.1:11010): Couldn't send request to peer 14cb97657cb4407fab1ce3e097d7a71b for tablet 00000000000000000000000000000000. Status: Network error: Client connection negotiation failed: client connection to 127.0.0.1:11010: connect: Connection refused (error 61). Retrying in the next heartbeat period. Already tried 14 times.
Change-Id: I10ad66fe33d4696adf2a02a09e2790afa8869583
Reviewed-on: http://gerrit.cloudera.org:8080/6134
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <[email protected]>
Reviewed-by: David Ribeiro Alves <[email protected]>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/b2fc399f
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/b2fc399f
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/b2fc399f
Branch: refs/heads/master
Commit: b2fc399f65411c537194aee68018178938bab855
Parents: 0c3f82d
Author: Alexey Serbin <[email protected]>
Authored: Fri Feb 24 18:20:18 2017 -0800
Committer: Alexey Serbin <[email protected]>
Committed: Tue Feb 28 01:43:09 2017 +0000
----------------------------------------------------------------------
src/kudu/master/catalog_manager.cc | 44 +++++++++++++++++++++++++--------
1 file changed, 34 insertions(+), 10 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/kudu/blob/b2fc399f/src/kudu/master/catalog_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/master/catalog_manager.cc b/src/kudu/master/catalog_manager.cc
index a35be9b..d859507 100644
--- a/src/kudu/master/catalog_manager.cc
+++ b/src/kudu/master/catalog_manager.cc
@@ -719,9 +719,9 @@ Status CatalogManager::ElectedAsLeaderCb() {
}
Status CatalogManager::WaitUntilCaughtUpAsLeader(const MonoDelta& timeout) {
- string uuid = master_->fs_manager()->uuid();
- Consensus* consensus = sys_catalog_->tablet_peer()->consensus();
- ConsensusStatePB cstate = consensus->ConsensusState(CONSENSUS_CONFIG_COMMITTED);
+ ConsensusStatePB cstate = sys_catalog_->tablet_peer()->consensus()->
+ ConsensusState(CONSENSUS_CONFIG_COMMITTED);
+ const string& uuid = master_->fs_manager()->uuid();
if (!cstate.has_leader_uuid() || cstate.leader_uuid() != uuid) {
return Status::IllegalState(
Substitute("Node $0 not leader. Consensus state: $1",
@@ -800,7 +800,7 @@ void CatalogManager::VisitTablesAndTabletsTask() {
// Hack to block this function until InitSysCatalogAsync() is finished.
shared_lock<LockType> l(lock_);
}
- Consensus* consensus = sys_catalog_->tablet_peer()->consensus();
+ const Consensus* consensus = sys_catalog_->tablet_peer()->consensus();
int64_t term =
consensus->ConsensusState(CONSENSUS_CONFIG_COMMITTED).current_term();
{
std::lock_guard<simple_spinlock> l(state_lock_);
@@ -973,8 +973,14 @@ bool CatalogManager::IsInitialized() const {
}
RaftPeerPB::Role CatalogManager::Role() const {
- CHECK(IsInitialized());
- return sys_catalog_->tablet_peer_->consensus()->role();
+ scoped_refptr<consensus::Consensus> consensus;
+ {
+ std::lock_guard<simple_spinlock> l(state_lock_);
+ if (state_ == kRunning) {
+ consensus = sys_catalog_->tablet_peer()->shared_consensus();
+ }
+ }
+ return consensus ? consensus->role() : RaftPeerPB::UNKNOWN_ROLE;
}
void CatalogManager::Shutdown() {
@@ -1005,10 +1011,28 @@ void CatalogManager::Shutdown() {
}
AbortAndWaitForAllTasks(copy);
- // Wait for any outstanding table visitors to finish.
+ // Shutdown the underlying consensus implementation. This aborts all pending
+ // operations on the system table. In case of a multi-master Kudu cluster,
+ // a deadlock might happen if the consensus implementation were active during
+ // further phases: shutting down the leader election pool and the system
+ // catalog.
+ //
+ // The mechanics behind the deadlock are as follows:
+ // * The majority of the system table's peers goes down (e.g. all non-leader
+ // masters shut down).
+ // * The ElectedAsLeaderCb task issues an operation to the system
+ // table (e.g. write newly generated TSK).
+ // * The code below calls Shutdown() on the leader election pool. That
+ // call does not return because the underlying Raft indefinitely
+ // retries to get the response for the submitted operations.
+ if (sys_catalog_) {
+ sys_catalog_->tablet_peer()->consensus()->Shutdown();
+ }
+
+ // Wait for any outstanding ElectedAsLeaderCb tasks to finish.
//
// Must be done before shutting down the catalog, otherwise its tablet peer
- // may be destroyed while still in use by a table visitor.
+ // may be destroyed while still in use by the ElectedAsLeaderCb task.
leader_election_pool_->Shutdown();
// Shut down the underlying storage for tables and tablets.
@@ -3943,8 +3967,8 @@ CatalogManager::ScopedLeaderSharedLock::ScopedLeaderSharedLock(
}
// Check if the catalog manager is the leader.
- Consensus* consensus = catalog_->sys_catalog_->tablet_peer_->consensus();
- ConsensusStatePB cstate = consensus->ConsensusState(CONSENSUS_CONFIG_COMMITTED);
+ ConsensusStatePB cstate = catalog_->sys_catalog_->tablet_peer()->consensus()->
+ ConsensusState(CONSENSUS_CONFIG_COMMITTED);
string uuid = catalog_->master_->fs_manager()->uuid();
if (PREDICT_FALSE(!cstate.has_leader_uuid() || cstate.leader_uuid() != uuid)) {
leader_status_ = Status::IllegalState(