[
https://issues.apache.org/jira/browse/KUDU-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055909#comment-16055909
]
Alexey Serbin edited comment on KUDU-1863 at 6/20/17 3:15 PM:
--------------------------------------------------------------
This issue is not fixed yet: the same deadlock happened again just recently.
It was a 3-master configuration, and the leader master was shutting down when
it received a TS heartbeat and started processing it, issuing a write
operation to the system catalog table:
{noformat}
Thread 403 (Thread 0x7f45d1107700 (LWP 9485)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:
#1 0x00007f45de9d2bad in kudu::ConditionVariable::Wait (this=0x7f45d1105918)
at /data/jenkins-works
#2 0x0000000000429da1 in kudu::CountDownLatch::Wait (this=0x7f45d11058e0) at
/data/jenkins-workspac
#3 0x00007f45e8324413 in kudu::master::SysCatalogTable::SyncWrite
(this=0x15c6400, req=0x7f45d1105a
#4 0x00007f45e83250ce in kudu::master::SysCatalogTable::Write (this=0x15c6400,
actions=...) at /dat
#5 0x00007f45e82b665d in kudu::master::CatalogManager::HandleReportedTablet
(this=0x3582a00, ts_des
#6 0x00007f45e82b43b7 in kudu::master::CatalogManager::ProcessTabletReport
(this=0x3582a00, ts_desc
#7 0x00007f45e8309af9 in kudu::master::MasterServiceImpl::TSHeartbeat
(this=0x1e81130, req=0x24620f
#8 0x00007f45e312e438 in
kudu::master::MasterServiceIf::MasterServiceIf(scoped_refptr<kudu::MetricE
#9 0x00007f45e31337d7 in std::_Function_handler<void(const
google::protobuf::Message*, google::prot
#10 0x00007f45e12ddcfa in std::function<void (google::protobuf::Message const*,
google::protobuf::Me
#11 0x00007f45e12dd495 in kudu::rpc::GeneratedServiceIf::Handle
(this=0x1e81130, call=0x14332c0) at
#12 0x00007f45e12e017b in kudu::rpc::ServicePool::RunThread (this=0x17c4400) at
/data/jenkins-worksp
{noformat}
The follower masters sent replies to the leader master, acknowledging the
write operation, and two threads got stuck in
{{kudu::rpc::Messenger::QueueInboundCall}} while attempting to process those
responses:
{noformat}
Thread 407 (Thread 0x7f45d993f700 (LWP 9472)):
#0 0x00007f45e3c60b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f45e82c60fc in boost::detail::yield (k=27778558) at
/data/jenkins-workspace/kudu-test-cdh
#2 0x00007f45e82c662d in kudu::rw_semaphore::lock_shared (this=0x28c5908) at
/data/jenkins-workspac
#3 0x00007f45e82c69a4 in kudu::rw_spinlock::lock_shared (this=0x28c5908) at
/data/jenkins-workspace
#4 0x00007f45e82d4f68 in kudu::shared_lock<kudu::rw_spinlock>::shared_lock
(this=0x7f45d993e4c0, m=
#5 0x00007f45e12870e0 in kudu::rpc::Messenger::QueueInboundCall
(this=0x15605a0, call=...) at /data
#6 0x00007f45e1272c00 in kudu::rpc::Connection::HandleIncomingCall
(this=0x32ede00, transfer=...) a
#7 0x00007f45e127242c in kudu::rpc::Connection::ReadHandler (this=0x32ede00,
watcher=..., revents=1
#8 0x00007f45e12782bc in ev::base<ev_io,
ev::io>::method_thunk<kudu::rpc::Connection, &kudu::rpc::C
#9 0x00007f45dd5764fd in ev_invoke_pending (loop=0x22ddb00) at
/data/jenkins-workspace/kudu-test-cd
#10 0x00007f45dd577403 in ev_run (loop=0x22ddb00, flags=0) at
/data/jenkins-workspace/kudu-test-cdh5
#11 0x00007f45e12a32f5 in ev::loop_ref::run (this=0x383d548, flags=0) at
/data/jenkins-workspace/kud
Thread 405 (Thread 0x7f45d893d700 (LWP 9474)):
#0 0x00007f45e3c60b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f45e82c60fc in boost::detail::yield (k=27783826) at
/data/jenkins-workspace/kudu-test-cdh
#2 0x00007f45e82c662d in kudu::rw_semaphore::lock_shared (this=0x28c5948) at
/data/jenkins-workspac
#3 0x00007f45e82c69a4 in kudu::rw_spinlock::lock_shared (this=0x28c5948) at
/data/jenkins-workspace
#4 0x00007f45e82d4f68 in kudu::shared_lock<kudu::rw_spinlock>::shared_lock
(this=0x7f45d893c4c0, m=
#5 0x00007f45e12870e0 in kudu::rpc::Messenger::QueueInboundCall
(this=0x15605a0, call=...) at /data
#6 0x00007f45e1272c00 in kudu::rpc::Connection::HandleIncomingCall
(this=0x357a3c0, transfer=...) a
#7 0x00007f45e127242c in kudu::rpc::Connection::ReadHandler (this=0x357a3c0,
watcher=..., revents=1
#8 0x00007f45e12782bc in ev::base<ev_io,
ev::io>::method_thunk<kudu::rpc::Connection, &kudu::rpc::C
#9 0x00007f45dd5764fd in ev_invoke_pending (loop=0x22ded00) at
/data/jenkins-workspace/kudu-test-cd
#10 0x00007f45dd577403 in ev_run (loop=0x22ded00, flags=0) at
/data/jenkins-workspace/kudu-test-cdh5
#11 0x00007f45e12a32f5 in ev::loop_ref::run (this=0x383de48, flags=0) at
/data/jenkins-workspace/kud
{noformat}
So, the first thread (Thread 403) is stuck waiting for the response, which
would be processed if the 2nd (Thread 407) and 3rd (Thread 405) threads were
not stuck (actually, it would be enough to have at least one of them
unblocked to get the majority for the operation acknowledgment). The 2nd and
3rd threads are blocked because the 4th thread (Thread 1) runs the shutdown
sequence and waits for the ServicePool to shut down:
{noformat}
Thread 1 (Thread 0x7f45e9899880 (LWP 4807)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_ti
#1 0x00007f45de9d2e30 in kudu::ConditionVariable::TimedWait (this=0x1ec7430,
max_time=...) at /data
#2 0x0000000000429e20 in kudu::CountDownLatch::WaitFor (this=0x1ec73f8,
delta=...) at /data/jenkins
#3 0x00007f45deab8070 in kudu::ThreadJoiner::Join (this=0x7fff0b292460) at
/data/jenkins-workspace/
#4 0x00007f45e12defbf in kudu::rpc::ServicePool::Shutdown (this=0x17c4400) at
/data/jenkins-workspa
#5 0x00007f45e12deab2 in kudu::rpc::ServicePool::~ServicePool (this=0x17c4400,
__in_chrg=<optimized
#6 0x00007f45e12deb66 in kudu::rpc::ServicePool::~ServicePool (this=0x17c4400,
__in_chrg=<optimized
#7 0x00007f45e63c8010 in kudu::RefCountedThreadSafe<kudu::rpc::RpcService,
kudu::DefaultRefCountedT
#8 0x00007f45e63c78be in
kudu::DefaultRefCountedThreadSafeTraits<kudu::rpc::RpcService>::Destruct (
#9 0x00007f45e63c6e10 in kudu::RefCountedThreadSafe<kudu::rpc::RpcService,
kudu::DefaultRefCountedT
#10 0x00007f45e63c6395 in scoped_refptr<kudu::rpc::RpcService>::~scoped_refptr
(this=0x1681f10, __in
#11 0x00007f45e128aade in std::pair<std::string const,
scoped_refptr<kudu::rpc::RpcService> >::~pair
#12 0x00007f45e1290336 in __gnu_cxx::new_allocator<std::pair<std::string const,
scoped_refptr<kudu::
#13 0x00007f45e128fa71 in
std::allocator_traits<std::allocator<std::pair<std::string const, scoped_r
#14 0x00007f45e128ee93 in
std::allocator_traits<std::allocator<std::pair<std::string const, scoped_r
#15 0x00007f45e128df61 in
std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<s
#16 0x00007f45e128cd13 in
std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<s
#17 0x00007f45e128bac0 in std::_Hashtable<std::string, std::pair<std::string
const, scoped_refptr<ku
#18 0x00007f45e128a5ce in std::unordered_map<std::string,
scoped_refptr<kudu::rpc::RpcService>, std:
#19 0x00007f45e1286e51 in kudu::rpc::Messenger::UnregisterAllServices
(this=0x15605a0) at /data/jenk
#20 0x00007f45e63c50cc in kudu::RpcServer::Shutdown (this=0x2199d40) at
/data/jenkins-workspace/kudu
#21 0x00007f45e63cdd07 in kudu::server::ServerBase::Shutdown (this=0x1f13e00)
at /data/jenkins-works
#22 0x00007f45e8300fd8 in kudu::master::Master::Shutdown (this=0x1f13e00) at
/data/jenkins-workspace
{noformat}
The 4th thread (Thread 1) is blocked because it waits for the first thread
(Thread 403) to complete its task and join.
This deadlock happens rarely because it depends on 'perfect timing' between
sending out the request and the call to
{{kudu::rpc::Messenger::UnregisterAllServices}} initiated by
{{kudu::master::Master::Shutdown}}.
was (Author: aserbin):
This issue is not fixed yet: the same deadlock happened again just recently.
It was a 3-master configuration, and the leader master was shutting down when
it received a TS heartbeat and started processing it, issuing a write
operation to the system catalog table:
{noformat}
Thread 403 (Thread 0x7f45d1107700 (LWP 9485)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:
#1 0x00007f45de9d2bad in kudu::ConditionVariable::Wait (this=0x7f45d1105918)
at /data/jenkins-works
#2 0x0000000000429da1 in kudu::CountDownLatch::Wait (this=0x7f45d11058e0) at
/data/jenkins-workspac
#3 0x00007f45e8324413 in kudu::master::SysCatalogTable::SyncWrite
(this=0x15c6400, req=0x7f45d1105a
#4 0x00007f45e83250ce in kudu::master::SysCatalogTable::Write (this=0x15c6400,
actions=...) at /dat
#5 0x00007f45e82b665d in kudu::master::CatalogManager::HandleReportedTablet
(this=0x3582a00, ts_des
#6 0x00007f45e82b43b7 in kudu::master::CatalogManager::ProcessTabletReport
(this=0x3582a00, ts_desc
#7 0x00007f45e8309af9 in kudu::master::MasterServiceImpl::TSHeartbeat
(this=0x1e81130, req=0x24620f
#8 0x00007f45e312e438 in
kudu::master::MasterServiceIf::MasterServiceIf(scoped_refptr<kudu::MetricE
#9 0x00007f45e31337d7 in std::_Function_handler<void(const
google::protobuf::Message*, google::prot
#10 0x00007f45e12ddcfa in std::function<void (google::protobuf::Message const*,
google::protobuf::Me
#11 0x00007f45e12dd495 in kudu::rpc::GeneratedServiceIf::Handle
(this=0x1e81130, call=0x14332c0) at
#12 0x00007f45e12e017b in kudu::rpc::ServicePool::RunThread (this=0x17c4400) at
/data/jenkins-worksp
#13 0x00007f45e12e15db in boost::_mfi::mf0<void,
kudu::rpc::ServicePool>::operator() (this=0x1ec73e0
#14 0x00007f45e12e1402 in
boost::_bi::list1<boost::_bi::value<kudu::rpc::ServicePool*> >::operator()
#15 0x00007f45e12e12f7 in boost::_bi::bind_t<void, boost::_mfi::mf0<void,
kudu::rpc::ServicePool>, b
#16 0x00007f45e12e122e in
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<voi
#17 0x00007f45e1299ee6 in boost::function0<void>::operator() (this=0x1ec73d8)
at /data/jenkins-works
#18 0x00007f45deab9229 in kudu::Thread::SuperviseThread (arg=0x1ec73b0) at
/data/jenkins-workspace/k
#19 0x00007f45e3c59184 in start_thread (arg=0x7f45d1107700) at
pthread_create.c:312
#20 0x00007f45db91337d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{noformat}
The follower masters sent replies to the leader master, acknowledging the
write operation, and two threads got stuck in
{{kudu::rpc::Messenger::QueueInboundCall}}:
{noformat}
Thread 407 (Thread 0x7f45d993f700 (LWP 9472)):
#0 0x00007f45e3c60b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f45e82c60fc in boost::detail::yield (k=27778558) at
/data/jenkins-workspace/kudu-test-cdh
#2 0x00007f45e82c662d in kudu::rw_semaphore::lock_shared (this=0x28c5908) at
/data/jenkins-workspac
#3 0x00007f45e82c69a4 in kudu::rw_spinlock::lock_shared (this=0x28c5908) at
/data/jenkins-workspace
#4 0x00007f45e82d4f68 in kudu::shared_lock<kudu::rw_spinlock>::shared_lock
(this=0x7f45d993e4c0, m=
#5 0x00007f45e12870e0 in kudu::rpc::Messenger::QueueInboundCall
(this=0x15605a0, call=...) at /data
#6 0x00007f45e1272c00 in kudu::rpc::Connection::HandleIncomingCall
(this=0x32ede00, transfer=...) a
#7 0x00007f45e127242c in kudu::rpc::Connection::ReadHandler (this=0x32ede00,
watcher=..., revents=1
#8 0x00007f45e12782bc in ev::base<ev_io,
ev::io>::method_thunk<kudu::rpc::Connection, &kudu::rpc::C
#9 0x00007f45dd5764fd in ev_invoke_pending (loop=0x22ddb00) at
/data/jenkins-workspace/kudu-test-cd
#10 0x00007f45dd577403 in ev_run (loop=0x22ddb00, flags=0) at
/data/jenkins-workspace/kudu-test-cdh5
#11 0x00007f45e12a32f5 in ev::loop_ref::run (this=0x383d548, flags=0) at
/data/jenkins-workspace/kud
#12 0x00007f45e129fce0 in kudu::rpc::ReactorThread::RunThread (this=0x383d540)
at /data/jenkins-work
#13 0x00007f45e12ab14d in boost::_mfi::mf0<void,
kudu::rpc::ReactorThread>::operator() (this=0x2cdbb
#14 0x00007f45e12aa9e8 in
boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> >::operator
#15 0x00007f45e12aa153 in boost::_bi::bind_t<void, boost::_mfi::mf0<void,
kudu::rpc::ReactorThread>,
#16 0x00007f45e12a94e6 in
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<voi
#17 0x00007f45e1299ee6 in boost::function0<void>::operator() (this=0x2cdbb58)
at /data/jenkins-works
#18 0x00007f45deab9229 in kudu::Thread::SuperviseThread (arg=0x2cdbb30) at
/data/jenkins-workspace/k
#19 0x00007f45e3c59184 in start_thread (arg=0x7f45d993f700) at
pthread_create.c:312
#20 0x00007f45db91337d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 405 (Thread 0x7f45d893d700 (LWP 9474)):
#0 0x00007f45e3c60b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f45e82c60fc in boost::detail::yield (k=27783826) at
/data/jenkins-workspace/kudu-test-cdh
#2 0x00007f45e82c662d in kudu::rw_semaphore::lock_shared (this=0x28c5948) at
/data/jenkins-workspac
#3 0x00007f45e82c69a4 in kudu::rw_spinlock::lock_shared (this=0x28c5948) at
/data/jenkins-workspace
#4 0x00007f45e82d4f68 in kudu::shared_lock<kudu::rw_spinlock>::shared_lock
(this=0x7f45d893c4c0, m=
#5 0x00007f45e12870e0 in kudu::rpc::Messenger::QueueInboundCall
(this=0x15605a0, call=...) at /data
#6 0x00007f45e1272c00 in kudu::rpc::Connection::HandleIncomingCall
(this=0x357a3c0, transfer=...) a
#7 0x00007f45e127242c in kudu::rpc::Connection::ReadHandler (this=0x357a3c0,
watcher=..., revents=1
#8 0x00007f45e12782bc in ev::base<ev_io,
ev::io>::method_thunk<kudu::rpc::Connection, &kudu::rpc::C
#9 0x00007f45dd5764fd in ev_invoke_pending (loop=0x22ded00) at
/data/jenkins-workspace/kudu-test-cd
#10 0x00007f45dd577403 in ev_run (loop=0x22ded00, flags=0) at
/data/jenkins-workspace/kudu-test-cdh5
#11 0x00007f45e12a32f5 in ev::loop_ref::run (this=0x383de48, flags=0) at
/data/jenkins-workspace/kud
#12 0x00007f45e129fce0 in kudu::rpc::ReactorThread::RunThread (this=0x383de40)
at /data/jenkins-work
#13 0x00007f45e12ab14d in boost::_mfi::mf0<void,
kudu::rpc::ReactorThread>::operator() (this=0x1c3f5
#14 0x00007f45e12aa9e8 in
boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> >::operator
#15 0x00007f45e12aa153 in boost::_bi::bind_t<void, boost::_mfi::mf0<void,
kudu::rpc::ReactorThread>,
#16 0x00007f45e12a94e6 in
boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<voi
#17 0x00007f45e1299ee6 in boost::function0<void>::operator() (this=0x1c3f5b8)
at /data/jenkins-works
#18 0x00007f45deab9229 in kudu::Thread::SuperviseThread (arg=0x1c3f590) at
/data/jenkins-workspace/k
#19 0x00007f45e3c59184 in start_thread (arg=0x7f45d893d700) at
pthread_create.c:312
#20 0x00007f45db91337d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{noformat}
So, the first thread (Thread 403) is stuck waiting for the response, which
would be processed if the 2nd (Thread 407) and 3rd (Thread 405) threads were
not stuck (actually, it would be enough to have at least one of them
unblocked to get the majority for the operation acknowledgment). The 2nd and
3rd threads are blocked because the 4th thread (Thread 1) runs the shutdown
sequence and waits for the ServicePool to shut down:
{noformat}
Thread 1 (Thread 0x7f45e9899880 (LWP 4807)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_ti
#1 0x00007f45de9d2e30 in kudu::ConditionVariable::TimedWait (this=0x1ec7430,
max_time=...) at /data
#2 0x0000000000429e20 in kudu::CountDownLatch::WaitFor (this=0x1ec73f8,
delta=...) at /data/jenkins
#3 0x00007f45deab8070 in kudu::ThreadJoiner::Join (this=0x7fff0b292460) at
/data/jenkins-workspace/
#4 0x00007f45e12defbf in kudu::rpc::ServicePool::Shutdown (this=0x17c4400) at
/data/jenkins-workspa
#5 0x00007f45e12deab2 in kudu::rpc::ServicePool::~ServicePool (this=0x17c4400,
__in_chrg=<optimized
#6 0x00007f45e12deb66 in kudu::rpc::ServicePool::~ServicePool (this=0x17c4400,
__in_chrg=<optimized
#7 0x00007f45e63c8010 in kudu::RefCountedThreadSafe<kudu::rpc::RpcService,
kudu::DefaultRefCountedT
#8 0x00007f45e63c78be in
kudu::DefaultRefCountedThreadSafeTraits<kudu::rpc::RpcService>::Destruct (
#9 0x00007f45e63c6e10 in kudu::RefCountedThreadSafe<kudu::rpc::RpcService,
kudu::DefaultRefCountedT
#10 0x00007f45e63c6395 in scoped_refptr<kudu::rpc::RpcService>::~scoped_refptr
(this=0x1681f10, __in
#11 0x00007f45e128aade in std::pair<std::string const,
scoped_refptr<kudu::rpc::RpcService> >::~pair
#12 0x00007f45e1290336 in __gnu_cxx::new_allocator<std::pair<std::string const,
scoped_refptr<kudu::
#13 0x00007f45e128fa71 in
std::allocator_traits<std::allocator<std::pair<std::string const, scoped_r
#14 0x00007f45e128ee93 in
std::allocator_traits<std::allocator<std::pair<std::string const, scoped_r
#15 0x00007f45e128df61 in
std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<s
#16 0x00007f45e128cd13 in
std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<s
#17 0x00007f45e128bac0 in std::_Hashtable<std::string, std::pair<std::string
const, scoped_refptr<ku
#18 0x00007f45e128a5ce in std::unordered_map<std::string,
scoped_refptr<kudu::rpc::RpcService>, std:
#19 0x00007f45e1286e51 in kudu::rpc::Messenger::UnregisterAllServices
(this=0x15605a0) at /data/jenk
#20 0x00007f45e63c50cc in kudu::RpcServer::Shutdown (this=0x2199d40) at
/data/jenkins-workspace/kudu
#21 0x00007f45e63cdd07 in kudu::server::ServerBase::Shutdown (this=0x1f13e00)
at /data/jenkins-works
#22 0x00007f45e8300fd8 in kudu::master::Master::Shutdown (this=0x1f13e00) at
/data/jenkins-workspace
#23 0x00007f45e831dab0 in kudu::master::MiniMaster::Shutdown (this=0x32af680)
at /data/jenkins-works
#24 0x0000000000428225 in
kudu::tools::RemoteKsckTest_TestLeaderMasterDown_Test::TestBody (this=0x14
#25 0x00007f45dc851af8 in HandleSehExceptionsInMethodIfSupported<testing::Test,
void> (location=0x7f
#26 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>
(object=object@entry
#27 0x00007f45dc845fc2 in testing::Test::Run (this=0x1432000) at
/data/jenkins-workspace/kudu-test-c
#28 0x00007f45dc846108 in testing::TestInfo::Run (this=0x13a86e0) at
/data/jenkins-workspace/kudu-te
#29 0x00007f45dc8461e5 in testing::TestCase::Run (this=0x13f6000) at
/data/jenkins-workspace/kudu-te
#30 0x00007f45dc8464c8 in testing::internal::UnitTestImpl::RunAllTests
(this=0x13c8200) at /data/jen
#31 0x00007f45dc852008 in
HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bo
#32
testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl,
bool> (o
#33 0x00007f45dc8467ad in testing::UnitTest::Run (this=0x7f45dca772c0
<testing::UnitTest::GetInstanc
#34 0x00007f45e8f51e55 in RUN_ALL_TESTS () at
/data/jenkins-workspace/kudu-test-cdh5.12/thirdparty/i
#35 0x00007f45e8f4fc80 in main (argc=1, argv=0x7fff0b292e18) at
/data/jenkins-workspace/kudu-test-cd
#36 0x00007f45db83af45 in __libc_start_main (main=0x422b80 <main@plt>, argc=3,
argv=0x7fff0b292e08,
#37 0x0000000000423529 in _start ()
{noformat}
The 4th thread (Thread 1) is blocked because it waits for the first thread
(Thread 403) to complete its task and join.
This deadlock happens rarely because it depends on 'perfect timing' between
sending out the request and the call to
{{kudu::rpc::Messenger::UnregisterAllServices}} initiated by
{{kudu::master::Master::Shutdown}}.
> Deadlock while shutting down master
> -----------------------------------
>
> Key: KUDU-1863
> URL: https://issues.apache.org/jira/browse/KUDU-1863
> Project: Kudu
> Issue Type: Bug
> Components: master
> Affects Versions: 1.3.0
> Reporter: Todd Lipcon
> Fix For: Backlog
>
> Attachments: ksck_remote-test.txt.bz2
>
>
> I saw ksck_remote-test fail because the master was hanging during its attempt
> to shut down. Analysis follows.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)