Adar Dembo created KUDU-2118:
--------------------------------
Summary: Running RaftConsensus instances should not be destroyed
by reactor threads
Key: KUDU-2118
URL: https://issues.apache.org/jira/browse/KUDU-2118
Project: Kudu
Issue Type: Bug
Components: consensus
Affects Versions: 1.5.0
Reporter: Adar Dembo
Priority: Critical
Attachments: 0_create-table-stress-test.txt.gz
RaftConsensus is an object with shared ownership. One of its invariants is that
the last ref may be dropped (and thus the object destroyed) by a reactor
thread, but only if RaftConsensus has already been shut down: shutting down
may wait, and reactor threads aren't allowed to wait.
And yet, here's a pre-commit test failure showing otherwise. In it, a reactor
thread destroys a LeaderElection object, which destroys the embedded
ElectionDecisionCallback, which held the last ref to RaftConsensus, so
RaftConsensus is destroyed as well. Normally the Shutdown call in the
destructor would no-op, but apparently it's going through the full stop
sequence instead.
{noformat}
thread_restrictions.cc:79] Check failed: LoadTLS()->wait_allowed Waiting is not
allowed to be used on this thread to prevent server-wide latency aberrations
and deadlocks. Thread 3852 (name: "rpc reactor", category: "reactor")
@ 0x7fcfc8864507 kudu::ThreadRestrictions::AssertWaitAllowed() at ??:0
@ 0x7fcfc55de12f kudu::consensus::RaftConsensus::Stop() at ??:0
@ 0x7fcfc55de6aa kudu::consensus::RaftConsensus::Shutdown() at ??:0
@ 0x7fcfc55cdba4 kudu::consensus::RaftConsensus::~RaftConsensus() at ??:0
@ 0x7fcfc55fab95 __gnu_cxx::new_allocator<>::destroy<>() at ??:0
@ 0x7fcfc55fab47 std::allocator_traits<>::_S_destroy<>() at ??:0
@ 0x7fcfc55faae9 std::allocator_traits<>::destroy<>() at ??:0
@ 0x7fcfc55fa91b std::_Sp_counted_ptr_inplace<>::_M_dispose() at ??:0
@ 0x4304fa std::_Sp_counted_base<>::_M_release() at /usr/include/c++/4.8/bits/shared_ptr_base.h:158
@ 0x42e68f std::__shared_count<>::~__shared_count() at /usr/include/c++/4.8/bits/shared_ptr_base.h:547
@ 0x7fcfcb8a4032 std::__shared_ptr<>::~__shared_ptr() at ??:0
@ 0x7fcfcb8a4072 std::shared_ptr<>::~shared_ptr() at ??:0
@ 0x7fcfc55ed4d4 std::_Head_base<>::~_Head_base() at ??:0
@ 0x7fcfc55ed4f2 _ZNSt11_Tuple_implILm0EJSt10shared_ptrIN4kudu9consensus13RaftConsensusEENS3_14ElectionReasonESt12_PlaceholderILi1EEEED1Ev at ??:0
@ 0x7fcfc55ed50c std::tuple<>::~tuple() at ??:0
@ 0x7fcfc55ed52a std::_Bind<>::~_Bind() at ??:0
@ 0x7fcfc55f6162 std::_Function_base::_Base_manager<>::_M_destroy() at ??:0
@ 0x7fcfc55f34ed std::_Function_base::_Base_manager<>::_M_manager() at ??:0
@ 0x7fcfcbe5d5c5 std::_Function_base::~_Function_base() at ??:0
@ 0x7fcfc55b0d18 std::function<>::~function() at ??:0
@ 0x7fcfc55add9d kudu::consensus::LeaderElection::~LeaderElection() at ??:0
@ 0x7fcfc55b699a kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
@ 0x7fcfc55b697a kudu::DefaultRefCountedThreadSafeTraits<>::Destruct() at ??:0
@ 0x7fcfc55b6960 kudu::RefCountedThreadSafe<>::Release() at ??:0
@ 0x7fcfc55b6936 kudu::internal::MaybeRefcount<>::Release() at ??:0
@ 0x7fcfc55b68c4 kudu::internal::BindState<>::~BindState() at ??:0
@ 0x7fcfc55b6910 kudu::internal::BindState<>::~BindState() at ??:0
@ 0x7fcfcb44f23d kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
{noformat}
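For illustration only, here is a minimal single-threaded sketch of the ownership chain in the trace and of the invariant it violates. It is not Kudu code: ConsensusSketch, ElectionSketch, tls_wait_allowed, AssertWaitAllowedSketch, and DestroyElectionOnReactor are hypothetical stand-ins for RaftConsensus, LeaderElection, and ThreadRestrictions, and the "reactor thread" is simulated by flipping a thread-local flag. A violation is counted rather than crashing, so both outcomes can be observed.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for the per-thread "wait allowed" restriction.
thread_local bool tls_wait_allowed = true;

// Counts violations instead of aborting, so the sketch can be inspected.
int g_wait_violations = 0;

void AssertWaitAllowedSketch() {
  if (!tls_wait_allowed) ++g_wait_violations;
}

// Hypothetical stand-in for RaftConsensus: shared ownership, and a
// destructor that calls Shutdown().
class ConsensusSketch {
 public:
  void Shutdown() {
    if (shut_down_) return;     // Already shut down: the call is a no-op.
    AssertWaitAllowedSketch();  // The full stop sequence may wait.
    shut_down_ = true;
  }
  ~ConsensusSketch() { Shutdown(); }

 private:
  bool shut_down_ = false;
};

// Hypothetical stand-in for LeaderElection: owns a callback that keeps a
// reference to the consensus object alive.
struct ElectionSketch {
  std::function<void()> decision_callback;
};

// Destroys the election from a simulated reactor context and returns the
// number of wait violations observed.
int DestroyElectionOnReactor(bool shut_down_first) {
  g_wait_violations = 0;
  auto consensus = std::make_shared<ConsensusSketch>();
  if (shut_down_first) consensus->Shutdown();  // Wait is still allowed here.

  auto election = std::make_unique<ElectionSketch>();
  election->decision_callback = [consensus] {};
  consensus.reset();  // The callback now holds the last ref.

  tls_wait_allowed = false;  // Enter the simulated reactor context.
  election.reset();          // Election -> callback -> last ref -> destructor.
  tls_wait_allowed = true;
  return g_wait_violations;
}
```

In this sketch, DestroyElectionOnReactor(true) reports no violations because the destructor's Shutdown call no-ops, while DestroyElectionOnReactor(false) reports one, which corresponds to the Check failure above.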