[
https://issues.apache.org/jira/browse/KUDU-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Percy resolved KUDU-2118.
------------------------------
Resolution: Fixed
Fix Version/s: 1.5.0
Fixed. This bug never made it into a release.
> Running RaftConsensus instances should not be destroyed by reactor threads
> --------------------------------------------------------------------------
>
> Key: KUDU-2118
> URL: https://issues.apache.org/jira/browse/KUDU-2118
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 1.5.0
> Reporter: Adar Dembo
> Assignee: Mike Percy
> Priority: Critical
> Fix For: 1.5.0
>
> Attachments: 07e4b47e517a4d44b1b8cbdaed95e216.txt, 0_create-table-stress-test.txt.gz
>
>
> RaftConsensus is an object with shared ownership. One of its invariants is
> that the last ref may be dropped (and the object thus destroyed) by a reactor
> thread, but only if RaftConsensus has already been shut down: the act of
> shutting down may wait, and reactor threads aren't allowed to wait.
> And yet, here's a pre-commit test failure showing otherwise. In it, a reactor
> thread destroys a LeaderElection object, which destroys the embedded
> ElectionDecisionCallback, which held the last ref to RaftConsensus, so
> RaftConsensus is destroyed as well. Normally the Shutdown call in the
> destructor would no-op, but apparently it's going through a full stop
> sequence instead.
> {noformat}
> thread_restrictions.cc:79] Check failed: LoadTLS()->wait_allowed Waiting is not allowed to be used on this thread to prevent server-wide latency aberrations and deadlocks. Thread 3852 (name: "rpc reactor", category: "reactor")
> @ 0x7fcfc8864507 kudu::ThreadRestrictions::AssertWaitAllowed() at ??:0
> @ 0x7fcfc55de12f kudu::consensus::RaftConsensus::Stop() at ??:0
> @ 0x7fcfc55de6aa kudu::consensus::RaftConsensus::Shutdown() at ??:0
> @ 0x7fcfc55cdba4 kudu::consensus::RaftConsensus::~RaftConsensus() at ??:0
> @ 0x7fcfc55fab95 __gnu_cxx::new_allocator<>::destroy<>() at ??:0
> @ 0x7fcfc55fab47 std::allocator_traits<>::_S_destroy<>() at ??:0
> @ 0x7fcfc55faae9 std::allocator_traits<>::destroy<>() at ??:0
> @ 0x7fcfc55fa91b std::_Sp_counted_ptr_inplace<>::_M_dispose() at ??:0
> @ 0x4304fa std::_Sp_counted_base<>::_M_release() at /usr/include/c++/4.8/bits/shared_ptr_base.h:158
> @ 0x42e68f std::__shared_count<>::~__shared_count() at /usr/include/c++/4.8/bits/shared_ptr_base.h:547
> @ 0x7fcfcb8a4032 std::__shared_ptr<>::~__shared_ptr() at ??:0
> @ 0x7fcfcb8a4072 std::shared_ptr<>::~shared_ptr() at ??:0
> @ 0x7fcfc55ed4d4 std::_Head_base<>::~_Head_base() at ??:0
> @ 0x7fcfc55ed4f2 _ZNSt11_Tuple_implILm0EJSt10shared_ptrIN4kudu9consensus13RaftConsensusEENS3_14ElectionReasonESt12_PlaceholderILi1EEEED1Ev at ??:0
> @ 0x7fcfc55ed50c std::tuple<>::~tuple() at ??:0
> @ 0x7fcfc55ed52a std::_Bind<>::~_Bind() at ??:0
> @ 0x7fcfc55f6162 std::_Function_base::_Base_manager<>::_M_destroy() at ??:0
> @ 0x7fcfc55f34ed std::_Function_base::_Base_manager<>::_M_manager() at ??:0
> @ 0x7fcfcbe5d5c5 std::_Function_base::~_Function_base() at ??:0
> @ 0x7fcfc55b0d18 std::function<>::~function() at ??:0
> @ 0x7fcfc55add9d kudu::consensus::LeaderElection::~LeaderElection() at ??:0
> @ 0x7fcfc55b699a kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
> @ 0x7fcfc55b697a kudu::DefaultRefCountedThreadSafeTraits<>::Destruct() at ??:0
> @ 0x7fcfc55b6960 kudu::RefCountedThreadSafe<>::Release() at ??:0
> @ 0x7fcfc55b6936 kudu::internal::MaybeRefcount<>::Release() at ??:0
> @ 0x7fcfc55b68c4 kudu::internal::BindState<>::~BindState() at ??:0
> @ 0x7fcfc55b6910 kudu::internal::BindState<>::~BindState() at ??:0
> @ 0x7fcfcb44f23d kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
> {noformat}
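
For readers who want to see the ownership chain from the description in isolation, here is a minimal sketch. It is not Kudu code: ToyConsensus, g_wait_allowed, g_violations and DropLastRefInReactorContext are hypothetical names, the "reactor" is simulated by flipping a thread-local flag on the calling thread rather than by a real reactor thread, and a counter stands in for the CHECK failure. It only illustrates that whoever destroys the callback holding the last shared_ptr also runs the destructor, and that this is harmless only if Shutdown has already run.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical stand-in for the per-thread wait_allowed flag named in the
// check failure; true means the current thread is allowed to wait.
thread_local bool g_wait_allowed = true;

// Counts full stop sequences that ran where waiting was not allowed.
// (The real code CHECK-fails instead of counting.)
int g_violations = 0;

// Toy model of an object with shared ownership; not the real RaftConsensus.
class ToyConsensus {
 public:
  ~ToyConsensus() { Shutdown(); }

  // Idempotent: only the first call runs the (potentially waiting) stop path.
  void Shutdown() {
    if (shut_down_) return;               // already stopped: no-op
    if (!g_wait_allowed) ++g_violations;  // would have waited on a reactor
    shut_down_ = true;
  }

 private:
  bool shut_down_ = false;
};

// Hands the last reference to a callback, then destroys the callback in a
// simulated reactor context. Returns the number of violations observed.
int DropLastRefInReactorContext(bool shutdown_first) {
  g_violations = 0;
  auto consensus = std::make_shared<ToyConsensus>();
  if (shutdown_first) consensus->Shutdown();

  std::function<void()> callback = [consensus] {};
  consensus.reset();       // the callback now holds the last reference

  g_wait_allowed = false;  // pretend we are on a reactor thread
  callback = nullptr;      // destroys the capture, and so the ToyConsensus
  g_wait_allowed = true;
  return g_violations;
}
```

Under these assumptions, DropLastRefInReactorContext(true) reports no violation because the destructor's Shutdown call no-ops, while DropLastRefInReactorContext(false) reports one, which corresponds to the situation in the trace above.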