[ https://issues.apache.org/jira/browse/KUDU-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518722#comment-16518722 ]
Dan Burkert commented on KUDU-2088:
-----------------------------------
https://github.com/apache/kudu/commit/c38631097a466f50209e211218c6668789f4b445
> UpdateReplica accesses stack object after it is destroyed
> ---------------------------------------------------------
>
> Key: KUDU-2088
> URL: https://issues.apache.org/jira/browse/KUDU-2088
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 1.4.0
> Reporter: Adar Dembo
> Assignee: Adar Dembo
> Priority: Major
> Fix For: 1.5.0
>
>
> {{RaftConsensus::UpdateReplica()}} has this bit of code in it:
> {code}
> // 5 - We wait for the writes to be durable.
> // Note that this is safe because dist consensus now only supports a single outstanding
> // request at a time and this way we can allow commits to proceed while we wait.
> TRACE("Waiting on the replicates to finish logging");
> TRACE_EVENT0("consensus", "Wait for log");
> Status s;
> do {
>   s = log_synchronizer.WaitFor(
>       MonoDelta::FromMilliseconds(FLAGS_raft_heartbeat_interval_ms));
>   // If just waiting for our log append to finish lets snooze the timer.
>   // We don't want to fire leader election because we're waiting on our own log.
>   if (s.IsTimedOut()) {
>     RETURN_NOT_OK(SnoozeFailureDetector());
>   }
> } while (s.IsTimedOut());
> RETURN_NOT_OK(s);
> {code}
> {{log_synchronizer}} is a stack-allocated {{Synchronizer}}. A reference to it
> is passed into an asynchronous log append function. The purpose of this code
> is to wait for that asynchronous function to finish while periodically
> snoozing the failure detector.
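To make the waiting pattern above concrete, here is a minimal C++17 sketch of a one-shot synchronizer: the asynchronous side invokes a status callback once, and the waiting side blocks with a timeout and "snoozes" after every timeout. The {{Status}} struct, the {{Synchronizer}} class and {{WaitForSimulatedAppend()}} are hypothetical simplifications, not Kudu's actual classes. The stack trace in this issue suggests Kudu's {{Synchronizer}} is built on a {{CountDownLatch}}, whereas this sketch uses {{std::condition_variable}}; the log append is simulated by a sleeping thread and the snooze is just a counter.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for kudu::Status, reduced to what this sketch needs.
struct Status {
  enum class Code { kOk, kTimedOut };
  Code code = Code::kOk;
  std::string message;

  static Status OK() { return Status{}; }
  static Status TimedOut(std::string msg) {
    return Status{Code::kTimedOut, std::move(msg)};
  }
  bool ok() const { return code == Code::kOk; }
  bool IsTimedOut() const { return code == Code::kTimedOut; }
};

// Hypothetical one-shot synchronizer: the asynchronous side calls StatusCB()
// exactly once; the waiting side polls with WaitFor() and a timeout.
class Synchronizer {
 public:
  void StatusCB(const Status& s) {
    std::lock_guard<std::mutex> lock(mu_);
    assert(!done_ && "StatusCB() must be called at most once");
    status_ = s;
    done_ = true;
    cv_.notify_all();
  }

  // Returns the operation's status, or TimedOut if it is still running.
  Status WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    if (!cv_.wait_for(lock, timeout, [this] { return done_; })) {
      return Status::TimedOut("operation still in progress");
    }
    return status_;
  }

  // The callback keeps a raw pointer to this object, so the Synchronizer
  // must outlive the callback's invocation.
  std::function<void(const Status&)> AsStatusCallback() {
    return [this](const Status& s) { StatusCB(s); };
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
  Status status_;
};

// Mirrors the loop in UpdateReplica(): a stack-allocated Synchronizer, an
// "append" on another thread, and a snooze after every timed-out wait.
// Returns how many times the wait timed out.
int WaitForSimulatedAppend(Status* final_status) {
  Synchronizer log_synchronizer;  // stack-allocated, as in the issue
  auto cb = log_synchronizer.AsStatusCallback();
  std::thread appender([cb] {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // slow append
    cb(Status::OK());
  });
  int snoozes = 0;
  Status s;
  do {
    s = log_synchronizer.WaitFor(std::chrono::milliseconds(10));
    if (s.IsTimedOut()) ++snoozes;  // stand-in for SnoozeFailureDetector()
  } while (s.IsTimedOut());
  appender.join();  // the callback has run, so destroying the Synchronizer is safe
  *final_status = s;
  return snoozes;
}
```

Because the callback captures a raw {{this}} pointer, waiting until the callback has fired (and joining the thread) before returning is what makes the stack allocation safe here; returning before that point would reproduce the hazard described in this issue.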
> However, if {{SnoozeFailureDetector()}} were to return an error, we would exit
> the function early and destroy {{log_synchronizer}}. This can lead to a crash
> if the asynchronous log append function later uses its reference to
> {{log_synchronizer}}. Here's one such crash stack trace:
> {noformat}
> F0801 02:58:43.488010 13715 mutex.cc:76] Check failed: rv == 0 || rv == 16 . Invalid argument. Owner tid: 0; Self tid: 128; To collect the owner stack trace, enable the flag --debug_mutex_collect_stacktrace
> *** Check failure stack trace: ***
> @ 0x7f843b5d22fd google::LogMessage::Fail() at ??:0
> @ 0x7f843b5d41bd google::LogMessage::SendToLog() at ??:0
> @ 0x7f843b5d1e39 google::LogMessage::Flush() at ??:0
> @ 0x7f843b5d4c5f google::LogMessageFatal::~LogMessageFatal() at ??:0
> @ 0x7f843c49dc46 kudu::Mutex::TryAcquire() at ??:0
> @ 0x7f843c49dcd1 kudu::Mutex::Acquire() at ??:0
> @ 0x7f8444243290 kudu::MutexLock::MutexLock() at ??:0
> @ 0x7f8444274d02 kudu::CountDownLatch::CountDown() at ??:0
> @ 0x7f8444274dd1 kudu::CountDownLatch::CountDown() at ??:0
> @ 0x7f84428c4d4f kudu::Synchronizer::StatusCB() at ??:0
> @ 0x7f84428cf73e kudu::internal::RunnableAdapter<>::Run() at ??:0
> @ 0x7f84428ce716 kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
> @ 0x7f84428ccf37 kudu::internal::Invoker<>::Run() at ??:0
> @ 0x7f8442879e6f kudu::Callback<>::Run() at ??:0
> @ 0x7f844286e28e kudu::consensus::PeerMessageQueue::LocalPeerAppendFinished() at ??:0
> @ 0x7f8442882275 kudu::internal::RunnableAdapter<>::Run() at ??:0
> @ 0x7f8442880649 kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
> @ 0x7f844287e87f kudu::internal::Invoker<>::Run() at ??:0
> @ 0x7f8442879e6f kudu::Callback<>::Run() at ??:0
> @ 0x7f8442891eec kudu::consensus::LogCache::LogCallback() at ??:0
> @ 0x7f8442897e94 kudu::internal::RunnableAdapter<>::Run() at ??:0
> @ 0x7f844289797d kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
> @ 0x7f8442896fff kudu::internal::Invoker<>::Run() at ??:0
> @ 0x7f8442879e6f kudu::Callback<>::Run() at ??:0
> @ 0x7f844250fa1f kudu::log::Log::AppendThread::HandleGroup() at ??:0
> @ 0x7f844250ee5c kudu::log::Log::AppendThread::DoWork() at ??:0
> @ 0x7f8442527c81 kudu::internal::RunnableAdapter<>::Run() at ??:0
> @ 0x7f8442526773 kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
> @ 0x7f844252475a kudu::internal::Invoker<>::Run() at ??:0
> @ 0x7f844288b654 kudu::Callback<>::Run() at ??:0
> @ 0x7f843c4e9a10 kudu::ClosureRunnable::Run() at ??:0
> @ 0x7f843c4e8a73 kudu::ThreadPool::DispatchThread() at ??:0
> {noformat}
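The crash above happens because the callback still refers to a {{Synchronizer}} whose stack frame is gone. One general way to make an early return harmless, different from the fix this issue proposes, is to let the callback share ownership of the completion state so that the state outlives whichever side finishes last. This is a minimal sketch assuming {{std::shared_ptr}} ownership; {{CompletionState}}, {{SharedSynchronizer}} and {{LateCallbackRunsSafely()}} are hypothetical names, and nothing here claims this is how Kudu resolved the issue.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Completion state shared by the waiter and the callback; it is freed only
// when the last owner releases it.
struct CompletionState {
  std::mutex mu;
  std::condition_variable cv;
  bool done = false;
  bool ok = false;
};

// Hypothetical synchronizer variant: the callback co-owns the state, so the
// waiting side may return early without leaving a dangling reference behind.
class SharedSynchronizer {
 public:
  SharedSynchronizer() : state_(std::make_shared<CompletionState>()) {}

  // The returned callback holds its own shared_ptr to the state.
  std::function<void(bool)> AsCallback() const {
    std::shared_ptr<CompletionState> state = state_;
    return [state](bool ok) {
      std::lock_guard<std::mutex> lock(state->mu);
      assert(!state->done && "callback must run at most once");
      state->ok = ok;
      state->done = true;
      state->cv.notify_all();
    };
  }

  // Returns true if the operation finished within `timeout`.
  bool WaitFor(std::chrono::milliseconds timeout) const {
    std::unique_lock<std::mutex> lock(state_->mu);
    return state_->cv.wait_for(lock, timeout, [this] { return state_->done; });
  }

 private:
  std::shared_ptr<CompletionState> state_;
};

// Reproduces the shape of the bug: the waiter gives up after one timed wait
// (as if SnoozeFailureDetector() had failed) while the append is still running.
// Returns true once the late callback has run against still-valid state.
bool LateCallbackRunsSafely() {
  std::atomic<bool> callback_ran{false};
  std::thread appender;
  {
    SharedSynchronizer sync;
    std::function<void(bool)> cb = sync.AsCallback();
    appender = std::thread([cb, &callback_ran] {
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      cb(true);  // `sync` is normally gone by now, but the shared state is not
      callback_ran = true;
    });
    sync.WaitFor(std::chrono::milliseconds(5));  // one short wait, then bail out
  }  // `sync` and `cb` are destroyed here, mirroring the early return
  appender.join();
  return callback_ran;
}
```

The cost is one heap allocation and reference counting per synchronizer, in exchange for no longer depending on the waiter's control flow to keep the state alive.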
> A simple fix would be to treat failures in {{SnoozeFailureDetector()}} as
> non-fatal and stay in the do-while loop until the log append finishes.
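A minimal sketch of that fix, assuming a {{std::future}} stands in for {{log_synchronizer}} and a {{bool}}-returning callable stands in for {{SnoozeFailureDetector()}}. {{WaitSnoozingNonFatally()}} and {{DemoAlwaysFailingSnooze()}} are hypothetical names, and this is not the code from the linked commit. Note that a {{std::future}}'s shared state would by itself avoid a dangling reference; the sketch only illustrates the loop's control flow.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <iostream>
#include <thread>

// Waits for `pending` in slices of `interval`, calling `snooze` after each
// timeout. A failed snooze (`snooze` returning false) is logged and counted,
// but control does not leave the loop until the operation has completed.
template <typename T>
T WaitSnoozingNonFatally(std::future<T>& pending,
                         std::chrono::milliseconds interval,
                         const std::function<bool()>& snooze,
                         int* snooze_failures) {
  assert(interval.count() > 0);
  while (pending.wait_for(interval) == std::future_status::timeout) {
    if (!snooze()) {
      ++*snooze_failures;
      std::cerr << "failed to snooze failure detector; continuing to wait\n";
    }
  }
  return pending.get();
}

// Usage: a slow "append" paired with a snooze that always fails. The thread
// refers to `appended` on this stack frame, so the waiter must not leave early.
int DemoAlwaysFailingSnooze(int* snooze_failures) {
  std::promise<int> appended;
  std::future<int> pending = appended.get_future();
  std::thread appender([&appended] {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    appended.set_value(42);  // arbitrary result value
  });
  int result = WaitSnoozingNonFatally(
      pending, std::chrono::milliseconds(10), [] { return false; },
      snooze_failures);
  appender.join();
  return result;
}
```

The trade-off is that a persistently failing snooze no longer aborts the wait: the loop keeps waiting for the append, which is exactly what keeps the callback's target alive.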
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)