[
https://issues.apache.org/jira/browse/KUDU-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sailesh Mukil resolved KUDU-1719.
---------------------------------
Resolution: Fixed
Fix Version/s: 1.1.0
Commit in:
https://github.com/apache/kudu/commit/2426ef30650318bafd72cc49984aa7b5719b58e0
> Heap use-after-free and deadlock on Messenger::Init() failure
> -------------------------------------------------------------
>
> Key: KUDU-1719
> URL: https://issues.apache.org/jira/browse/KUDU-1719
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.0.1
> Reporter: Sailesh Mukil
> Labels: easyfix
> Fix For: 1.1.0
>
>
> {code}
> Status MessengerBuilder::Build(shared_ptr<Messenger> *msgr) {
> ...
> gscoped_ptr<Messenger> new_msgr(new Messenger(*this));
> Status build_status = new_msgr->Init();
> if (!build_status.ok()) {
> new_msgr->AllExternalReferencesDropped();
> return build_status;
> }
> ...
> }
> {code}
> When new_msgr->Init() fails, AllExternalReferencesDropped() will be called
> which does a retain_self_.reset() which is a shared pointer to itself. If
> that was the last shared pointer reference to itself, it will call the
> destructor ~Messenger(). However, when the gscoped pointer 'new_msgr' goes
> out of scope, it too will call the destructor, but by this point, all the
> members have already been freed causing a use-after-free.
> ----
> When Messenger::Init() fails before all the reactors have been Init()'ed, the
> reactors that haven't been Init()'ed do not give up their reference to the
> Messenger object. This is because this reference is only given up in
> ReactorThread::RunThread(), which is not called unless reactor->Init() is
> called:
> {code}
> void ReactorThread::RunThread() {
> ...
> DVLOG(6) << "Calling ReactorThread::RunThread()...";
> loop_.run(0);
> VLOG(1) << name() << " thread exiting.";
> // No longer need the messenger. This causes the messenger to
> // get deleted when all the reactors exit.
> reactor_->messenger_.reset();
> }
> {code}
> So when the gscoped_ptr 'new_msgr' goes out of scope, it will call the
> destructor which calls STLDeleteElements(&reactors_); where the Messenger
> references are finally dropped. But when the last Messenger reference is
> dropped, that too will call the destructor ~Messenger() which gets stuck on
> the lock:
> {code}
> Messenger::~Messenger() {
> std::lock_guard<percpu_rwlock> guard(lock_); // Gets stuck on this lock
> CHECK(closing_) << "Should have already shut down";
> STLDeleteElements(&reactors_); // This will call the ~Messenger()
> destructor again
> }
> {code}
> The easy fix for this is to make the gscoped_ptr 'new_msgr' a raw pointer
> instead, and not explicitly free it since it will be freed by the shared
> pointers pointing to it. (But this may look like buggy code, not sure if
> there's a better way)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)