[
https://issues.apache.org/jira/browse/KUDU-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012919#comment-16012919
]
Todd Lipcon commented on KUDU-1294:
-----------------------------------
[~aserbin] hit a similar issue, which caused a use-after-free due to the same
underlying problem. I looked into it and I think the issue is the following:
When the last transaction finishes, it runs TransactionTracker::Release():
{code}
void TransactionTracker::Release(TransactionDriver* driver) {
  DecrementCounters(*driver);
  State st;
  {
    // Remove the transaction from the map, retaining the state for use
    // below.
    std::lock_guard<simple_spinlock> l(lock_);
    st = FindOrDie(pending_txns_, driver);
    if (PREDICT_FALSE(pending_txns_.erase(driver) != 1)) {
      LOG(FATAL) << "Could not remove pending transaction from map: "
                 << driver->ToStringUnlocked();
    }
  }
  if (mem_tracker_) {
    mem_tracker_->Release(st.memory_footprint);
  }
}
{code}
This removes the transaction from pending_txns_ before it releases its memory
from mem_tracker_, and the release happens after lock_ has been dropped.
However, the TabletReplica::Delete path has a sequence like:
{code}
// TODO: KUDU-183: Keep track of the pending tasks and send an "abort" message.
LOG_SLOW_EXECUTION(WARNING, 1000,
    Substitute("TabletReplica: tablet $0: Waiting for Transactions to complete", tablet_id())) {
  txn_tracker_.WaitForAllToFinish();
}
...
// Only mark the peer as SHUTDOWN when all other components have shut down.
{
  std::lock_guard<simple_spinlock> lock(lock_);
  // Release mem tracker resources.
  consensus_.reset();
  tablet_.reset();
  state_ = SHUTDOWN;
}
{code}
i.e., it uses WaitForAllToFinish() as a sort of barrier to make sure no more
transactions are running. However, WaitForAllToFinish() only waits for
pending_txns_ to be empty.
So, we can hit the following interleaving:
- T1: a transaction removes itself from pending_txns_
- T2: DeleteReplica returns from WaitForAllToFinish(), and then deletes the
TabletReplica, which deletes the TransactionTracker
-- T2 gets "unreleased consumption" because T1 has not yet gone on to call
mem_tracker_->Release()
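The interleaving above can be replayed deterministically in a single thread. The following is only a sketch: ModelTracker and its methods are hypothetical stand-ins, not Kudu classes, and the 2400-byte footprint is borrowed from the log in the issue description purely as an example value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical single-threaded model; names are illustrative, not Kudu's.
struct ModelTracker {
  std::map<int, int64_t> pending;  // stands in for pending_txns_
  int64_t consumption = 0;         // stands in for the MemTracker counter

  void Add(int id, int64_t bytes) {
    pending[id] = bytes;
    consumption += bytes;
  }

  // First half of Release(): erase from the map, keep the footprint.
  int64_t EraseFromMap(int id) {
    int64_t bytes = pending.at(id);
    pending.erase(id);
    return bytes;
  }

  // Second half of Release(): give the memory back.
  void ReleaseMemory(int64_t bytes) { consumption -= bytes; }

  // What the waiter checks: only whether the map is empty.
  bool AllFinished() const { return pending.empty(); }
};

// Replays the T1/T2 steps in order and returns the consumption that T2
// would observe at the point where it destroys the tracker.
int64_t ReplayInterleaving() {
  ModelTracker t;
  t.Add(1, 2400);
  int64_t bytes = t.EraseFromMap(1);            // T1: first half of Release()
  assert(t.AllFinished());                      // T2: the wait returns
  int64_t seen_at_destruction = t.consumption;  // T2: destructor-time check
  t.ReleaseMemory(bytes);                       // T1: second half, too late
  return seen_at_destruction;
}
```

In this model the waiter's condition is already true while the consumption is still non-zero, which is the window the CHECK failure falls into.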
If we disable the memtracker for the test, we get a use-after-free instead,
because T1 evaluates 'if (mem_tracker_)' on a now-destructed TransactionTracker
instance.
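One possible way to close the window is to release the memory and erase the map entry inside the same critical section, so that an empty map implies the consumption has already been returned. The sketch below is a toy under stated assumptions, not the actual Kudu change: ToyTxnTracker and RunToyScenario are hypothetical names, the memory counter is folded under the same std::mutex instead of living in a separate MemTracker, and the wait uses a condition variable for brevity.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical toy tracker; not Kudu's TransactionTracker.
class ToyTxnTracker {
 public:
  void Add(int id, int64_t bytes) {
    std::lock_guard<std::mutex> l(lock_);
    pending_[id] = bytes;
    consumption_ += bytes;
  }

  // Releases the memory and erases the entry under one lock acquisition.
  void Release(int id) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = pending_.find(id);
    if (it == pending_.end()) return;
    consumption_ -= it->second;  // give the memory back first...
    pending_.erase(it);          // ...then make the entry disappear
    if (pending_.empty()) cv_.notify_all();
  }

  // Blocks until the map is empty and returns the consumption observed at
  // that moment (still under the lock).
  int64_t WaitForAllToFinish() {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait(l, [this] { return pending_.empty(); });
    return consumption_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::unordered_map<int, int64_t> pending_;  // guarded by lock_
  int64_t consumption_ = 0;                   // guarded by lock_
};

// Registers num_txns entries, releases them from worker threads, and
// returns the consumption the waiter sees once the map is empty.
int64_t RunToyScenario(int num_txns) {
  ToyTxnTracker tracker;
  for (int i = 0; i < num_txns; ++i) tracker.Add(i, 100);
  std::vector<std::thread> workers;
  for (int i = 0; i < num_txns; ++i) {
    workers.emplace_back([&tracker, i] { tracker.Release(i); });
  }
  int64_t seen = tracker.WaitForAllToFinish();
  for (auto& w : workers) w.join();  // tracker outlives every worker
  return seen;
}
```

Note that this only addresses the accounting order. A thread leaving Release() still touches lock_ while unlocking, so the owner must also make sure the tracker outlives that call; the toy does this by joining the workers before the tracker goes out of scope.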
> CHECK failure on TransactionTracker memtracker with unreleased consumption
> --------------------------------------------------------------------------
>
> Key: KUDU-1294
> URL: https://issues.apache.org/jira/browse/KUDU-1294
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: 0.6.0
> Reporter: Todd Lipcon
>
> {code}
> DeleteTableTest.TestDeleteTableWithConcurrentWrites: mem_tracker.cc:187]
> Check failed: consumption() == 0 Memory tracker
> txn_tracker->tablet-9446603547684183ba2053888b40696f->server->root has
> unreleased consumption 2400
> @ 0x7effe68a03b8 kudu::MemTracker::~MemTracker() at ??:0
> @ 0x7effe68a3acc std::_Sp_counted_ptr<>::_M_dispose() at ??:0
> @ 0x7effe9f74911 std::_Sp_counted_base<>::_M_release() at ??:0
> @ 0x7effe9f748c7 std::__shared_count<>::~__shared_count() at ??:0
> @ 0x7effe9fafdee std::__shared_ptr<>::~__shared_ptr() at ??:0
> @ 0x7effe93b8531 kudu::tablet::TransactionTracker::~TransactionTracker() at ??:0
> @ 0x7effe93a614b kudu::tablet::TabletPeer::~TabletPeer() at ??:0
> @ 0x7effe93a62ea kudu::tablet::TabletPeer::~TabletPeer() at ??:0
> @ 0x7effe9f89e48 kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
> @ 0x7effe9f89e0a kudu::DefaultRefCountedThreadSafeTraits<>::Destruct() at ??:0
> @ 0x7effe9f89dda kudu::RefCountedThreadSafe<>::Release() at ??:0
> @ 0x7effe9f8743b scoped_refptr<>::~scoped_refptr() at ??:0
> @ 0x7effe9fd4f42 kudu::tserver::TSTabletManager::DeleteTablet() at ??:0
> {code}