Will Berkeley created KUDU-2708:
-----------------------------------
Summary: Possible contention creating temporary files while
flushing cmeta during an election storm
Key: KUDU-2708
URL: https://issues.apache.org/jira/browse/KUDU-2708
Project: Kudu
Issue Type: Improvement
Reporter: Will Berkeley
While investigating consensus queue overflows that happen under heavy
write load, I noticed that 6 of the 10 service threads at the time of overflow
had stacks like
{noformat}
0x3b6720f710 <unknown>
0x1fb900a base::internal::SpinLockDelay()
0x1fb8ea7 base::SpinLock::SlowLock()
0xb82e25 kudu::consensus::RaftConsensus::RequestVote()
0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
0x1e2935a kudu::rpc::ServicePool::RunThread()
0x1f9bd91 kudu::Thread::SuperviseThread()
0x3b672079d1 start_thread
0x3b66ee88fd clone
{noformat}
These threads are waiting on the {{lock_}} of some tablet's Raft consensus
instance in order to vote. Looking into what might be holding that lock, I see
stacks like
{noformat}
0x3b6720f710 <unknown>
0x3b66edb2ed __GI_open64
0x3b66e63caa __gen_tempname
0x1f1cf35 kudu::(anonymous namespace)::PosixEnv::MkTmpFile()
0x1f1f662 kudu::(anonymous namespace)::PosixEnv::NewTempRWFile()
0x1f8305e kudu::pb_util::WritePBContainerToPath()
0xb47932 kudu::consensus::ConsensusMetadata::Flush()
0xb74164 kudu::consensus::RaftConsensus::SetVotedForCurrentTermUnlocked()
0xb783aa kudu::consensus::RaftConsensus::RequestVoteRespondVoteGranted()
0xb836a1 kudu::consensus::RaftConsensus::RequestVote()
0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
0x1e2935a kudu::rpc::ServicePool::RunThread()
0x1f9bd91 kudu::Thread::SuperviseThread()
0x3b672079d1 start_thread
0x3b66ee88fd clone
{noformat}
Doing some junior spelunking into glibc code, one hypothesis is that we are
generating lots of collisions of proposed temporary file names in the cmeta
folder because many threads are attempting to flush cmeta at once. The glibc
code looks like
Maybe we could put the thread id into the temporary file name when a thread
does a cmeta flush.
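
A minimal sketch of what the thread-id idea could look like, not Kudu's actual code. The helper names {{TmpTemplateForThisThread}} and {{CreateTmpFileForThisThread}}, the {{.kudutmp.}} infix, and hashing the thread id with {{std::hash}} are all assumptions made for illustration. The per-thread token only makes it very unlikely that two threads propose the same name; {{mkstemp()}} still provides the actual uniqueness guarantee.

```cpp
#include <stdlib.h>   // mkstemp
#include <unistd.h>   // close, unlink

#include <functional>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not a Kudu API): builds an mkstemp() template that
// embeds a token derived from the calling thread's id, so that concurrent
// flushers start from different name prefixes.
std::string TmpTemplateForThisThread(const std::string& base_path) {
  // std::thread::id is opaque, so hash it to get a printable number.
  // Assumption: a hash collision between live threads is negligible.
  const size_t tid_token =
      std::hash<std::thread::id>()(std::this_thread::get_id());
  std::ostringstream tmpl;
  tmpl << base_path << ".kudutmp." << tid_token << ".XXXXXX";
  return tmpl.str();
}

// Hypothetical helper: creates the temporary file and returns its open file
// descriptor, or -1 on failure (errno is set by mkstemp()). On success the
// final file name is stored in *created_path.
int CreateTmpFileForThisThread(const std::string& base_path,
                               std::string* created_path) {
  const std::string tmpl = TmpTemplateForThisThread(base_path);
  // mkstemp() rewrites the trailing XXXXXX in place, so it needs a mutable,
  // NUL-terminated buffer.
  std::vector<char> buf(tmpl.begin(), tmpl.end());
  buf.push_back('\0');
  const int fd = mkstemp(buf.data());
  if (fd >= 0) {
    *created_path = buf.data();
  }
  return fd;
}
```

A caller would write the new cmeta contents to the returned descriptor and then rename the file over the real path, as the existing flush path already does with its temporary file.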