[ https://issues.apache.org/jira/browse/KUDU-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783880#comment-16783880 ]
Will Berkeley commented on KUDU-2720:
-------------------------------------
Here is a different set of stacks, again illustrating a service queue overflow
caused by contention on the spinlock in the ResultTracker:
{noformat}
Stacks at 0301 13:48:00.422678 (service queue overflowed for
kudu.tserver.TabletServerService):
tids=[3063]
0x379ba0f710 <unknown>
0x1fb951a base::internal::SpinLockDelay()
0x1fb93b7 base::SpinLock::SlowLock()
0x1e12070 kudu::rpc::ResultTracker::IsCurrentDriver()
0xaab426 kudu::tablet::TransactionDriver::Prepare()
0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
0x1fa37ed kudu::ThreadPool::DispatchThread()
0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
tids=[22185,22194,22193,22192,22191,22190,22186,22187,22189]
0x379ba0f710 <unknown>
0x1fb951a base::internal::SpinLockDelay()
0x1fb93b7 base::SpinLock::SlowLock()
0x1e13dec kudu::rpc::ResultTracker::TrackRpc()
0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle()
0x1e2986a kudu::rpc::ServicePool::RunThread()
0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
{noformat}
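
In the stacks above, nine service threads wait in TrackRpc() on the same spinlock that a prepare thread wants in IsCurrentDriver(). One generic way to reduce this kind of contention is lock striping: partition the tracked state into shards, each with its own lock, so that requests from different clients rarely collide. The following is only a sketch of that general technique, not Kudu code and not necessarily one of the KUDU-1622 suggestions. ShardedTracker, kNumShards, and the choice of sharding by client ID are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical sketch of lock striping; not the real kudu::rpc::ResultTracker.
class ShardedTracker {
 public:
  // Assumed shard count; a real value would need tuning.
  static constexpr std::size_t kNumShards = 16;

  // Returns true the first time (client_id, seq_no) is seen, false on a retry.
  bool TrackRpc(const std::string& client_id, std::uint64_t seq_no) {
    Shard& shard = ShardFor(client_id);
    // Only threads handling clients in the same shard contend on this lock.
    std::lock_guard<std::mutex> l(shard.lock);
    return shard.seen[client_id].insert(seq_no).second;
  }

 private:
  struct Shard {
    std::mutex lock;
    // Sequence numbers already seen, per client.
    std::unordered_map<std::string, std::set<std::uint64_t>> seen;
  };

  // All requests from one client map to the same shard.
  Shard& ShardFor(const std::string& client_id) {
    return shards_[std::hash<std::string>()(client_id) % kNumShards];
  }

  std::array<Shard, kNumShards> shards_;
};
```

With many clients, as in the workload described in the issue, requests would spread across the shards. A single very busy client would still serialize on one shard lock.
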
> Improve concurrency of ResultTracker
> ------------------------------------
>
> Key: KUDU-2720
> URL: https://issues.apache.org/jira/browse/KUDU-2720
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.10.0
> Reporter: Will Berkeley
> Priority: Major
>
> Running a workload that's pushing many small batches from many clients, I see
> a lot of contention on the spinlock in the ResultTracker:
> {noformat}
> Stacks at 0228 14:19:29.339088 (service queue overflowed for
> kudu.tserver.TabletServerService):
> tids=[17223]
> 0x379ba0f710 <unknown>
> 0x89ee80 <unknown>
> 0x1fb8f72 base::internal::SpinLockDelay()
> 0x1fb8ea7 base::SpinLock::SlowLock()
> 0x1e138dc kudu::rpc::ResultTracker::TrackRpc()
> 0x1e289e5 kudu::rpc::GeneratedServiceIf::Handle()
> 0x1e2935a kudu::rpc::ServicePool::RunThread()
> 0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
> ...
> tids=[5695,5673]
> 0x379ba0f710 <unknown>
> 0x1fb900a base::internal::SpinLockDelay()
> 0x1fb8ea7 base::SpinLock::SlowLock()
> 0x1e11b60 kudu::rpc::ResultTracker::IsCurrentDriver()
> 0xaaaf16 kudu::tablet::TransactionDriver::Prepare()
> 0xaabbdd kudu::tablet::TransactionDriver::PrepareTask()
> 0x1fa32dd kudu::ThreadPool::DispatchThread()
> 0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>
> tids=[5689,5696,5693,5692,5691,5690,5698,5688,5681,5682,5683,5685,5686,5687,5700,5669,5668,5667,5714,5704,5703,5702,5701,5697,5670,5665,5699,5664,5671,5672,5680]
> 0x379ba0f710 <unknown>
> 0x1fb900a base::internal::SpinLockDelay()
> 0x1fb8ea7 base::SpinLock::SlowLock()
> 0x1e11bcc kudu::rpc::ResultTracker::RecordCompletionAndRespond()
> 0x1e15e6c kudu::rpc::RpcContext::RespondSuccess()
> 0xaad024 kudu::tablet::TransactionDriver::Finalize()
> 0xaad531 kudu::tablet::TransactionDriver::ApplyTask()
> 0x1fa32dd kudu::ThreadPool::DispatchThread()
> 0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
> {noformat}
> The lock in this case is being held by the following thread:
> {noformat}
> tids=[5679]
> 0x379ba0f710 <unknown>
> 0x212f81b google::protobuf::Message::SpaceUsedLong()
> 0x1e11f2f kudu::rpc::ResultTracker::RecordCompletionAndRespond()
> 0x1e15e6c kudu::rpc::RpcContext::RespondSuccess()
> 0xaad024 kudu::tablet::TransactionDriver::Finalize()
> 0xaad531 kudu::tablet::TransactionDriver::ApplyTask()
> 0x1fa32dd kudu::ThreadPool::DispatchThread()
> 0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
> {noformat}
> KUDU-1622 contained some suggestions for improving the ResultTracker. Some
> were implemented, but we should consider implementing the remaining
> suggestions from that issue.
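
The holder stack quoted above shows the lock held while google::protobuf::Message::SpaceUsedLong() runs inside RecordCompletionAndRespond(). One generic way to shorten such a critical section is to do the size computation before acquiring the lock and keep only cheap bookkeeping under it. The following is a minimal sketch of that idea, assuming the size does not depend on state protected by the lock. SimpleTracker, Response, and SpaceUsed() are hypothetical stand-ins, not Kudu or protobuf code.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a tracked RPC response; not a Kudu type.
struct Response {
  std::string payload;
  // Stand-in for a potentially expensive size computation such as
  // google::protobuf::Message::SpaceUsedLong().
  std::size_t SpaceUsed() const { return sizeof(*this) + payload.capacity(); }
};

// Hypothetical simplified tracker; not the real kudu::rpc::ResultTracker.
class SimpleTracker {
 public:
  void RecordCompletion(std::uint64_t seq_no, Response resp) {
    // Compute the size before taking the lock so waiters are not blocked on it.
    const std::size_t bytes = resp.SpaceUsed();

    std::lock_guard<std::mutex> l(lock_);
    // Only a map insert and a counter update happen under the lock.
    auto result = completed_.emplace(seq_no, std::move(resp));
    if (result.second) {
      tracked_bytes_ += bytes;  // Count each sequence number once.
    }
  }

  std::size_t tracked_bytes() const {
    std::lock_guard<std::mutex> l(lock_);
    return tracked_bytes_;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> l(lock_);
    return completed_.size();
  }

 private:
  mutable std::mutex lock_;
  std::unordered_map<std::uint64_t, Response> completed_;
  std::size_t tracked_bytes_ = 0;
};
```

Whether this reordering is safe in the real ResultTracker depends on details not shown in the stacks, for example whether the response can change between the size computation and the insert.
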