[ 
https://issues.apache.org/jira/browse/IMPALA-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877363#comment-16877363
 ] 

Joe McDonnell commented on IMPALA-8732:
---------------------------------------

There are a few different ways to fix this.

The pstacks indicate that there is one thread in 
tcmalloc::PageHeap()::AllocLarge() (which is holding a lock) with dozens more 
blocked on the lock. tcmalloc has an O( n ) algorithm for 
PageHeap()::AllocLarge(), which may get worse as memory gets fragmented. More 
recent versions of gperftools have changed this to O(log n) behavior. One 
option is to patch gperftools with this set of patches. I filed IMPALA-8737 to 
track this type of change. IMPALA-6784 tracks upgrading gperftools to a more 
recent release. This is a more substantial change that would require 
performance testing.

A second option is to modify the thrift structures so that TDescriptorTable is 
serialized in the frontend and remains serialized during 
Coordinator::BackendState::Exec() all the way through to the executors. This 
means that Coordinator::BackendState::Exec() would be copying a single block of 
memory rather than a map of nested structs. This would involve significantly 
fewer memory allocations and should avoid the contention in tcmalloc. This Jira 
tracks that change.

> tcmalloc contention in Coordinator::BackendState::Exec()
> --------------------------------------------------------
>
>                 Key: IMPALA-8732
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8732
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.12.0, Impala 3.3.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> A cluster with several dozen nodes running Impala 2.12 observed slow query 
> startup:
> {noformat}
>     Query Timeline
>       Query submitted: 63.19us (63194)
>       Planning finished: 815ms (815243445)
>       Submit for admission: 1.21s (1206292509)
>       Completed admission: 1.21s (1206462581)
>       Ready to start on 83 backends: 1.40s (1395408016) <------
>       All 83 execution backends (84 fragment instances) started: 3.3m 
> (196968242656) <-------
>       Rows available: 3.3m (197012769471)
>       Cancelled: 3.3m (197012953528)
>       Released admission control resources: 3.3m (197029124587)
>       Unregister query: 3.3m (197241887354){noformat}
> This corresponds to the time spent in Coordinator::StartBackendExec() where 
> the coordinator submits the ExecQueryFInstances messages to the 
> ExecEnv::exec_rpc_thread_pool_ and waits for the thread pool to send the 
> messages via Coordinator::BackendState:;Exec(). This time did not show up in 
> the backend startup latency:
> {noformat}
> Backend startup latencies: Count: 83, min / max: 1ms / 619ms, 25th %-ile: 
> 2ms, 50th %-ile: 2ms, 75th %-ile: 3ms, 90th %-ile: 4ms, 95th %-ile: 8ms, 
> 99.9th %-ile: 619ms{noformat}
> This means that the time is spent on the coordinator side, and it is not 
> related to RPCs.
> Examining the logs from the affected Impala coordinator shows a backlog on 
> the ExecEnv::exec_rpc_thread_pool_. Specifically, I processed the logs 
> looking for "starting execution on X backends for query_id=" messages and 
> their corresponding "started execution on X backends for query_id="  messages 
> and tracked the number of outstanding starting queries. There was no burst of 
> query activity; the number of outstanding starting queries accumulated over 
> the course of a minute, reaching 30 at times.
> I wrote a script to track the number of outstanding starting queries in the 
> real time and dump the Impalad's pstack if the problem showed up (defined as 
> number of starting queries > 10).
> This revealed contention on tcmalloc in Coordinator::BackendState::Exec() 
> when copying the TQueryCtx (specifically when copying a map of 
> THdfsPartitions and their corresponding TExpr/TExprNode). The pstacks showed 
> stacks like these:
> {noformat}
> #0  0x000000000211519a in base::internal::SpinLockDelay(int volatile*, int, 
> int) ()
> #1  0x0000000002160b69 in SpinLock::SlowLock() ()
> #2  0x00000000021a4568 in tcmalloc::CentralFreeList::Populate() ()
> #3  0x00000000021a4668 in 
> tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) ()
> #4  0x00000000021a46f4 in tcmalloc::CentralFreeList::RemoveRange(void**, 
> void**, int) ()
> #5  0x00000000021b1e53 in 
> tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
> #6  0x0000000002258978 in tc_newarray ()
> #7  0x00000000009644e7 in impala::TExpr* 
> std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<impala::TExpr
>  const*, std::vector<impala::TExpr, std::allocator<impala::TExpr> > >, 
> impala::TExpr*>(__gnu_cxx::__normal_iterator<impala::TExpr const*, 
> std::vector<impala::TExpr, std::allocator<impala::TExpr> > >, 
> __gnu_cxx::__normal_iterator<impala::TExpr const*, std::vector<impala::TExpr, 
> std::allocator<impala::TExpr> > >, impala::TExpr*) ()
> #8  0x000000000096509a in 
> _ZNSt8_Rb_treeIlSt4pairIKlN6impala14THdfsPartitionEESt10_Select1stIS4_ESt4lessIlESaIS4_EE14_M_create_nodeIJRKS4_EEEPSt13_Rb_tree_nodeIS4_EDpOT_
>  ()
> #9  0x00000000009654e3 in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #10 0x0000000000965513 in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #11 0x0000000000965513 in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #12 0x000000000096556b in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #13 0x000000000096556b in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #14 0x000000000096556b in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #15 0x000000000096556b in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > >::_M_copy(std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> > const*, std::_Rb_tree_node<std::pair<long 
> const, impala::THdfsPartition> >*) ()
> #16 0x0000000000a2f59b in impala::THdfsTable::THdfsTable(impala::THdfsTable 
> const&) ()
> #17 0x0000000000af53c0 in 
> impala::TTableDescriptor::TTableDescriptor(impala::TTableDescriptor const&) ()
> #18 0x0000000000c7ccd0 in std::vector<impala::TTableDescriptor, 
> std::allocator<impala::TTableDescriptor> 
> >::operator=(std::vector<impala::TTableDescriptor, 
> std::allocator<impala::TTableDescriptor> > const&) ()
> #19 0x000000000126eb71 in 
> impala::Coordinator::BackendState::Exec(impala::TQueryCtx const&, 
> impala::DebugOptions const&, boost::unordered::unordered_map<int, 
> impala::Coordinator::FilterState, boost::hash<int>, std::equal_to<int>, 
> std::allocator<std::pair<int const, impala::Coordinator::FilterState> > > 
> const&, impala::CountingBarrier*) (){noformat}
> and
> {noformat}
> #0  0x000000000211519a in base::internal::SpinLockDelay(int volatile*, int, 
> int) ()
> #1  0x0000000002160b69 in SpinLock::SlowLock() ()
> #2  0x00000000021a3fd8 in tcmalloc::CentralFreeList::ReleaseToSpans(void*) ()
> #3  0x00000000021a401b in 
> tcmalloc::CentralFreeList::ReleaseListToSpans(void*) ()
> #4  0x00000000021a428c in tcmalloc::CentralFreeList::InsertRange(void*, 
> void*, int) ()
> #5  0x00000000021b1ff4 in 
> tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*,
>  unsigned long, int) ()
> #6  0x00000000021b20bc in 
> tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned 
> long) ()
> #7  0x0000000002256c10 in tc_free ()
> #8  0x00000000009520b1 in std::vector<impala::TExpr, 
> std::allocator<impala::TExpr> >::~vector() ()
> #9  0x0000000000953823 in impala::THdfsPartition::~THdfsPartition() ()
> #10 0x0000000000954709 in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #11 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #12 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #13 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #14 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #15 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #16 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #17 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #18 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #19 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #20 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #21 0x00000000009546fc in std::_Rb_tree<long, std::pair<long const, 
> impala::THdfsPartition>, std::_Select1st<std::pair<long const, 
> impala::THdfsPartition> >, std::less<long>, std::allocator<std::pair<long 
> const, impala::THdfsPartition> > 
> >::_M_erase(std::_Rb_tree_node<std::pair<long const, impala::THdfsPartition> 
> >*) ()
> #22 0x00000000009547ff in impala::THdfsTable::~THdfsTable() ()
> #23 0x0000000000af4544 in impala::TTableDescriptor::~TTableDescriptor() ()
> #24 0x0000000000bc575f in impala::TDescriptorTable::~TDescriptorTable() ()
> #25 0x0000000000bc59b3 in impala::TQueryCtx::~TQueryCtx() ()
> #26 0x0000000000bc6158 in 
> impala::TExecQueryFInstancesParams::~TExecQueryFInstancesParams() ()
> #27 0x000000000126f68f in 
> impala::Coordinator::BackendState::Exec(impala::TQueryCtx const&, 
> impala::DebugOptions const&, boost::unordered::unordered_map<int, 
> impala::Coordinator::FilterState, boost::hash<int>, std::equal_to<int>, 
> std::allocator<std::pair<int const, impala::Coordinator::FilterState> > > 
> const&, impala::CountingBarrier*) (){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to