[
https://issues.apache.org/jira/browse/IMPALA-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dan Hecht resolved IMPALA-4645.
-------------------------------
Resolution: Not A Problem
> Fragments close after query is unregistered
> -------------------------------------------
>
> Key: IMPALA-4645
> URL: https://issues.apache.org/jira/browse/IMPALA-4645
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.8.0
> Reporter: Matthew Jacobs
> Labels: query-lifecycle
> Attachments: impalad-fragment-not-close.log, stacks.out
>
>
> Testing the latest Kudu master exposes an Impala issue where a query appears
> to complete successfully even though a fragment continues to run. This can be
> observed in TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan when
> using the latest version of Kudu in the minicluster
> (IMPALA_TOOLCHAIN_BUILD_ID=308-96a4cc516e and IMPALA_KUDU_VERSION=e018a83
> ). The test completes and the test table is deleted, yet a fragment remains
> running afterwards. This breaks the next test to run (TestFragmentLifecycle),
> which expects no fragments to be 'in flight'. See IMPALA-4642 for that test
> failure.
> *This JIRA is not about the Kudu changes or the test failure (tracked by
> IMPALA-4642), but about the fact that fragment instances are closed only
> after sending the 'done' report, which means the query may be unregistered
> first.*
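> The window can be illustrated outside Impala with a minimal sketch. The
> ordering below is assumed from the stacks and logs in this report; the
> function names (FragmentExec, the "coordinator" in main) are hypothetical
> stand-ins, not Impala's actual code:
> {code}
> // Sketch: a fragment thread sends its final "done" report before running
> // Close(), so the coordinator can unregister the query while the fragment
> // is still cleaning up.
> #include <atomic>
> #include <cassert>
> #include <future>
> #include <thread>
>
> std::atomic<bool> query_registered{true};
> std::promise<void> done_report;    // fragment -> coordinator
> std::promise<void> unregistered;   // coordinator -> fragment (for determinism)
> bool unregistered_before_close = false;
>
> // Hypothetical fragment thread, mirroring the order suggested by
> // FragmentInstanceState::Exec(): report completion first, then Close().
> void FragmentExec() {
>   // ... execute plan ...
>   done_report.set_value();              // send the final "done" report
>   unregistered.get_future().wait();     // coordinator acts in the meantime
>   // Close() effectively starts here; the query may already be gone.
>   unregistered_before_close = !query_registered.load();
> }
>
> int main() {
>   std::thread frag(FragmentExec);
>   // Coordinator side: on the done report, unregister the query immediately.
>   done_report.get_future().wait();
>   query_registered.store(false);        // UnregisterQuery()
>   unregistered.set_value();
>   frag.join();
>   assert(unregistered_before_close);    // Close() outlived the query
>   return 0;
> }
> {code}
> Closing the instance before sending the final report, or having query
> teardown wait for all instances to close, would remove this window.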
> In the repro, after the test completed, no queries were reported running but
> 2 fragments were still in flight. The impalad web UI showed
> "impala-server.num-fragments-in-flight: 2".
> I captured the stacks on the coordinator using gdb and saw 2 fragment
> execution threads, e.g.:
> {code}
> Thread 10 (Thread 0x7f27af541700 (LWP 14882)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x0000000001185b9b in boost::condition_variable::wait (this=0x12fbee58,
> m=...) at
> /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/thread/pthread/condition_variable.hpp:73
> #2 0x0000000001a55a0c in boost::thread::join_noexcept() ()
> #3 0x00000000013316c1 in boost::thread::join (this=0x131214f0) at
> /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:767
> #4 0x0000000001331d2e in impala::Thread::Join (this=0x11a6bd80) at
> /home/mj/dev/Impala/be/src/util/thread.h:104
> #5 0x00000000015eecc4 in impala::ThreadGroup::JoinAll (this=0xc2516a0) at
> /home/mj/dev/Impala/be/src/util/thread.cc:327
> #6 0x000000000178df9c in impala::KuduScanNode::Close (this=0xc251440,
> state=0xcdd8e00) at /home/mj/dev/Impala/be/src/exec/kudu-scan-node.cc:209
> #7 0x00000000019e8e3d in impala::PlanFragmentExecutor::Close
> (this=0xa76c2b0) at
> /home/mj/dev/Impala/be/src/runtime/plan-fragment-executor.cc:502
> #8 0x00000000019e08fd in impala::FragmentInstanceState::Exec
> (this=0xa76c000) at
> /home/mj/dev/Impala/be/src/runtime/fragment-instance-state.cc:69
> #9 0x00000000019ebfd3 in impala::QueryExecMgr::ExecFInstance
> (this=0xad09140, fis=0xa76c000) at
> /home/mj/dev/Impala/be/src/runtime/query-exec-mgr.cc:100
> #10 0x00000000019ef51e in boost::_mfi::mf1<void, impala::QueryExecMgr,
> impala::FragmentInstanceState*>::operator() (this=0xb71c5e0, p=0xad09140,
> a1=0xa76c000) at
> /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #11 0x00000000019ef3c1 in
> boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>,
> boost::_bi::value<impala::FragmentInstanceState*>
> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr,
> impala::FragmentInstanceState*>, boost::_bi::list0> (this=0xb71c5f0, f=...,
> a=...) at
> /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
> ...
> {code}
> There are no coordinator threads as the query has completed. The log for
> query id e4100828b01f01c:5a7c79e200000000 shows this as well.
> {code}
> I1212 11:10:26.125466 2681 impala-server.cc:907] UnregisterQuery():
> query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125478 2681 impala-server.cc:993] Cancel():
> query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125489 2681 coordinator.cc:1362] Cancel()
> query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125504 2681 coordinator.cc:1428] CancelFragmentInstances()
> query_id=e4100828b01f01c:5a7c79e200000000, tried to cancel 0 fragment
> instances
> I1212 11:10:26.126605 2681 admission-controller.cc:538] Released query
> id=e4100828b01f01c:5a7c79e200000000 agg_num_running=1, agg_num_queued=0,
> agg_mem_reserved=32.63 MB, local_host(local_mem_admitted=3.75 GB,
> num_admitted_running=1, num_queued=0, backend_mem_reserved=10.21 MB)
> I1212 11:10:26.126647 2681 query-exec-mgr.cc:138] ReleaseQueryState():
> query_id=e4100828b01f01c:5a7c79e200000000 refcnt=2
> I1212 11:10:26.131104 2681 impala-hs2-server.cc:373] CloseSession():
> request=TCloseSessionReq {...}
> I1212 11:10:26.185884 2681 status.cc:114] Session closed
> {code}
> It is not until almost a minute later that a fragment instance from this
> query completes:
> {code}
> I1212 11:11:03.275807 14882 query-exec-mgr.cc:103] Instance completed.
> instance_id=e4100828b01f01c:5a7c79e200000003
> I1212 11:11:03.275813 14882 query-exec-mgr.cc:138] ReleaseQueryState():
> query_id=e4100828b01f01c:5a7c79e200000000 refcnt=1
> {code}
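> The refcnt values in the log are consistent with each fragment instance
> pinning the shared query state until it finishes closing. A minimal sketch
> of that lifetime, using std::shared_ptr as a stand-in rather than Impala's
> actual QueryState/QueryExecMgr API:
> {code}
> #include <cassert>
> #include <memory>
>
> // Hypothetical stand-in for the per-query state that the server and each
> // running fragment instance hold a reference to.
> struct QueryState { int dummy = 0; };
>
> int main() {
>   auto qs = std::make_shared<QueryState>();  // server's reference
>   auto frag1 = qs;                           // fragment instance reference
>   auto frag2 = qs;                           // second fragment reference
>   assert(qs.use_count() == 3);
>   frag2.reset();                             // one instance completes
>   qs.reset();                                // UnregisterQuery() drops the server ref
>   // The straggling instance still pins the state, matching "refcnt=1"
>   // in the log almost a minute after the query was unregistered.
>   assert(frag1.use_count() == 1);
>   return 0;
> }
> {code}
> Under this model the state itself stays valid until the last instance
> closes, but nothing stops the query from being reported as complete while
> stragglers are still running.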
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)