[
https://issues.apache.org/jira/browse/IMPALA-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dan Hecht resolved IMPALA-4645.
-------------------------------
Resolution: Not A Problem
> Fragments close after query is unregistered
> -------------------------------------------
>
> Key: IMPALA-4645
> URL: https://issues.apache.org/jira/browse/IMPALA-4645
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.8.0
> Reporter: Matthew Jacobs
> Labels: query-lifecycle
> Attachments: impalad-fragment-not-close.log, stacks.out
>
>
> Testing the latest Kudu master exposes an Impala issue where a query appears
> to complete successfully even though a fragment continues to run. This can be
> observed in TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan when
> using the latest version of Kudu in the minicluster
> (IMPALA_TOOLCHAIN_BUILD_ID=308-96a4cc516e and IMPALA_KUDU_VERSION=e018a83
> ). The test completes and the test table is deleted, yet a fragment remains
> running afterwards. This breaks the next test to run (TestFragmentLifecycle),
> which expects no fragments to be 'in flight'. See IMPALA-4642 for that test
> failure.
> *This JIRA is not about the Kudu changes or the test failure (tracked by
> IMPALA-4642), but about the fact that fragment instances are closed only
> after sending the 'done' report, which means the query may be unregistered
> first.*
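> The window can be illustrated outside Impala with a minimal sketch. The
> ordering below is assumed from the stacks and logs in this report; the
> function names (FragmentExec, the "coordinator" in main) are hypothetical
> stand-ins, not Impala's actual code:
> {code}
> // Sketch: a fragment thread sends its final "done" report before running
> // Close(), so the coordinator can unregister the query while the fragment
> // is still cleaning up.
> #include <atomic>
> #include <cassert>
> #include <future>
> #include <thread>
>
> std::atomic<bool> query_registered{true};
> std::promise<void> done_report;    // fragment -> coordinator
> std::promise<void> unregistered;   // coordinator -> fragment (for determinism)
> bool unregistered_before_close = false;
>
> // Hypothetical fragment thread, mirroring the order suggested by
> // FragmentInstanceState::Exec(): report completion first, then Close().
> void FragmentExec() {
>   // ... execute plan ...
>   done_report.set_value();              // send the final "done" report
>   unregistered.get_future().wait();     // coordinator acts in the meantime
>   // Close() effectively starts here; the query may already be gone.
>   unregistered_before_close = !query_registered.load();
> }
>
> int main() {
>   std::thread frag(FragmentExec);
>   // Coordinator side: on the done report, unregister the query immediately.
>   done_report.get_future().wait();
>   query_registered.store(false);        // UnregisterQuery()
>   unregistered.set_value();
>   frag.join();
>   assert(unregistered_before_close);    // Close() outlived the query
>   return 0;
> }
> {code}
> Closing the instance before sending the final report, or having query
> teardown wait for all instances to close, would remove this window.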
> In the repro, after the test completed, no queries were reported running but
> 2 fragments were still in flight. The impalad web UI showed
> "impala-server.num-fragments-in-flight: 2".
> I captured the stacks on the coordinator using gdb and saw 2 fragment
> execution threads, e.g.:
> {code}
> Thread 10 (Thread 0x7f27af541700 (LWP 14882)):
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1 0x0000000001185b9b in boost::condition_variable::wait (this=0x12fbee58,
> m=...) at
> /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/thread/pthread/condition_variable.hpp:73
> #2 0x0000000001a55a0c in boost::thread::join_noexcept() ()
> #3 0x00000000013316c1 in boost::thread::join (this=0x131214f0) at
> /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:767
> #4 0x0000000001331d2e in impala::Thread::Join (this=0x11a6bd80) at
> /home/mj/dev/Impala/be/src/util/thread.h:104
> #5 0x00000000015eecc4 in impala::ThreadGroup::JoinAll (this=0xc2516a0) at
> /home/mj/dev/Impala/be/src/util/thread.cc:327
> #6 0x000000000178df9c in impala::KuduScanNode::Close (this=0xc251440,
> state=0xcdd8e00) at /home/mj/dev/Impala/be/src/exec/kudu-scan-node.cc:209
> #7 0x00000000019e8e3d in impala::PlanFragmentExecutor::Close
> (this=0xa76c2b0) at
> /home/mj/dev/Impala/be/src/runtime/plan-fragment-executor.cc:502
> #8 0x00000000019e08fd in impala::FragmentInstanceState::Exec
> (this=0xa76c000) at
> /home/mj/dev/Impala/be/src/runtime/fragment-instance-state.cc:69
> #9 0x00000000019ebfd3 in impala::QueryExecMgr::ExecFInstance
> (this=0xad09140, fis=0xa76c000) at
> /home/mj/dev/Impala/be/src/runtime/query-exec-mgr.cc:100
> #10 0x00000000019ef51e in boost::_mfi::mf1<void, impala::QueryExecMgr,
> impala::FragmentInstanceState*>::operator() (this=0xb71c5e0, p=0xad09140,
> a1=0xa76c000) at
> /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #11 0x00000000019ef3c1 in
> boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>,
> boost::_bi::value<impala::FragmentInstanceState*>
> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr,
> impala::FragmentInstanceState*>, boost::_bi::list0> (this=0xb71c5f0, f=...,
> a=...) at
> /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
> ...
> {code}
> There are no coordinator threads as the query has completed. The log for
> query id e4100828b01f01c:5a7c79e200000000 shows this as well.
> {code}
> I1212 11:10:26.125466 2681 impala-server.cc:907] UnregisterQuery():
> query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125478 2681 impala-server.cc:993] Cancel():
> query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125489 2681 coordinator.cc:1362] Cancel()
> query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125504 2681 coordinator.cc:1428] CancelFragmentInstances()
> query_id=e4100828b01f01c:5a7c79e200000000, tried to cancel 0 fragment
> instances
> I1212 11:10:26.126605 2681 admission-controller.cc:538] Released query
> id=e4100828b01f01c:5a7c79e200000000 agg_num_running=1, agg_num_queued=0,
> agg_mem_reserved=32.63 MB, local_host(local_mem_admitted=3.75 GB,
> num_admitted_running=1, num_queued=0, backend_mem_reserved=10.21 MB)
> I1212 11:10:26.126647 2681 query-exec-mgr.cc:138] ReleaseQueryState():
> query_id=e4100828b01f01c:5a7c79e200000000 refcnt=2
> I1212 11:10:26.131104 2681 impala-hs2-server.cc:373] CloseSession():
> request=TCloseSessionReq {...}
> I1212 11:10:26.185884 2681 status.cc:114] Session closed
> {code}
> It is not until almost a minute later that a fragment instance from this
> query completes:
> {code}
> I1212 11:11:03.275807 14882 query-exec-mgr.cc:103] Instance completed.
> instance_id=e4100828b01f01c:5a7c79e200000003
> I1212 11:11:03.275813 14882 query-exec-mgr.cc:138] ReleaseQueryState():
> query_id=e4100828b01f01c:5a7c79e200000000 refcnt=1
> {code}
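> The refcnt values in the log are consistent with each fragment instance
> pinning the shared query state until it finishes closing. A minimal sketch
> of that lifetime, using std::shared_ptr as a stand-in rather than Impala's
> actual QueryState/QueryExecMgr API:
> {code}
> #include <cassert>
> #include <memory>
>
> // Hypothetical stand-in for the per-query state that the server and each
> // running fragment instance hold a reference to.
> struct QueryState { int dummy = 0; };
>
> int main() {
>   auto qs = std::make_shared<QueryState>();  // server's reference
>   auto frag1 = qs;                           // fragment instance reference
>   auto frag2 = qs;                           // second fragment reference
>   assert(qs.use_count() == 3);
>   frag2.reset();                             // one instance completes
>   qs.reset();                                // UnregisterQuery() drops the server ref
>   // The straggling instance still pins the state, matching "refcnt=1"
>   // in the log almost a minute after the query was unregistered.
>   assert(frag1.use_count() == 1);
>   return 0;
> }
> {code}
> Under this model the state itself stays valid until the last instance
> closes, but nothing stops the query from being reported as complete while
> stragglers are still running.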
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)