[
https://issues.apache.org/jira/browse/IMPALA-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dan Hecht resolved IMPALA-5202.
-------------------------------
Resolution: Fixed
Fix Version/s: Impala 3.1.0
commit 94ffcc64a997022ed582c2b428aac1f06dd2da77
Author: Dan Hecht <[email protected]>
Date: Wed Jun 20 14:49:57 2018 -0700
IMPALA-5202: Disallow PREPARE:WAIT debug action
In order to simplify FragmentInstanceState (FIS) startup, we don't allow
cancellation until all FIS have finished Prepare(), so we shouldn't allow
PREPARE:WAIT since there will be no way to cancel out of the wait loop.
Make this explicit.
Change-Id: I1caa4f8e6ce7f32a8a3722648e08e24f34dba35d
Reviewed-on: http://gerrit.cloudera.org:8080/10776
Reviewed-by: Dan Hecht <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
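
The commit text above describes the rule but not its shape in code. As an
illustration only, the following is a minimal, self-contained sketch of
rejecting the PREPARE:WAIT combination when a debug action string is parsed.
It is not the code from the commit: the names DebugAction, ParseDebugAction
and CheckExamples, the error messages, and the simplified
"node_id:phase:action" format (taken from the repro "0:PREPARE:WAIT") are all
assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed form of a "node_id:phase:action" debug action string.
struct DebugAction {
  int node_id = 0;
  std::string phase;
  std::string action;
};

// Parses 'spec' into '*out'. Returns false and sets '*error' if the string is
// malformed or names the disallowed PREPARE:WAIT combination. '*out' is only
// written on success.
bool ParseDebugAction(const std::string& spec, DebugAction* out,
                      std::string* error) {
  // Split on ':' into exactly three components.
  std::vector<std::string> parts;
  std::stringstream ss(spec);
  std::string part;
  while (std::getline(ss, part, ':')) parts.push_back(part);
  if (parts.size() != 3) {
    *error = "expected node_id:phase:action";
    return false;
  }

  // The node id must be a complete integer, with no trailing characters.
  std::size_t consumed = 0;
  int node_id = 0;
  try {
    node_id = std::stoi(parts[0], &consumed);
  } catch (const std::logic_error&) {  // invalid_argument or out_of_range
    *error = "invalid node id";
    return false;
  }
  if (consumed != parts[0].size()) {
    *error = "invalid node id";
    return false;
  }

  // Assumption mirroring the commit message: a WAIT in PREPARE can never be
  // cancelled, so reject it up front instead of letting the query hang.
  if (parts[1] == "PREPARE" && parts[2] == "WAIT") {
    *error = "PREPARE:WAIT is not allowed";
    return false;
  }

  out->node_id = node_id;
  out->phase = parts[1];
  out->action = parts[2];
  return true;
}

// Usage example: the repro string from the issue is rejected, while WAIT in
// another phase is still accepted.
void CheckExamples() {
  DebugAction action;
  std::string error;
  assert(!ParseDebugAction("0:PREPARE:WAIT", &action, &error));
  assert(ParseDebugAction("0:OPEN:WAIT", &action, &error));
}
```

Rejecting the combination at parse time means the user gets an immediate
error instead of a query that hangs and cannot be cancelled.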
> Debug action WAIT in PREPARE leads to hung query that cannot be cancelled.
> --------------------------------------------------------------------------
>
> Key: IMPALA-5202
> URL: https://issues.apache.org/jira/browse/IMPALA-5202
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Infrastructure
> Affects Versions: Impala 2.8.0
> Reporter: Alexander Behm
> Assignee: Dan Hecht
> Priority: Trivial
> Fix For: Impala 3.1.0
>
> Attachments: stacks.txt.gz
>
>
> I believe recent changes to coordination and distributed execution have
> broken the WAIT debug action when called in some phases, e.g. PREPARE.
> The following repro leads to a hung query that cannot be cancelled. Impala's
> WebUI also hangs, so the query cannot be cancelled from there either.
> {code}
> set debug_action="0:PREPARE:WAIT";
> select 1 from functional.alltypes;
> {code}
> I tried WAIT in PREPARE with other simple queries, targeting other exec nodes
> (e.g., top-n) with the same result.
> I am not sure why our test_failpoints.py or test_cancellation.py did not
> catch this.
> Attached:
> I ran an experiment with a single impalad. I ran the above sequence and then
> issued a ctrl+c from the impala shell to cancel the query. At that point, I
> collected the stacks of all threads.
> Interesting stacks:
> {code}
> Thread 3 (Thread 0x7fda1238c700 (LWP 8872)):
> #0 0x00007fda9b97183d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007fda9b9716dc in sleep () from /lib/x86_64-linux-gnu/libc.so.6
> #2 0x00000000016ab891 in impala::ExecNode::ExecDebugAction (this=0xc965800,
> phase=impala::TExecNodePhase::PREPARE, state=0xc965100) at
> /home/abehm/impala/be/src/exec/exec-node.cc:430
> #3 0x00000000016a8c4c in impala::ExecNode::Prepare (this=0xc965800,
> state=0xc965100) at /home/abehm/impala/be/src/exec/exec-node.cc:148
> #4 0x00000000017ee510 in impala::ScanNode::Prepare (this=0xc965800,
> state=0xc965100) at /home/abehm/impala/be/src/exec/scan-node.cc:51
> #5 0x00000000016dc14b in impala::HdfsScanNodeBase::Prepare (this=0xc965800,
> state=0xc965100) at /home/abehm/impala/be/src/exec/hdfs-scan-node-base.cc:175
> #6 0x00000000016d3516 in impala::HdfsScanNode::Prepare (this=0xc965800,
> state=0xc965100) at /home/abehm/impala/be/src/exec/hdfs-scan-node.cc:167
> #7 0x0000000001a716d1 in impala::PlanFragmentExecutor::PrepareInternal
> (this=0xc9645d0, qs=0x9382800, tdesc_tbl=..., fragment_ctx=...,
> instance_ctx=...) at
> /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:215
> #8 0x0000000001a6fd69 in impala::PlanFragmentExecutor::Prepare
> (this=0xc9645d0, query_state=0x9382800, desc_tbl=..., fragment_ctx=...,
> instance_ctx=...) at
> /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:99
> #9 0x0000000001a6cce5 in impala::FragmentInstanceState::Exec
> (this=0xc964300) at
> /home/abehm/impala/be/src/runtime/fragment-instance-state.cc:64
> #10 0x0000000001a783d1 in impala::QueryExecMgr::ExecFInstance
> (this=0xb870ba0, fis=0xc964300) at
> /home/abehm/impala/be/src/runtime/query-exec-mgr.cc:110
> #11 0x0000000001a7b1fa in boost::_mfi::mf1<void, impala::QueryExecMgr,
> impala::FragmentInstanceState*>::operator() (this=0xac8ce60, p=0xb870ba0,
> a1=0xc964300) at
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:165
> #12 0x0000000001a7b083 in
> boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>,
> boost::_bi::value<impala::FragmentInstanceState*>
> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr,
> impala::FragmentInstanceState*>, boost::_bi::list0> (this=0xac8ce70, f=...,
> a=...) at
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:313
> {code}
> {code}
> Thread 2 (Thread 0x7fda10b89700 (LWP 8874)):
> #0 0x00007fda9bc7cd84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00000000011c1f6d in boost::condition_variable::wait (this=0xc962be0,
> m=...) at
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/thread/pthread/condition_variable.hpp:73
> #2 0x000000000133caf7 in impala::Promise<impala::Status>::Get
> (this=0xc962be0) at /home/abehm/impala/be/src/util/promise.h:67
> #3 0x0000000001a6ff70 in impala::PlanFragmentExecutor::WaitForOpen
> (this=0xc9629d0) at
> /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:108
> #4 0x0000000001a38e2f in impala::Coordinator::Wait (this=0xbe72d00) at
> /home/abehm/impala/be/src/runtime/coordinator.cc:1063
> #5 0x000000000152be3c in impala::ImpalaServer::QueryExecState::WaitInternal
> (this=0x972ac00) at /home/abehm/impala/be/src/service/query-exec-state.cc:666
> #6 0x000000000152b960 in impala::ImpalaServer::QueryExecState::Wait
> (this=0x972ac00) at /home/abehm/impala/be/src/service/query-exec-state.cc:634
> #7 0x0000000001547643 in boost::_mfi::mf0<void,
> impala::ImpalaServer::QueryExecState>::operator() (this=0x7fda10b88d78,
> p=0x972ac00) at
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:49
> #8 0x0000000001547260 in
> boost::_bi::list1<boost::_bi::value<impala::ImpalaServer::QueryExecState*>
> >::operator()<boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>,
> boost::_bi::list0> (this=0x7fda10b88d88, f=..., a=...) at
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:253
> {code}