[ https://issues.apache.org/jira/browse/IMPALA-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dan Hecht resolved IMPALA-5202.
-------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0

commit 94ffcc64a997022ed582c2b428aac1f06dd2da77
Author: Dan Hecht <dhe...@cloudera.com>
Date:   Wed Jun 20 14:49:57 2018 -0700

    IMPALA-5202: Disallow PREPARE:WAIT debug action

    In order to simplify FIS startup, we don't allow cancellation until
    all FIS have finished Prepare(), so we shouldn't allow PREPARE:WAIT
    since there will be no way to cancel out of the loop. Make this
    explicit.

    Change-Id: I1caa4f8e6ce7f32a8a3722648e08e24f34dba35d
    Reviewed-on: http://gerrit.cloudera.org:8080/10776
    Reviewed-by: Dan Hecht <dhe...@cloudera.com>
    Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> Debug action WAIT in PREPARE leads to hung query that cannot be cancelled.
> --------------------------------------------------------------------------
>
>                  Key: IMPALA-5202
>                  URL: https://issues.apache.org/jira/browse/IMPALA-5202
>              Project: IMPALA
>           Issue Type: Bug
>           Components: Backend, Infrastructure
>     Affects Versions: Impala 2.8.0
>             Reporter: Alexander Behm
>             Assignee: Dan Hecht
>             Priority: Trivial
>              Fix For: Impala 3.1.0
>
>          Attachments: stacks.txt.gz
>
>
> I believe recent changes to coordination and distributed execution have
> broken the WAIT debug action when called in some phases, e.g. PREPARE.
> The following repro leads to a hung query that cannot be cancelled. Impala's
> WebUI hangs, so cannot cancel from there either.
> {code}
> set debug_action="0:PREPARE:WAIT";
> select 1 from functional.alltypes;
> {code}
> I tried WAIT in PREPARE with other simple queries, targeting other exec nodes
> (e.g., top-n) with the same result.
> I am not sure why our test_failpoints.py or test_cancellation.py did not
> catch this.
> Attached:
> I ran an experiment with a single impalad. I ran the above sequence and then
> issued a ctrl+c from the impala shell to cancel the query. At that point, I
> collected the stacks of all threads.
> Interesting stacks:
> {code}
> Thread 3 (Thread 0x7fda1238c700 (LWP 8872)):
> #0  0x00007fda9b97183d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fda9b9716dc in sleep () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00000000016ab891 in impala::ExecNode::ExecDebugAction (this=0xc965800, phase=impala::TExecNodePhase::PREPARE, state=0xc965100)
>     at /home/abehm/impala/be/src/exec/exec-node.cc:430
> #3  0x00000000016a8c4c in impala::ExecNode::Prepare (this=0xc965800, state=0xc965100)
>     at /home/abehm/impala/be/src/exec/exec-node.cc:148
> #4  0x00000000017ee510 in impala::ScanNode::Prepare (this=0xc965800, state=0xc965100)
>     at /home/abehm/impala/be/src/exec/scan-node.cc:51
> #5  0x00000000016dc14b in impala::HdfsScanNodeBase::Prepare (this=0xc965800, state=0xc965100)
>     at /home/abehm/impala/be/src/exec/hdfs-scan-node-base.cc:175
> #6  0x00000000016d3516 in impala::HdfsScanNode::Prepare (this=0xc965800, state=0xc965100)
>     at /home/abehm/impala/be/src/exec/hdfs-scan-node.cc:167
> #7  0x0000000001a716d1 in impala::PlanFragmentExecutor::PrepareInternal (this=0xc9645d0, qs=0x9382800, tdesc_tbl=..., fragment_ctx=..., instance_ctx=...)
>     at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:215
> #8  0x0000000001a6fd69 in impala::PlanFragmentExecutor::Prepare (this=0xc9645d0, query_state=0x9382800, desc_tbl=..., fragment_ctx=..., instance_ctx=...)
>     at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:99
> #9  0x0000000001a6cce5 in impala::FragmentInstanceState::Exec (this=0xc964300)
>     at /home/abehm/impala/be/src/runtime/fragment-instance-state.cc:64
> #10 0x0000000001a783d1 in impala::QueryExecMgr::ExecFInstance (this=0xb870ba0, fis=0xc964300)
>     at /home/abehm/impala/be/src/runtime/query-exec-mgr.cc:110
> #11 0x0000000001a7b1fa in boost::_mfi::mf1<void, impala::QueryExecMgr, impala::FragmentInstanceState*>::operator() (this=0xac8ce60, p=0xb870ba0, a1=0xc964300)
>     at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:165
> #12 0x0000000001a7b083 in boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, boost::_bi::value<impala::FragmentInstanceState*> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr, impala::FragmentInstanceState*>, boost::_bi::list0> (this=0xac8ce70, f=..., a=...)
>     at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:313
> {code}
> {code}
> Thread 2 (Thread 0x7fda10b89700 (LWP 8874)):
> #0  0x00007fda9bc7cd84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000011c1f6d in boost::condition_variable::wait (this=0xc962be0, m=...)
>     at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/thread/pthread/condition_variable.hpp:73
> #2  0x000000000133caf7 in impala::Promise<impala::Status>::Get (this=0xc962be0)
>     at /home/abehm/impala/be/src/util/promise.h:67
> #3  0x0000000001a6ff70 in impala::PlanFragmentExecutor::WaitForOpen (this=0xc9629d0)
>     at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:108
> #4  0x0000000001a38e2f in impala::Coordinator::Wait (this=0xbe72d00)
>     at /home/abehm/impala/be/src/runtime/coordinator.cc:1063
> #5  0x000000000152be3c in impala::ImpalaServer::QueryExecState::WaitInternal (this=0x972ac00)
>     at /home/abehm/impala/be/src/service/query-exec-state.cc:666
> #6  0x000000000152b960 in impala::ImpalaServer::QueryExecState::Wait (this=0x972ac00)
>     at /home/abehm/impala/be/src/service/query-exec-state.cc:634
> #7  0x0000000001547643 in boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>::operator() (this=0x7fda10b88d78, p=0x972ac00)
>     at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:49
> #8  0x0000000001547260 in boost::_bi::list1<boost::_bi::value<impala::ImpalaServer::QueryExecState*> >::operator()<boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>, boost::_bi::list0> (this=0x7fda10b88d88, f=..., a=...)
>     at /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:253
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
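
For readers following the fix: the commit message says PREPARE:WAIT is now disallowed explicitly, because nothing can cancel a fragment instance that is still inside Prepare(). Below is a minimal Python sketch of what such an up-front check could look like. It is not Impala's parser. The names `parse_debug_action`, `InvalidDebugAction` and `DISALLOWED` are hypothetical, and the three-field `node_id:PHASE:ACTION` layout is only inferred from the repro string "0:PREPARE:WAIT"; the real option may accept other forms.

```python
"""Toy validator for debug_action strings. Illustrative only, not Impala code."""

# Hypothetical table of (phase, action) pairs to reject before execution.
# PREPARE:WAIT is the only pair named in the commit message.
DISALLOWED = {("PREPARE", "WAIT")}


class InvalidDebugAction(ValueError):
    """Raised when a debug_action string is malformed or disallowed."""


def parse_debug_action(spec):
    """Split 'node_id:PHASE:ACTION' and reject disallowed combinations.

    Assumption: exactly three colon-separated fields, as in the repro
    string "0:PREPARE:WAIT". Phase and action names are not checked
    against any real enum here.
    """
    parts = spec.strip().split(":")
    if len(parts) != 3:
        raise InvalidDebugAction(
            "expected node_id:PHASE:ACTION, got %r" % spec)

    node_id_str, phase, action = parts
    try:
        node_id = int(node_id_str)
    except ValueError:
        raise InvalidDebugAction("node id is not an integer: %r" % node_id_str)

    phase = phase.upper()
    action = action.upper()
    if (phase, action) in DISALLOWED:
        # Fail fast instead of starting a query that can never be cancelled.
        raise InvalidDebugAction(
            "%s:%s is not allowed: the query could not be cancelled "
            "while waiting in this phase" % (phase, action))
    return node_id, phase, action


if __name__ == "__main__":
    print(parse_debug_action("0:OPEN:WAIT"))
    try:
        parse_debug_action("0:PREPARE:WAIT")
    except InvalidDebugAction as e:
        print("rejected:", e)
```

Rejecting the string when the option is parsed, rather than inside the wait loop, means the user sees an error immediately instead of a hung query and a hung WebUI.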