[ 
https://issues.apache.org/jira/browse/IMPALA-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Hecht resolved IMPALA-5202.
-------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0

commit 94ffcc64a997022ed582c2b428aac1f06dd2da77
Author: Dan Hecht <dhe...@cloudera.com>
Date:   Wed Jun 20 14:49:57 2018 -0700

    IMPALA-5202: Disallow PREPARE:WAIT debug action

    In order to simplify FIS startup, we don't allow cancellation until all
    FIS have finished Prepare(), so we shouldn't allow PREPARE:WAIT since
    there will be no way to cancel out of the loop.  Make this explicit.

    Change-Id: I1caa4f8e6ce7f32a8a3722648e08e24f34dba35d
    Reviewed-on: http://gerrit.cloudera.org:8080/10776
    Reviewed-by: Dan Hecht <dhe...@cloudera.com>
    Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
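
As a rough illustration of what "make this explicit" could look like, here is a minimal, self-contained sketch that validates a debug action string of the form used in the repro ("0:PREPARE:WAIT") and rejects the PREPARE:WAIT combination up front. It is not the code from the commit: the DebugAction struct, the ParseDebugAction function, and the error messages are hypothetical names chosen for this example, and phases and actions are kept as plain strings for simplicity.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified model of a "<node id>:<phase>:<action>" debug action.
struct DebugAction {
  int node_id = -1;
  std::string phase;
  std::string action;
};

// Parses 'spec' into 'out'. Returns false and sets 'error' if the spec is
// malformed or names the disallowed PREPARE:WAIT combination.
bool ParseDebugAction(const std::string& spec, DebugAction* out, std::string* error) {
  assert(out != nullptr && error != nullptr);
  error->clear();

  // Split the spec on ':' into its components.
  std::vector<std::string> parts;
  std::stringstream ss(spec);
  std::string part;
  while (std::getline(ss, part, ':')) parts.push_back(part);
  if (parts.size() != 3) {
    *error = "expected <node id>:<phase>:<action>";
    return false;
  }

  int node_id = -1;
  try {
    node_id = std::stoi(parts[0]);
  } catch (const std::exception&) {
    *error = "node id is not an integer";
    return false;
  }

  // Assumption taken from the commit message: cancellation is not possible
  // until Prepare() has finished, so a WAIT in PREPARE could never be left.
  if (parts[1] == "PREPARE" && parts[2] == "WAIT") {
    *error = "PREPARE:WAIT is not allowed: it cannot be cancelled";
    return false;
  }

  out->node_id = node_id;
  out->phase = parts[1];
  out->action = parts[2];
  return true;
}
```

Rejecting the combination when the option is parsed means the query fails with a clear error instead of hanging in Prepare().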

> Debug action WAIT in PREPARE leads to hung query that cannot be cancelled.
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-5202
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5202
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Infrastructure
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>            Assignee: Dan Hecht
>            Priority: Trivial
>             Fix For: Impala 3.1.0
>
>         Attachments: stacks.txt.gz
>
>
> I believe recent changes to coordination and distributed execution have 
> broken the WAIT debug action when called in some phases, e.g. PREPARE.
> The following repro leads to a hung query that cannot be cancelled. Impala's 
> WebUI also hangs, so the query cannot be cancelled from there either.
> {code}
> set debug_action="0:PREPARE:WAIT";
> select 1 from functional.alltypes;
> {code}
> I tried WAIT in PREPARE with other simple queries, targeting other exec nodes 
> (e.g., top-n) with the same result.
> I am not sure why our test_failpoints.py or test_cancellation.py did not 
> catch this.
> Attached:
> I ran an experiment with a single impalad. I ran the above sequence and then 
> issued a ctrl+c from the impala shell to cancel the query. At that point, I 
> collected the stacks of all threads.
> Interesting stacks:
> {code}
> Thread 3 (Thread 0x7fda1238c700 (LWP 8872)):
> #0  0x00007fda9b97183d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fda9b9716dc in sleep () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00000000016ab891 in impala::ExecNode::ExecDebugAction (this=0xc965800, 
> phase=impala::TExecNodePhase::PREPARE, state=0xc965100) at 
> /home/abehm/impala/be/src/exec/exec-node.cc:430
> #3  0x00000000016a8c4c in impala::ExecNode::Prepare (this=0xc965800, 
> state=0xc965100) at /home/abehm/impala/be/src/exec/exec-node.cc:148
> #4  0x00000000017ee510 in impala::ScanNode::Prepare (this=0xc965800, 
> state=0xc965100) at /home/abehm/impala/be/src/exec/scan-node.cc:51
> #5  0x00000000016dc14b in impala::HdfsScanNodeBase::Prepare (this=0xc965800, 
> state=0xc965100) at /home/abehm/impala/be/src/exec/hdfs-scan-node-base.cc:175
> #6  0x00000000016d3516 in impala::HdfsScanNode::Prepare (this=0xc965800, 
> state=0xc965100) at /home/abehm/impala/be/src/exec/hdfs-scan-node.cc:167
> #7  0x0000000001a716d1 in impala::PlanFragmentExecutor::PrepareInternal 
> (this=0xc9645d0, qs=0x9382800, tdesc_tbl=..., fragment_ctx=..., 
> instance_ctx=...) at 
> /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:215
> #8  0x0000000001a6fd69 in impala::PlanFragmentExecutor::Prepare 
> (this=0xc9645d0, query_state=0x9382800, desc_tbl=..., fragment_ctx=..., 
> instance_ctx=...) at 
> /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:99
> #9  0x0000000001a6cce5 in impala::FragmentInstanceState::Exec 
> (this=0xc964300) at 
> /home/abehm/impala/be/src/runtime/fragment-instance-state.cc:64
> #10 0x0000000001a783d1 in impala::QueryExecMgr::ExecFInstance 
> (this=0xb870ba0, fis=0xc964300) at 
> /home/abehm/impala/be/src/runtime/query-exec-mgr.cc:110
> #11 0x0000000001a7b1fa in boost::_mfi::mf1<void, impala::QueryExecMgr, 
> impala::FragmentInstanceState*>::operator() (this=0xac8ce60, p=0xb870ba0, 
> a1=0xc964300) at 
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:165
> #12 0x0000000001a7b083 in 
> boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, 
> boost::_bi::value<impala::FragmentInstanceState*> 
> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr, 
> impala::FragmentInstanceState*>, boost::_bi::list0> (this=0xac8ce70, f=..., 
> a=...) at 
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:313
> {code}
> {code}
> Thread 2 (Thread 0x7fda10b89700 (LWP 8874)):
> #0  0x00007fda9bc7cd84 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000011c1f6d in boost::condition_variable::wait (this=0xc962be0, 
> m=...) at 
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/thread/pthread/condition_variable.hpp:73
> #2  0x000000000133caf7 in impala::Promise<impala::Status>::Get 
> (this=0xc962be0) at /home/abehm/impala/be/src/util/promise.h:67
> #3  0x0000000001a6ff70 in impala::PlanFragmentExecutor::WaitForOpen 
> (this=0xc9629d0) at 
> /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:108
> #4  0x0000000001a38e2f in impala::Coordinator::Wait (this=0xbe72d00) at 
> /home/abehm/impala/be/src/runtime/coordinator.cc:1063
> #5  0x000000000152be3c in impala::ImpalaServer::QueryExecState::WaitInternal 
> (this=0x972ac00) at /home/abehm/impala/be/src/service/query-exec-state.cc:666
> #6  0x000000000152b960 in impala::ImpalaServer::QueryExecState::Wait 
> (this=0x972ac00) at /home/abehm/impala/be/src/service/query-exec-state.cc:634
> #7  0x0000000001547643 in boost::_mfi::mf0<void, 
> impala::ImpalaServer::QueryExecState>::operator() (this=0x7fda10b88d78, 
> p=0x972ac00) at 
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:49
> #8  0x0000000001547260 in 
> boost::_bi::list1<boost::_bi::value<impala::ImpalaServer::QueryExecState*> 
> >::operator()<boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>, 
> boost::_bi::list0> (this=0x7fda10b88d88, f=..., a=...) at 
> /home/abehm/impala/toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:253
> {code}
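
The first stack above shows the fragment instance thread sleeping inside ExecNode::ExecDebugAction during PREPARE. A minimal sketch of that pattern follows, assuming the WAIT action sleeps in a loop and leaves it only once a cancellation flag is observed (an assumption based on the "no way to cancel out of the loop" wording in the commit message, not on the actual source). WaitUntilCancelled, the flag, and the max_polls bound are hypothetical; the bound exists only so the example terminates.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a WAIT-style debug action: sleep until the
// 'cancelled' flag is set. 'max_polls' bounds the loop for this sketch only.
// Returns true if cancellation was observed, false if the budget ran out.
bool WaitUntilCancelled(const std::atomic<bool>& cancelled, int max_polls) {
  assert(max_polls >= 0);
  for (int i = 0; i < max_polls; ++i) {
    if (cancelled.load()) return true;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  // Budget exhausted. An unbounded loop would keep sleeping here for as long
  // as nothing is able to set the flag.
  return false;
}
```

If nothing can set the flag while the thread is still in this phase, an unbounded version of the loop never returns, which is consistent with the hang described in the report.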



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
