[
https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenzhe Zhou resolved IMPALA-6788.
---------------------------------
Resolution: Fixed
> Abort ExecFInstance() RPC loop early after query failure
> --------------------------------------------------------
>
> Key: IMPALA-6788
> URL: https://issues.apache.org/jira/browse/IMPALA-6788
> Project: IMPALA
> Issue Type: Sub-task
> Components: Distributed Exec
> Affects Versions: Impala 2.12.0
> Reporter: Mostafa Mokhtar
> Assignee: Wenzhe Zhou
> Priority: Major
> Labels: krpc, rpc
> Attachments: connect_thread_busy_queries_failing.txt,
> impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip
>
>
> Logs from a large cluster show that query startup can take a long time
> (roughly 4.5 and 6 minutes in the two examples below), and that the query is
> cancelled as soon as startup completes because one of the intermediate RPCs
> failed.
> It is not clear what the right fix is, since fragments are started
> asynchronously. Possibly a timeout?
> {code}
> I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec()
> query_id=334cc7dd9758c36c:ec38aeb400000000 stmt=with customer_total_return as
> I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644
> backends for query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644
> backends for query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel()
> query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends()
> query_id=334cc7dd9758c36c:ec38aeb400000000, tried to cancel 643 backends
> I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control
> resources for query_id=334cc7dd9758c36c:ec38aeb400000000
> {code}
> {code}
> I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec()
> query_id=e44d553b04d47cfb:28f06bb800000000 stmt=with customer_total_return as
> I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640
> backends for query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:58.402195 1830386 coordinator.cc:370] started execution on 640
> backends for query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel()
> query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends()
> query_id=e44d553b04d47cfb:28f06bb800000000, tried to cancel 639 backends
> I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control
> resources for query_id=e44d553b04d47cfb:28f06bb800000000
> {code}
> I checked the coordinator, and its threads appear to be spending a lot of
> time waiting on exec_complete_barrier_:
> {code}
> #0 0x00007fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x0000000001222944 in impala::Promise<bool>::Get() ()
> #2 0x0000000001220d7b in impala::Coordinator::StartBackendExec() ()
> #3 0x0000000001221c87 in impala::Coordinator::Exec() ()
> #4 0x0000000000c3a925 in
> impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest
> const&) ()
> #5 0x0000000000c41f7e in
> impala::ClientRequestState::Exec(impala::TExecRequest*) ()
> #6 0x0000000000bff597 in
> impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&,
> std::shared_ptr<impala::ImpalaServer::SessionState>, bool*,
> std::shared_ptr<impala::ClientRequestState>*) ()
> #7 0x0000000000c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*,
> std::shared_ptr<impala::ImpalaServer::SessionState>,
> std::shared_ptr<impala::ClientRequestState>*) ()
> #8 0x0000000000c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&,
> beeswax::Query const&) ()
> /StartBackendExec
> #11 0x0000000000d60c9a in boost::detail::thread_data<boost::_bi::bind_t<void,
> void (*)(std::string const&, std::string const&, boost::function<void ()>,
> impala::ThreadDebugInfo const*, impala::Promise<long>*),
> boost::_bi::list5<boost::_bi::value<std::string>,
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >,
> boost::_bi::value<impala::ThreadDebugInfo*>,
> boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> {code}
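
For readers unfamiliar with the idea in the issue title, the following is a minimal sketch of aborting a fan-out RPC loop early once any RPC has failed. It is not the actual Impala change: StartBackends, ExecRpc, and StartupResult are hypothetical names, and the RPC is modelled as a plain callback rather than a KRPC call. Worker threads claim backends from a shared counter and stop claiming new ones as soon as a shared failure flag is set, so the caller is not left waiting for the remaining backends to be started.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one backend's Exec RPC; returns true on success.
using ExecRpc = std::function<bool(std::size_t backend_idx)>;

struct StartupResult {
  bool ok;                // false if any RPC failed
  std::size_t attempted;  // number of RPCs that were actually issued
};

// Starts up to num_backends RPCs using num_workers threads. Once any RPC
// fails, no further RPCs are issued (RPCs already in flight still finish).
StartupResult StartBackends(std::size_t num_backends, std::size_t num_workers,
                            const ExecRpc& rpc) {
  std::atomic<std::size_t> next{0};       // next backend index to claim
  std::atomic<std::size_t> attempted{0};  // RPCs issued so far
  std::atomic<bool> failed{false};        // set by the first failing RPC

  auto worker = [&]() {
    for (;;) {
      // Early abort: stop claiming backends after the first failure.
      if (failed.load(std::memory_order_acquire)) return;
      const std::size_t idx = next.fetch_add(1);
      if (idx >= num_backends) return;
      attempted.fetch_add(1);
      if (!rpc(idx)) failed.store(true, std::memory_order_release);
    }
  };

  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < num_workers; ++i) threads.emplace_back(worker);
  for (auto& t : threads) t.join();  // plays the role of the completion barrier
  return {!failed.load(), attempted.load()};
}
```

With a single worker and a failure on the fourth backend, only four RPCs are issued instead of all 644. With several workers the exact count depends on scheduling, but no new RPC is issued once a worker has observed the failure flag.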