[ https://issues.apache.org/jira/browse/IMPALA-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825675#comment-17825675 ]
Evgeniy commented on IMPALA-12439:
----------------------------------
It seems that the Impala daemon hangs on an RPC. On inbound connections we
have seen many in-flight calls to the RPC method "CancelQueryFInstances",
for example:
{
  "remote_ip": "ip.ip.ip.ip:44876",
  "num_calls_in_flight": 58,
  "socket_status": {
    "rtt": 2442,
    "rttvar": 4604,
    "snd_cwnd": 10,
    "total_retrans": 0,
    "pacing_rate": 11957411,
    "max_pacing_rate": 18446744073709551615,
    "bytes_acked": 12566319,
    "bytes_received": 10993660785,
    "segs_out": 901401,
    "segs_in": 7990372,
    "send_queue_bytes": 0,
    "receive_queue_bytes": 0,
    "send_bytes_per_sec": 5978705
  },
  "calls_in_flight": [
    {
      "header": {
        "call_id": 169977,
        "remote_method": {
          "service_name": "impala.ControlService",
          "method_name": "CancelQueryFInstances"
        },
        "timeout_millis": 10000
      },
      "micros_elapsed": 3175972030
    },
    {
      "header": {
        "call_id": 169975,
        "remote_method": {
          "service_name": "impala.ControlService",
          "method_name": "CancelQueryFInstances"
        },
        "timeout_millis": 10000
      },
      "micros_elapsed": 3185965057
    },....
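In the excerpt, each call's micros_elapsed (about 3176 s, roughly 53 minutes) is far beyond its timeout_millis (10 s). Below is a minimal Python sketch for spotting such calls. It assumes the inbound-connection entries have already been pulled from the daemon's RPC debug page into a list shaped like the excerpt above. `overdue_calls` is a hypothetical helper and is not part of Impala.

```python
import json

# Hypothetical helper. The input is assumed to be a list of per-connection
# objects shaped like the rpcz excerpt (remote_ip, calls_in_flight, ...).
def overdue_calls(connections):
    """Yield (remote_ip, method, call_id, elapsed_s, timeout_s) for every
    in-flight call whose elapsed time exceeds its own timeout."""
    for conn in connections:
        for call in conn.get("calls_in_flight", []):
            header = call["header"]
            timeout_ms = header.get("timeout_millis")
            elapsed_us = call["micros_elapsed"]
            # micros_elapsed is in microseconds, timeout_millis in milliseconds.
            if timeout_ms is not None and elapsed_us / 1000 > timeout_ms:
                yield (conn["remote_ip"],
                       header["remote_method"]["method_name"],
                       header["call_id"],
                       elapsed_us / 1e6,
                       timeout_ms / 1e3)

# Sample built from the two calls shown in the excerpt.
sample = json.loads("""
[{"remote_ip": "ip.ip.ip.ip:44876",
  "num_calls_in_flight": 58,
  "calls_in_flight": [
    {"header": {"call_id": 169977,
                "remote_method": {"method_name": "CancelQueryFInstances"},
                "timeout_millis": 10000},
     "micros_elapsed": 3175972030},
    {"header": {"call_id": 169975,
                "remote_method": {"method_name": "CancelQueryFInstances"},
                "timeout_millis": 10000},
     "micros_elapsed": 3185965057}]}]
""")

for ip, method, call_id, elapsed_s, timeout_s in overdue_calls(sample):
    print(f"{ip} {method} call_id={call_id}: "
          f"{elapsed_s:.0f}s elapsed vs {timeout_s:.0f}s timeout")
```

The comparison uses each call's own timeout_millis instead of a fixed threshold, so calls with different deadlines are judged consistently.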
> Impala Daemon stucks on random executors
> ----------------------------------------
>
> Key: IMPALA-12439
> URL: https://issues.apache.org/jira/browse/IMPALA-12439
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.4.0
> Reporter: Evgeniy
> Priority: Critical
> Attachments: resolved_420a96bf.txt, resolved_d7750c55.txt
>
>
> Hi!
> In our cluster we periodically face the following problem:
> 1. A query fails with an error like "Exec() rpc failed: Timed out:
> ExecQueryFInstances RPC to <node_ip>:27000 timed out after 300.000s". The
> affected node may be different each time the problem appears.
> 2. We have analyzed minidumps of the Impala daemon from two different cases
> (the resolved minidumps are attached). It seems that the Impala daemon gets
> stuck while cancelling query fragment instances:
> Thread 244
> 0 libpthread-2.17.so + 0xba35
> rax = 0xfffffffffffffe00 rdx = 0x0000000000000002
> rcx = 0xffffffffffffffff rbx = 0x000000007cd81b10
> rsi = 0x0000000000000080 rdi = 0x000000007cd81b14
> rbp = 0x00007f7ba5ae8580 rsp = 0x00007f7ba5ae8520
> r8 = 0x000000007cd81b00 r9 = 0x0000000000000000
> r10 = 0x0000000000000000 r11 = 0x0000000000000246
> r12 = 0x00000000eafe6400 r13 = 0x00007f7ba5ae85c0
> r14 = 0x00007f845b7287d0 r15 = 0x00007f7ba5ae8660
> rip = 0x00007f845b727a35
> Found by: given as instruction pointer in context
> 1 impalad!impala::QueryState::Cancel() + 0xdb
> rbp = 0x00007f7ba5ae8600 rsp = 0x00007f7ba5ae8590
> rip = 0x00000000011791bb
> Found by: previous frame's frame pointer
> 2 impalad!impala::ControlService::CancelQueryFInstances(impala::CancelQueryFInstancesRequestPB const*, impala::CancelQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) + 0x177
> rbx = 0x00007f8458e136a0 rbp = 0x00007f7ba5ae8780
> rsp = 0x00007f7ba5ae8610 r12 = 0x00007f7ba5ae8720
> r13 = 0x00007f7ba5ae86a0 rip = 0x0000000001218f77
> Found by: call frame info
> 3 impalad!kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) + 0x17c
> rbx = 0x0000000015e4e460 rbp = 0x00007f7ba5ae87e0
> rsp = 0x00007f7ba5ae8790 r12 = 0x00000007a6bf8ee0
> r13 = 0x0000000014f86740 r14 = 0x0000000014f86f00
> r15 = 0x0000000014f87480 rip = 0x0000000001788ffc
> Found by: call frame info
> 4 impalad!impala::ImpalaServicePool::RunThread() + 0x1be
> rbx = 0x00007f840000000d rbp = 0x00007f7ba5ae88a0
> rsp = 0x00007f7ba5ae87f0 r12 = 0x0000000018b30f80
> r13 = 0x0000000000000000 r14 = 0x0000000000000051
> r15 = 0x00007f840000000d rip = 0x00000000010dbdee
> Found by: call frame info
> 5 impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) + 0x30b
> rbx = 0x00007f7ba5ae8970 rbp = 0x00007f7ba5ae8be0
> rsp = 0x00007f7ba5ae88b0 r12 = 0x00007ffed2cdb298
> r13 = 0x000000000592ee20 r14 = 0x00007f7ba5ae8910
> r15 = 0x00007f8458e136a0 rip = 0x0000000001435f8b
> Found by: call frame info
> 6 impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() + 0x7a
> rbx = 0x0000000015e34e00 rbp = 0x00007f7ba5ae8c40
> rsp = 0x00007f7ba5ae8bf0 r12 = 0x00007f7ba5ae8c00
> r13 = 0x0000000001435c80 r14 = 0x0000000000000000
> r15 = 0x00007f7ba5ae9700 rip = 0x0000000001436e5a
> Found by: call frame info
> 7 impalad!thread_proxy + 0xea
> rbx = 0x0000000015e34e00 rbp = 0x0000000000000000
> rsp = 0x00007f7ba5ae8c50 r12 = 0x00007f7ba5ae8c50
> r13 = 0x0000000000801000 r14 = 0x0000000000000000
> r15 = 0x00007f7ba5ae9700 rip = 0x0000000001c18e1a
> Found by: call frame info
> 8 libpthread-2.17.so + 0x7ea5
> rbx = 0x0000000000000000 rbp = 0x0000000000000000
> rsp = 0x00007f7ba5ae8ca0 r12 = 0x0000000000000000
> r13 = 0x0000000000801000 r14 = 0x0000000000000000
> r15 = 0x00007f7ba5ae9700 rip = 0x00007f845b723ea5
> Found by: call frame info
> 9 libc-2.17.so + 0xfeb0d
> rsp = 0x00007f7ba5ae8d40 rip = 0x00007f8458321b0d
> Found by: stack scanning
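Frame 4 (impala::ImpalaServicePool::RunThread) shows that the blocked CancelQueryFInstances handler occupies an RPC service-pool thread. Frame 0 is a wait inside libpthread. One possible link to the ExecQueryFInstances timeouts in point 1 is that enough handlers blocked without a timeout could exhaust a pool that would otherwise serve new Exec calls. This is only a hypothesis, and the dumps do not confirm that both methods share one pool. The Python toy model below makes these assumptions: a two-worker pool stands in for the service pool, and an Event that is never set stands in for whatever QueryState::Cancel() waits on. None of these names are Impala APIs.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Toy model (assumption): a fixed-size executor stands in for the RPC
# service pool, and an Event that is never set stands in for the condition
# the cancel handler is waiting on.
POOL_SIZE = 2
never_set = threading.Event()

def cancel_handler():
    # Untimed wait: the worker thread stays blocked until the event is set.
    never_set.wait()
    return "cancelled"

def exec_handler():
    return "executed"

pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
# Occupy every worker with a cancel call that does not return.
stuck = [pool.submit(cancel_handler) for _ in range(POOL_SIZE)]
# A later "exec" call is queued behind them and cannot be scheduled.
exec_future = pool.submit(exec_handler)
try:
    exec_future.result(timeout=0.5)
    exec_timed_out = False
except TimeoutError:
    exec_timed_out = True
print("exec timed out:", exec_timed_out)

# Release the blocked workers so the queued call runs and the pool can exit.
never_set.set()
pool.shutdown(wait=True)
```

In this toy model, replacing the untimed `never_set.wait()` with `never_set.wait(timeout=...)` would bound how long each worker can stay blocked.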
--
This message was sent by Atlassian Jira
(v8.20.10#820010)