[
https://issues.apache.org/jira/browse/IMPALA-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-6762.
-----------------------------------
Resolution: Cannot Reproduce
I took another look and agree that it doesn't make sense - there's no way the
code should be referencing invalid memory here. So it's probably a heap
use-after-free, which we can't really track down without a repro (in all
likelihood it has already been fixed).
> DataStreamRecvr::SenderQueue::GetBatch encounters an exception doing a
> data_arrival_cv_.Wait(l)
> ------------------------------------------------------------------------------------------------
>
> Key: IMPALA-6762
> URL: https://issues.apache.org/jira/browse/IMPALA-6762
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 2.6.0, Impala 2.13.0
> Reporter: Pranay Singh
> Assignee: Pranay Singh
> Priority: Major
> Labels: crash
>
> Problem: In impala::DataStreamRecvr::SenderQueue::GetBatch(), the call to
> data_arrival_cv_.Wait() encounters an exception inside the boost library,
> which results in a SIGABRT. The probable cause is that the lock being
> waited on has already been freed.
> Note: this problem has been investigated for the legacy Thrift setup, not
> for the new KuduRPC setup.
> Evidence: We have a minidump for the issue; the two threads suspected to be
> involved are listed below.
> Thread encountered SIGABRT
> Crash reason: SIGABRT
> Crash address: 0x3d300008b2f
> Process uptime: not available
> Thread 959 (crashed)
> 0 libc-2.17.so + 0x351f7
> rax = 0x0000000000000000 rdx = 0x0000000000000006
> rcx = 0xffffffffffffffff rbx = 0x00007f1291116f18
> rsi = 0x000000000001a041 rdi = 0x0000000000008b2f
> rbp = 0x0000000002ad97c0 rsp = 0x00007f102ac0cd48
> r8 = 0x000000000000000a r9 = 0x00007f102ac0e700
> r10 = 0x0000000000000008 r11 = 0x0000000000000202
> r12 = 0x00007f1291116f00 r13 = 0x00007f102ac0cfb0
> r14 = 0x0000000000000000 r15 = 0x0000000000000000
> rip = 0x00007f13ec6601f7
> Found by: given as instruction pointer in context
> 1 libc-2.17.so + 0x368e8
> rsp = 0x00007f102ac0cd50 rip = 0x00007f13ec6618e8
> Found by: stack scanning
> .
> .
> .
> 9 impalad!<name omitted>
> rax = 0x0000000000000001 rdx = 0x0000000000000001
> rbx = 0x00007f102ac0d390 rbp = 0x00007f12c68c13a0
> rsp = 0x00007f102ac0d390 r12 = 0x00007f12cc820cc0
> r13 = 0x00007f1244ab5600 r14 = 0x00007f102ac0d4e0
> r15 = 0x0000000000000001 rip = 0x000000000080fe65
> Found by: call frame info
> 10 impalad!<name omitted>
> rbx = 0x00007f102ac0d4e0 rbp = 0x00007f1244ab5630
> rsp = 0x00007f102ac0d3e0 r12 = 0x00007f12cc820cc0
> r13 = 0x00007f1244ab5600 r14 = 0x00007f102ac0d4e0
> r15 = 0x0000000000000001 rip = 0x000000000080fe8c
> Found by: call frame info
> 11 impalad!<name omitted>
> rbx = 0x0000000000000000 rbp = 0x00007f1244ab5630
> rsp = 0x00007f102ac0d430 r12 = 0x00007f12cc820cc0
> r13 = 0x00007f1244ab5600 r14 = 0x00007f102ac0d4e0
> r15 = 0x0000000000000001 rip = 0x0000000000810294
> Found by: call frame info
> 12 impalad!impala::DataStreamRecvr::(impala::RowBatch**)
> rbx = 0x00007f12cc820c60 rbp = 0x00007f102ac0d500
> rsp = 0x00007f102ac0d4c0 r12 = 0x00007f102ac0d530
> r13 = 0x00007f12cc820c90 r14 = 0x00007f127242f338
> r15 = 0x00007f12cc820d48 rip = 0x0000000000a280f3
> Found by: call frame info
> 13 impalad!impala::DataStreamRecvr::GetBatch(impala::RowBatch**)
> rbx = 0x00007f102ac0d5c0 rbp = 0x00007f102ac0d5c0
> rsp = 0x00007f102ac0d5a0 r12 = 0x00007f121f464100
> r13 = 0x00007f127242f180 r14 = 0x00007f121f464100
> r15 = 0x00007f102ac0d760 rip = 0x0000000000a284c3
> Found by: call frame info
> 14 impalad!impala::ExchangeNode::FillInputRowBatch(impala::RuntimeState*)
> rbx = 0x00007f102ac0d690 rbp = 0x00007f102ac0d5c0
> rsp = 0x00007f102ac0d5b0 r12 = 0x00007f121f464100
> r13 = 0x00007f127242f180 r14 = 0x00007f121f464100
> r15 = 0x00007f102ac0d760 rip = 0x0000000000beffa5
> Found by: call frame info
> 15 impalad!impala::ExchangeNode::Open(impala::RuntimeState*)
> rbx = 0x00007f121f464100 rbp = 0x00007f102ac0d8d0
> rsp = 0x00007f102ac0d640 r12 = 0x00007f127242f180
> r13 = 0x00007f102ac0d690 r14 = 0x00007f121f464100
> r15 = 0x00007f102ac0d760 rip = 0x0000000000bf0d9e
> Found by: call frame info
> Thread 336
> ----------------
> 13 impalad!<name omitted> [TBufferTransports.h : 69 + 0xe]
> rbx = 0x0000000000000000 rbp = 0x0000000000000004
> rsp = 0x00007f13077b9840 r12 = 0x0000000000000004
> r13 = 0x00007f13077b98b0 r14 = 0x00007f12c3f6f270
> r15 = 0x00007f12d5a7c034 rip = 0x000000000080be6e
> Found by: call frame info
> 14
> impalad!apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readMessageBegin(std::string&,
> apache::thrift::protocol::TMessageType&, int&)
> rbx = 0x00007f13077b98b0 rbp = 0x00007f13077b98f8
> rsp = 0x00007f13077b98a0 r12 = 0x00007f13077b98fc
> r13 = 0x00007f13077b9900 r14 = 0x00007f12406cd0e0
> r15 = 0x00007f13077b9b80 rip = 0x00000000009ca5bf
> Found by: call frame info
> 15
> impalad!impala::ImpalaInternalServiceClient::recv_CancelPlanFragment(impala::TCancelPlanFragmentResult&)
> rbx = 0x000000001f9241c0 rbp = 0x00007f13ed2106a0
> rsp = 0x00007f13077b98f0 r12 = 0x00007f13077b9900
> r13 = 0x00007f13077b9b80 r14 = 0x00007f13077b9b50
> r15 = 0x00007f13077b9b80 rip = 0x0000000000cba069
> Found by: call frame info
> 16 impalad!impala::Status
> impala::ClientConnection<impala::ImpalaBackendClient>::DoRpc<void
> (impala::ImpalaInternalServiceClient::*)(impala::TCancelPlanFragmentResult&,
> impala::TCancelPlanFragmentParams const&), impala::TCancelPlanFragmentParams,
> impala::TCancelPlanFragmentResult>(void
> (impala::ImpalaInternalServiceClient::*
> const&)(impala::TCancelPlanFragmentResult&, impala::TCancelPlanFragmentParams
> const&), impala::TCancelPlanFragmentParams const&,
> impala::TCancelPlanFragmentResult*, bool*)
> rbx = 0x00007f13077b9b20 rbp = 0x00007f13077b9ae0
> rsp = 0x00007f13077b9970 r12 = 0x00007f13077b9bc0
> r13 = 0x00007f13077b9acf r14 = 0x00007f13077b9b50
> r15 = 0x00007f13077b9b80 rip = 0x0000000000d79031
> Found by: call frame info
> 17 impalad!impala::Coordinator::CancelRemoteFragments()
> rbx = 0x0000000000000000 rbp = 0x00007f12d8533f40
> rsp = 0x00007f13077b9a60 r12 = 0x00007f12d8533fa0
> r13 = 0x00007f13077b9bc0 r14 = 0x000000003dc58000
> r15 = 0x00007f13077b9b20 rip = 0x0000000000d6818f
> Found by: call frame info
> 18 impalad!impala::Coordinator::CancelInternal()
> rbx = 0x000000003dc58000 rbp = 0x00007f13077b9d70
> rsp = 0x00007f13077b9d70 r12 = 0x00007f127209f600
> r13 = 0x00007f13077b9ff0 r14 = 0x000000003dc58000
> r15 = 0x00007f13077b9de0 rip = 0x0000000000d6f7f2
> Found by: call frame info
> 19 impalad!impala::Coordinator::Cancel(impala::Status const*)
> rbx = 0x000000003dc58000 rbp = 0x000000003dc58390
> rsp = 0x00007f13077b9da0 r12 = 0x00007f13077b9ff0
> r13 = 0x00007f13077b9ff0 r14 = 0x000000003dc58000
> r15 = 0x00007f13077b9de0 rip = 0x0000000000d71b83
> Found by: call frame info
> 20 impalad!impala::ImpalaServer::QueryExecState::Cancel(bool, impala::Status
> const*)
> rbx = 0x00007f12b928e000 rbp = 0x00007f12b928e2b8
> rsp = 0x00007f13077b9dc0 r12 = 0x00007f13077b9e60
> r13 = 0x00007f13077b9ff0 r14 = 0x000000003dc58000
> r15 = 0x00007f13077b9de0 rip = 0x0000000000adba06
> Found by: call frame info
> 21 impalad!impala::ImpalaServer::CancelInternal(impala::TUniqueId const&,
> bool, impala::Status const*)
> rbx = 0x00007f13077b9e70 rbp = 0x00007f13077b9f50
> rsp = 0x00007f13077b9e30 r12 = 0x00007f13077b9e60
> r13 = 0x00007f13ed2106a0 r14 = 0x000000000f8b1100
> r15 = 0x00007f13077b9ff0 rip = 0x0000000000a8597a
> Found by: call frame info
> Cause of the issue
> ------------------------
> DataStreamRecvr::SenderQueue::Cancel() and DataStreamRecvr::CancelStream()
> do not wait for threads inside
> impala::DataStreamRecvr::SenderQueue::GetBatch() to finish. This leads to a
> situation where ~DataStreamRecvr() can be called while a thread is still in
> impala::DataStreamRecvr::SenderQueue::GetBatch(), which may sometimes
> result in this crash.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)