[ 
https://issues.apache.org/jira/browse/IMPALA-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6762.
-----------------------------------
    Resolution: Cannot Reproduce

I took another look and agree that it doesn't make sense: there's no way it
should be referencing invalid memory here. So it's probably a heap
use-after-free, which we can't really track down without a repro (in all
likelihood it has already been fixed).

>  DataStreamRecvr::SenderQueue::GetBatch encounters an exception doing a 
> data_arrival_cv_.Wait(l)
> ------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6762
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6762
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 2.6.0, Impala 2.13.0
>            Reporter: Pranay Singh
>            Assignee: Pranay Singh
>            Priority: Major
>              Labels: crash
>
> Problem: In impala::DataStreamRecvr::SenderQueue::GetBatch(), the call to
>          data_arrival_cv_.Wait() hits an exception in the boost library,
>          which results in a SIGABRT. The probable cause is that the lock
>          has already been freed.
> Note: This problem was investigated on the legacy Thrift setup, not on the
>       new KuduRPC setup.
> Evidence: We have a minidump for the issue; the two threads suspected to be
>           involved are listed below.
> Thread encountered SIGABRT
> Crash reason:  SIGABRT
> Crash address: 0x3d300008b2f
> Process uptime: not available
> Thread 959 (crashed)
>  0  libc-2.17.so + 0x351f7
>     rax = 0x0000000000000000   rdx = 0x0000000000000006
>     rcx = 0xffffffffffffffff   rbx = 0x00007f1291116f18
>     rsi = 0x000000000001a041   rdi = 0x0000000000008b2f
>     rbp = 0x0000000002ad97c0   rsp = 0x00007f102ac0cd48
>      r8 = 0x000000000000000a    r9 = 0x00007f102ac0e700
>     r10 = 0x0000000000000008   r11 = 0x0000000000000202
>     r12 = 0x00007f1291116f00   r13 = 0x00007f102ac0cfb0
>     r14 = 0x0000000000000000   r15 = 0x0000000000000000
>     rip = 0x00007f13ec6601f7
>     Found by: given as instruction pointer in context
>  1  libc-2.17.so + 0x368e8
>     rsp = 0x00007f102ac0cd50   rip = 0x00007f13ec6618e8
>     Found by: stack scanning
>      .
>      .
>      .
>   9  impalad!<name omitted>
>     rax = 0x0000000000000001   rdx = 0x0000000000000001
>     rbx = 0x00007f102ac0d390   rbp = 0x00007f12c68c13a0
>     rsp = 0x00007f102ac0d390   r12 = 0x00007f12cc820cc0
>     r13 = 0x00007f1244ab5600   r14 = 0x00007f102ac0d4e0
>     r15 = 0x0000000000000001   rip = 0x000000000080fe65
>     Found by: call frame info
> 10  impalad!<name omitted>
>     rbx = 0x00007f102ac0d4e0   rbp = 0x00007f1244ab5630
>     rsp = 0x00007f102ac0d3e0   r12 = 0x00007f12cc820cc0
>     r13 = 0x00007f1244ab5600   r14 = 0x00007f102ac0d4e0
>     r15 = 0x0000000000000001   rip = 0x000000000080fe8c
>     Found by: call frame info
> 11  impalad!<name omitted>
>     rbx = 0x0000000000000000   rbp = 0x00007f1244ab5630
>     rsp = 0x00007f102ac0d430   r12 = 0x00007f12cc820cc0
>     r13 = 0x00007f1244ab5600   r14 = 0x00007f102ac0d4e0
>     r15 = 0x0000000000000001   rip = 0x0000000000810294
>     Found by: call frame info
> 12  impalad!impala::DataStreamRecvr::(impala::RowBatch**)
>     rbx = 0x00007f12cc820c60   rbp = 0x00007f102ac0d500
>     rsp = 0x00007f102ac0d4c0   r12 = 0x00007f102ac0d530
>     r13 = 0x00007f12cc820c90   r14 = 0x00007f127242f338
>     r15 = 0x00007f12cc820d48   rip = 0x0000000000a280f3
>     Found by: call frame info
> 13  impalad!impala::DataStreamRecvr::GetBatch(impala::RowBatch**)
>     rbx = 0x00007f102ac0d5c0   rbp = 0x00007f102ac0d5c0
>     rsp = 0x00007f102ac0d5a0   r12 = 0x00007f121f464100
>     r13 = 0x00007f127242f180   r14 = 0x00007f121f464100
>     r15 = 0x00007f102ac0d760   rip = 0x0000000000a284c3
>     Found by: call frame info
> 14  impalad!impala::ExchangeNode::FillInputRowBatch(impala::RuntimeState*)
>     rbx = 0x00007f102ac0d690   rbp = 0x00007f102ac0d5c0
>     rsp = 0x00007f102ac0d5b0   r12 = 0x00007f121f464100
>     r13 = 0x00007f127242f180   r14 = 0x00007f121f464100
>     r15 = 0x00007f102ac0d760   rip = 0x0000000000beffa5
>     Found by: call frame info
> 15  impalad!impala::ExchangeNode::Open(impala::RuntimeState*)
>     rbx = 0x00007f121f464100   rbp = 0x00007f102ac0d8d0
>     rsp = 0x00007f102ac0d640   r12 = 0x00007f127242f180
>     r13 = 0x00007f102ac0d690   r14 = 0x00007f121f464100
>     r15 = 0x00007f102ac0d760   rip = 0x0000000000bf0d9e
>     Found by: call frame info
> Thread 336
> ----------------
> 13  impalad!<name omitted> [TBufferTransports.h : 69 + 0xe]
>     rbx = 0x0000000000000000   rbp = 0x0000000000000004
>     rsp = 0x00007f13077b9840   r12 = 0x0000000000000004
>     r13 = 0x00007f13077b98b0   r14 = 0x00007f12c3f6f270
>     r15 = 0x00007f12d5a7c034   rip = 0x000000000080be6e
>     Found by: call frame info
> 14  
> impalad!apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readMessageBegin(std::string&,
>  apache::thrift::protocol::TMessageType&, int&)
>     rbx = 0x00007f13077b98b0   rbp = 0x00007f13077b98f8
>     rsp = 0x00007f13077b98a0   r12 = 0x00007f13077b98fc
>     r13 = 0x00007f13077b9900   r14 = 0x00007f12406cd0e0
>     r15 = 0x00007f13077b9b80   rip = 0x00000000009ca5bf
>     Found by: call frame info
> 15  
> impalad!impala::ImpalaInternalServiceClient::recv_CancelPlanFragment(impala::TCancelPlanFragmentResult&)
>     rbx = 0x000000001f9241c0   rbp = 0x00007f13ed2106a0
>     rsp = 0x00007f13077b98f0   r12 = 0x00007f13077b9900
>     r13 = 0x00007f13077b9b80   r14 = 0x00007f13077b9b50
>     r15 = 0x00007f13077b9b80   rip = 0x0000000000cba069
>     Found by: call frame info
> 16  impalad!impala::Status 
> impala::ClientConnection<impala::ImpalaBackendClient>::DoRpc<void 
> (impala::ImpalaInternalServiceClient::*)(impala::TCancelPlanFragmentResult&, 
> impala::TCancelPlanFragmentParams const&), impala::TCancelPlanFragmentParams, 
> impala::TCancelPlanFragmentResult>(void 
> (impala::ImpalaInternalServiceClient::* 
> const&)(impala::TCancelPlanFragmentResult&, impala::TCancelPlanFragmentParams 
> const&), impala::TCancelPlanFragmentParams const&, 
> impala::TCancelPlanFragmentResult*, bool*) 
>     rbx = 0x00007f13077b9b20   rbp = 0x00007f13077b9ae0
>     rsp = 0x00007f13077b9970   r12 = 0x00007f13077b9bc0
>     r13 = 0x00007f13077b9acf   r14 = 0x00007f13077b9b50
>     r15 = 0x00007f13077b9b80   rip = 0x0000000000d79031
>     Found by: call frame info
> 17  impalad!impala::Coordinator::CancelRemoteFragments() 
>     rbx = 0x0000000000000000   rbp = 0x00007f12d8533f40
>     rsp = 0x00007f13077b9a60   r12 = 0x00007f12d8533fa0
>     r13 = 0x00007f13077b9bc0   r14 = 0x000000003dc58000
>     r15 = 0x00007f13077b9b20   rip = 0x0000000000d6818f
>     Found by: call frame info
> 18  impalad!impala::Coordinator::CancelInternal()
>     rbx = 0x000000003dc58000   rbp = 0x00007f13077b9d70
>     rsp = 0x00007f13077b9d70   r12 = 0x00007f127209f600
>     r13 = 0x00007f13077b9ff0   r14 = 0x000000003dc58000
>     r15 = 0x00007f13077b9de0   rip = 0x0000000000d6f7f2
>     Found by: call frame info
> 19  impalad!impala::Coordinator::Cancel(impala::Status const*)
>     rbx = 0x000000003dc58000   rbp = 0x000000003dc58390
>     rsp = 0x00007f13077b9da0   r12 = 0x00007f13077b9ff0
>     r13 = 0x00007f13077b9ff0   r14 = 0x000000003dc58000
>     r15 = 0x00007f13077b9de0   rip = 0x0000000000d71b83
>     Found by: call frame info
> 20  impalad!impala::ImpalaServer::QueryExecState::Cancel(bool, impala::Status 
> const*)
>     rbx = 0x00007f12b928e000   rbp = 0x00007f12b928e2b8
>     rsp = 0x00007f13077b9dc0   r12 = 0x00007f13077b9e60
>     r13 = 0x00007f13077b9ff0   r14 = 0x000000003dc58000
>     r15 = 0x00007f13077b9de0   rip = 0x0000000000adba06
>     Found by: call frame info
> 21  impalad!impala::ImpalaServer::CancelInternal(impala::TUniqueId const&, 
> bool, impala::Status const*) 
>     rbx = 0x00007f13077b9e70   rbp = 0x00007f13077b9f50
>     rsp = 0x00007f13077b9e30   r12 = 0x00007f13077b9e60
>     r13 = 0x00007f13ed2106a0   r14 = 0x000000000f8b1100
>     r15 = 0x00007f13077b9ff0   rip = 0x0000000000a8597a
>     Found by: call frame info
> Cause of the issue
> ------------------------
> DataStreamRecvr::SenderQueue::Cancel() or DataStreamRecvr::CancelStream()
> does not wait for threads inside
> impala::DataStreamRecvr::SenderQueue::GetBatch() to finish. As a result,
> ~DataStreamRecvr() can be called while a thread is still inside
> impala::DataStreamRecvr::SenderQueue::GetBatch(), which may sometimes result
> in this crash.
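
The suspected race above can be illustrated with a minimal, self-contained
sketch. This is not Impala's code: SketchQueue, num_readers_,
readers_done_cv_, NumReaders() and RunCancelDemo() are hypothetical names, and
std::mutex / std::condition_variable stand in for the boost primitives. It
shows one way a Cancel() could wait for threads still inside GetBatch() before
the queue's lock and condition variable are destroyed.

```cpp
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a sender queue; batches are plain ints here.
class SketchQueue {
 public:
  // Blocks until a batch arrives or the queue is cancelled.
  // Returns false if the queue was cancelled.
  bool GetBatch(int* out) {
    std::unique_lock<std::mutex> l(lock_);
    ++num_readers_;
    data_arrival_cv_.wait(l, [this] { return cancelled_ || !batches_.empty(); });
    bool got_batch = false;
    if (!cancelled_) {
      *out = batches_.front();
      batches_.pop_front();
      got_batch = true;
    }
    // Tell Cancel() that this thread is about to leave GetBatch().
    if (--num_readers_ == 0) readers_done_cv_.notify_all();
    return got_batch;
  }

  void AddBatch(int v) {
    std::lock_guard<std::mutex> l(lock_);
    batches_.push_back(v);
    data_arrival_cv_.notify_one();
  }

  // Wakes all readers and, unlike the behaviour described in the report,
  // waits until none of them is still inside GetBatch().
  void Cancel() {
    std::unique_lock<std::mutex> l(lock_);
    cancelled_ = true;
    data_arrival_cv_.notify_all();
    readers_done_cv_.wait(l, [this] { return num_readers_ == 0; });
  }

  int NumReaders() {
    std::lock_guard<std::mutex> l(lock_);
    return num_readers_;
  }

 private:
  std::mutex lock_;
  std::condition_variable data_arrival_cv_;
  std::condition_variable readers_done_cv_;
  std::deque<int> batches_;
  int num_readers_ = 0;
  bool cancelled_ = false;
};

// Cancels and destroys a queue while a reader is blocked in GetBatch().
// Returns true if the reader observed the cancellation.
bool RunCancelDemo() {
  std::unique_ptr<SketchQueue> queue(new SketchQueue);
  SketchQueue* q = queue.get();
  bool reader_got_batch = true;
  int value = 0;
  std::thread reader([&] { reader_got_batch = q->GetBatch(&value); });
  // Make sure the reader has entered GetBatch() before cancelling.
  while (q->NumReaders() == 0) std::this_thread::yield();
  q->Cancel();    // returns only once the reader has left the wait
  queue.reset();  // destruction no longer races with the reader
  reader.join();
  return !reader_got_batch;
}
```

Without the final wait in Cancel(), queue.reset() could destroy lock_ and
data_arrival_cv_ while the reader is still waking up inside the wait, which is
the kind of failure the description above suspects. Whether the real code
needs a reader count like this, or should instead extend the receiver's
lifetime some other way, is not something the report settles.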


