[ 
https://issues.apache.org/jira/browse/IMPALA-9737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119757#comment-17119757
 ] 

ASF subversion and git services commented on IMPALA-9737:
---------------------------------------------------------

Commit e0734913becaf9b600ea0919c08855149467d6b5 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e073491 ]

IMPALA-9737: fix reservation accounting bug in PHJ

This bug would only impact spilling queries with
mt_dop > 0, where there is a separate builder and
reservation needs to be transferred back to the builder.

The bug was that 'probe_batch_' could own buffers that
were using reservation from the PartitionedHashJoinNode.
This reservation needs to be transferred back to the
PartitionedHashJoinBuilder via ReturnReservation(). Therefore
we need to clean up the batches before calling ReturnReservation().

Change-Id: Ia1619ad642628b64ed57e6f85e0755a128bafdb5
Reviewed-on: http://gerrit.cloudera.org:8080/15958
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Csaba Ringhofer <[email protected]>
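
The ordering problem described in the commit message can be illustrated with a toy model. This is only a sketch, not Impala's code: ToyClient, ToyBatch and CloseJoinNode are hypothetical names, and the model assumes that only unused reservation can be transferred between clients. The byte values (a 524288-byte reservation with one 262144-byte buffer in use) are taken from the DCHECK output quoted in the issue below. The real code hit a DCHECK rather than returning false.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a buffer pool client. It tracks the reservation
// it holds and how much of it currently backs allocated buffers.
struct ToyClient {
  int64_t reservation = 0;
  int64_t used = 0;

  int64_t Unused() const { return reservation - used; }

  // Charge a buffer against this client's reservation.
  bool AllocateBuffer(int64_t bytes) {
    if (bytes > Unused()) return false;
    used += bytes;
    return true;
  }

  void FreeBuffer(int64_t bytes) { used -= bytes; }

  // Assumption of this model: only unused reservation may be handed over.
  bool TransferTo(ToyClient* dst, int64_t bytes) {
    if (bytes > Unused()) return false;
    reservation -= bytes;
    dst->reservation += bytes;
    return true;
  }
};

// Hypothetical stand-in for a row batch that owns buffers charged to a client.
struct ToyBatch {
  ToyClient* owner;
  std::vector<int64_t> buffers;

  bool AddBuffer(int64_t bytes) {
    if (!owner->AllocateBuffer(bytes)) return false;
    buffers.push_back(bytes);
    return true;
  }

  // Free every buffer the batch owns, releasing the reservation it was using.
  void Reset() {
    for (int64_t bytes : buffers) owner->FreeBuffer(bytes);
    buffers.clear();
  }
};

// Simulates closing the join node. Returns true if the node's whole
// reservation could be handed back to the builder.
bool CloseJoinNode(bool reset_batch_first) {
  const int64_t kBufferBytes = 262144;
  ToyClient builder;
  ToyClient node;
  node.reservation = 2 * kBufferBytes;  // 524288, as in the quoted log

  ToyBatch probe_batch{&node, {}};
  probe_batch.AddBuffer(kBufferBytes);  // the batch still owns one buffer

  if (reset_batch_first) probe_batch.Reset();  // ordering used by the fix
  bool returned = node.TransferTo(&builder, 2 * kBufferBytes);
  probe_batch.Reset();  // old ordering: cleanup came too late to help
  return returned;
}
```

In this model, CloseJoinNode(true) hands back all 524288 bytes, while CloseJoinNode(false) cannot, because 262144 bytes are still in use by the batch's buffer when the transfer is attempted.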


> DCHECK in buffer-pool.cc - min_bytes_to_write <= 
> dirty_unpinned_pages_.bytes() 
> -------------------------------------------------------------------------------
>
>                 Key: IMPALA-9737
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9737
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Tim Armstrong
>            Priority: Blocker
>              Labels: crash
>             Fix For: Impala 4.0
>
>
> Saw this recently in a dockerised pre-commit test run against a seemingly 
> unrelated change: 
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/10499/#showFailuresLink
>  (triggered by https://gerrit.cloudera.org/#/c/14666/)
> The error message from the logs is:
> {code}
> Error Message
> DCHECK found in log file: 
> /home/ubuntu/Impala/logs/ee_tests/impalad_node1.FATAL
> Standard Error
> Log file created at: 2020/05/07 18:07:14
> Running on machine: ip-172-31-3-33
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> F0507 18:07:14.606797 88747 buffer-pool.cc:711] 
> 3f4ad52d42fef180:55b3458000000004] Check failed: min_bytes_to_write <= 
> dirty_unpinned_pages_.bytes() (262144 vs. 0) <BufferPool::Client> 0xedf5d680 
> name: HASH_JOIN_NODE id=2 ptr=0x25b0e400 write_status:  buffers allocated 
> 262144 num_pages: 0 pinned_bytes: 0 dirty_unpinned_bytes: 0 
> in_flight_write_bytes: 0 reservation: {<ReservationTracker>: 
> reservation_limit 9223372036854775807 reservation 524288 used_reservation 
> 262144 child_reservations 0 parent:
> <ReservationTracker>: reservation_limit 9223372036854775807 reservation 
> 524288 used_reservation 0 child_reservations 524288 parent:
> <ReservationTracker>: reservation_limit 175112192 reservation 132120576 
> used_reservation 0 child_reservations 132120576 parent:
> <ReservationTracker>: reservation_limit 10952163328 reservation 326664192 
> used_reservation 0 child_reservations 326664192 parent:
> NULL}
>   0 pinned pages: 
>   0 dirty unpinned pages: 
>   0 in flight write pages: 
> {code}
> The minidump stack is:
> {code}
> Operating system: Linux
>                   0.0.0 Linux 4.4.0-1081-aws #91-Ubuntu SMP Tue Apr 16 
> 08:21:03 UTC 2019 x86_64
> CPU: amd64
>      family 6 model 79 stepping 1
>      16 CPUs
> GPU: UNKNOWN
> Crash reason:  SIGABRT
> Crash address: 0x3e8000010e6
> Process uptime: not available
> Thread 418 (crashed)
>  0  libc-2.23.so + 0x35428
>     rax = 0x0000000000000000   rdx = 0x0000000000000006
>     rcx = 0x00007f948e3aa428   rbx = 0x00000000073e2300
>     rsi = 0x0000000000015aab   rdi = 0x00000000000010e6
>     rbp = 0x00007f9394558c60   rsp = 0x00007f93945588f8
>      r8 = 0x0000000000000000    r9 = 0x0000000000000020
>     r10 = 0x0000000000000008   r11 = 0x0000000000000202
>     r12 = 0x00000000073e2380   r13 = 0x000000000000039a
>     r14 = 0x00000000073e9cc4   r15 = 0x00000000073e2300
>     rip = 0x00007f948e3aa428
>     Found by: given as instruction pointer in context
>  1  libc-2.23.so + 0x3702a
>     rbp = 0x00007f9394558c60   rsp = 0x00007f9394558900
>     rip = 0x00007f948e3ac02a
>     Found by: stack scanning
>  2  impalad!google::DumpStackTraceAndExit() + 0x24
>     rbp = 0x00007f9394558c60   rsp = 0x00007f9394558a30
>     rip = 0x0000000005010014
>     Found by: stack scanning
>  3  impalad!google::LogMessage::Fail() + 0xd
>     rbx = 0x00000000073e2300   rbp = 0x00007f9394558c60
>     rsp = 0x00007f9394558ae0   rip = 0x0000000005006a6d
>     Found by: call frame info
>  4  impalad!google::LogMessage::SendToLog() + 0x2b2
>     rbx = 0x00000000073e2300   rbp = 0x00007f9394558c60
>     rsp = 0x00007f9394558af0   rip = 0x0000000005008312
>     Found by: call frame info
>  5  impalad!google::LogMessage::Flush() + 0x157
>     rbx = 0x00007f9394558ca0   rbp = 0x00007f948ef675a0
>     rsp = 0x00007f9394558c70   r12 = 0x00007f9394558c8f
>     r13 = 0x0000000000000001   r14 = 0x00007f9394558db0
>     r15 = 0x0000000000000001   rip = 0x0000000005006447
>     Found by: call frame info
>  6  impalad!google::LogMessageFatal::~LogMessageFatal() + 0xe
>     rbx = 0x00007f9394558db0   rbp = 0x00007f9394558f40
>     rsp = 0x00007f9394558cf0   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x0000000005009a0e
>     Found by: call frame info
>  7  impalad!impala::BufferPool::Client::WriteDirtyPagesAsync(long) 
> [buffer-pool.cc : 711 + 0xf]
>     rbx = 0x0000000000000000   rbp = 0x00007f9394558f40
>     rsp = 0x00007f9394558d10   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x0000000002701732
>     Found by: call frame info
>  8  
> impalad!impala::BufferPool::Client::CleanPages(std::unique_lock<std::mutex>*, 
> long, bool) [buffer-pool.cc : 691 + 0x16]
>     rbx = 0x0000000000000000   rbp = 0x00007f9394559170
>     rsp = 0x00007f9394558f50   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x0000000002701147
>     Found by: call frame info
>  9  
> impalad!impala::BufferPool::Client::TransferReservationTo(impala::ReservationTracker*,
>  long, bool*) [buffer-pool.cc : 648 + 0x1e]
>     rbx = 0x0000000000000000   rbp = 0x00007f9394559200
>     rsp = 0x00007f9394559180   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x0000000002700a21
>     Found by: call frame info
> 10  
> impalad!impala::BufferPool::ClientHandle::TransferReservationTo(impala::ReservationTracker*,
>  long, bool*) [buffer-pool.cc : 347 + 0x22]
>     rbx = 0x0000000000000000   rbp = 0x00007f9394559240
>     rsp = 0x00007f9394559210   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x00000000026fce68
>     Found by: call frame info
> 11  
> impalad!impala::BufferPool::ClientHandle::TransferReservationTo(impala::BufferPool::ClientHandle*,
>  long, bool*) [buffer-pool.cc : 353 + 0x33]
>     rbx = 0x0000000000000000   rbp = 0x00007f93945592c0
>     rsp = 0x00007f9394559250   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x00000000026fcf55
>     Found by: call frame info
> 12  
> impalad!impala::PhjBuilder::ReturnReservation(impala::BufferPool::ClientHandle*,
>  long) [partitioned-hash-join-builder.cc : 1155 + 0x35]
>     rbx = 0x0000000000000000   rbp = 0x00007f93945593b0
>     rsp = 0x00007f93945592d0   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x0000000002860a8f
>     Found by: call frame info
> 13  impalad!impala::PartitionedHashJoinNode::Close(impala::RuntimeState*) 
> [partitioned-hash-join-node.cc : 305 + 0x54]
>     rbx = 0x0000000000080000   rbp = 0x00007f93945593f0
>     rsp = 0x00007f93945593c0   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x00000000028726fc
>     Found by: call frame info
> 14  impalad!impala::ExecNode::Close(impala::RuntimeState*) [exec-node.cc : 
> 314 + 0x37]
>     rbx = 0x0000000000000000   rbp = 0x00007f93945594e0
>     rsp = 0x00007f9394559400   r12 = 0x0000000000000000
>     r13 = 0x0000000000000001   r14 = 0x0000000000000001
>     r15 = 0x0000000000000001   rip = 0x000000000277437c
>     Found by: call frame info
> {code}



