[ 
https://issues.apache.org/jira/browse/IMPALA-9349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027950#comment-17027950
 ] 

ASF subversion and git services commented on IMPALA-9349:
---------------------------------------------------------

Commit 7b280e5841bf247b3866c16ea26f04c9e2dd3a61 in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7b280e5 ]

IMPALA-9349: free output_unmatched_batch_ buffers promptly in PHJ

This fixes a subtle memory management issue where freeing of a
buffer is delayed longer than it should be. This means that
the full buffer pool reservation is not available for
repartitioning, which can lead to crashes or hangs for
very specific queries.

The fix is to transfer resources from output_unmatched_batch_
as soon as the last row from the batch is appended to the
output batch.
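
As a rough illustration of the idea (not the actual patch), the toy
model below shows why handing over buffers as soon as the last row is
copied makes the full reservation available once the output batch is
consumed. Reservation, Batch, OutputUnmatched and UnusedAfterOutput are
hypothetical names invented for this sketch; they are not Impala's
real classes or functions, and the byte counts are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a buffer pool client's reservation accounting.
struct Reservation {
  std::size_t total;
  std::size_t used;
  std::size_t Unused() const { return total - used; }
};

// Hypothetical row batch: rows plus the bytes of buffers it currently owns.
struct Batch {
  std::vector<int> rows;
  std::size_t attached_bytes = 0;

  // Move buffer ownership to dst; dst's consumer now frees the buffers.
  void TransferResourcesTo(Batch* dst) {
    dst->attached_bytes += attached_bytes;
    attached_bytes = 0;
  }

  // Called by the consumer once it is done with the batch: frees owned buffers.
  void Reset(Reservation* res) {
    assert(res->used >= attached_bytes);
    res->used -= attached_bytes;
    attached_bytes = 0;
    rows.clear();
  }
};

// Copies rows from src (starting at *pos) into out until out holds `capacity`
// rows or src is exhausted. With `prompt_transfer` set, src's buffers are
// handed to out as soon as src's last row has been appended.
void OutputUnmatched(Batch* src, std::size_t* pos, Batch* out,
                     std::size_t capacity, bool prompt_transfer) {
  while (*pos < src->rows.size() && out->rows.size() < capacity) {
    out->rows.push_back(src->rows[(*pos)++]);
  }
  if (prompt_transfer && *pos == src->rows.size()) {
    src->TransferResourcesTo(out);
  }
}

// Returns the reservation left unused after the output batch is consumed.
std::size_t UnusedAfterOutput(bool prompt_transfer) {
  Reservation res{/*total=*/1024, /*used=*/0};
  Batch src;
  src.rows = {1, 2, 3};
  src.attached_bytes = 256;  // buffer backing src's rows (arbitrary size)
  res.used += src.attached_bytes;

  Batch out;
  std::size_t pos = 0;
  OutputUnmatched(&src, &pos, &out, /*capacity=*/8, prompt_transfer);
  out.Reset(&res);  // consumer finishes with the output batch
  return res.Unused();
}
```

In this model, UnusedAfterOutput(true) leaves the whole reservation
free, while UnusedAfterOutput(false) leaves the source batch's buffer
still counted against it, which is the situation in which a later
step needing the full reservation would fail.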

This bug would only be triggered by join modes that output
unmatched rows from the right side (RIGHT OUTER JOIN,
FULL OUTER JOIN, RIGHT ANTI JOIN) *and* have an empty
probe side (otherwise unmatched rows are output by
iterating over the hash table).

Testing:
Added DCHECKs to check that all resources are available
before repartitioning.

Added a regression test that triggered the bug.

Change-Id: Ie13b51d4d909afb0fe2e7b7dc00b085c51058fed
Reviewed-on: http://gerrit.cloudera.org:8080/15142
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> output_unmatched_batch_ holds onto buffers for too long 
> --------------------------------------------------------
>
>                 Key: IMPALA-9349
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9349
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: crash, hang
>
> IMPALA-4224 made some of the reservation management in PHJ more explicit, 
> which revealed a minor bug. This query from TestSpilling triggers the bug, 
> but it has no symptoms currently because there is a surplus of at least 256k of 
> reservation set aside for max_row_size.
> {noformat}
> # spilled partition with 0 probe rows, RIGHT OUTER JOIN
> set debug_action="-1:OPEN:[email protected]";
> select straight_join count(*)
> from 
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10;
> {noformat}
> However, a slight tweak triggers a DCHECK:
> {noformat}
> [localhost:21000] tpch_parquet> use tpch_parquet; set 
> default_spillable_buffer_size=256k; set max_row_size=256k; set 
> debug_action=-1:OPEN:[email protected];select 
> straight_join count(*)
> from
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10;
> Query: use tpch_parquet
> DEFAULT_SPILLABLE_BUFFER_SIZE set to 256k
> MAX_ROW_SIZE set to 256k
> DEBUG_ACTION set to -1:OPEN:[email protected]
> Query: select straight_join count(*)
> from
> supplier right outer join lineitem on s_suppkey = l_suppkey
> where s_acctbal > 0 and s_acctbal < 10
> Query submitted at: 2020-01-30 23:12:11 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=8e445d7018e08002:6e35218800000000
> ERROR: Failed due to unreachable impalad(s): tarmstrong-box:22002
> {noformat}
> F0130 23:12:14.652458  2727 partitioned-hash-join-builder.cc:364] 
> 8e445d7018e08002:6e35218800000005] Check failed: got_buffer Accounted in min 
> reservation<BufferPool::Client> 0xb96d870 internal state: 
> {<BufferPool::Client> 0xf8843a0 name: HASH_JOIN_NODE id=2 ptr=0xb96d700 
> write_status:  buffers allocated 262144 num_pages: 166 pinned_bytes: 262144 
> dirty_unpinned_bytes: 786432 in_flight_write_bytes: 524288 reservation: 
> {<ReservationTracker>: reservation_limit 9223372036854775807 reservation 
> 4456448 used_reservation 524288 child_reservations 3932160 parent:
> <ReservationTracker>: reservation_limit 9223372036854775807 reservation 
> 4456448 used_reservation 0 child_reservations 4456448 parent:
> <ReservationTracker>: reservation_limit 6279187114 reservation 8650752 
> used_reservation 0 child_reservations 8650752 parent:
> <ReservationTracker>: reservation_limit 6671630336 reservation 8667136 
> used_reservation 0 child_reservations 8667136 parent:
> NULL}
>   1 pinned pages: <BufferPool::Page> 0x12ed9ea0 len: 262144 pin_count: 1 buf: 
> <BufferPool::BufferHandle> 0x12ed9f18 client: 0xb96d870/0xf8843a0 data: 
> 0x17380000 len: 262144
>   3 dirty unpinned pages: <BufferPool::Page> 0x1319c500 len: 262144 
> pin_count: 0 buf: <BufferPool::BufferHandle> 0x1319c578 client: 
> 0xb96d870/0xf8843a0 data: 0x1554a000 len: 262144
> <BufferPool::Page> 0x1319dae0 len: 262144 pin_count: 0 buf: 
> <BufferPool::BufferHandle> 0x1319db58 client: 0xb96d870/0xf8843a0 data: 
> 0x127fc000 len: 262144
> <BufferPool::Page> 0x1319e4e0 len: 262144 pin_count: 0 buf: 
> <BufferPool::BufferHandle> 0x1319e558 client: 0xb96d870/0xf8843a0 data: 
> 0x16f4a000 len: 262144
>   2 in flight write pages: <BufferPool::Page> 0x12ebe1c0 len: 262144 
> pin_count: 0 buf: <BufferPool::BufferHandle> 0x12ebe238 client: 
> 0xb96d870/0xf8843a0 data: 0x16740000 len: 262144
> <BufferPool::Page> 0x1319d4a0 len: 262144 pin_count: 0 buf: 
> <BufferPool::BufferHandle> 0x1319d518 client: 0xb96d870/0xf8843a0 data: 
> 0x16340000 len: 262144
> }
> *** Check failure stack trace: ***
>     @          0x4dfceac  google::LogMessage::Fail()
>     @          0x4dfe751  google::LogMessage::SendToLog()
>     @          0x4dfc886  google::LogMessage::Flush()
>     @          0x4dffe4d  google::LogMessageFatal::~LogMessageFatal()
>     @          0x2753df6  impala::PhjBuilder::CreateAndPreparePartition()
>     @          0x2754036  impala::PhjBuilder::CreateHashPartitions()
>     @          0x2758209  impala::PhjBuilder::RepartitionBuildInput()
>     @          0x2757953  impala::PhjBuilder::BeginSpilledProbe()
>     @          0x2688643  impala::PartitionedHashJoinNode::BeginSpilledProbe()
>     @          0x268aff5  impala::PartitionedHashJoinNode::GetNext()
>     @          0x276b436  impala::AggregationNode::Open()
>     @          0x2159d7f  impala::FragmentInstanceState::Open()
>     @          0x2156937  impala::FragmentInstanceState::Exec()
>     @          0x216aa7e  impala::QueryState::ExecFInstance()
>     @          0x2168d9d  
> _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
>     @          0x216c666  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x1f60b7b  boost::function0<>::operator()()
>     @          0x250f288  impala::Thread::SuperviseThread()
>     @          0x251760c  boost::_bi::list5<>::operator()<>()
>     @          0x2517530  boost::_bi::bind_t<>::operator()()
>     @          0x25174f3  boost::detail::thread_data<>::run()
>     @          0x3d26099  thread_proxy
>     @     0x7fdab63cf6b9  start_thread
>     @     0x7fdab2b8b41c  clone
> {noformat}
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)