Csaba Ringhofer created IMPALA-13955:
----------------------------------------
Summary: TopN node doesn't clean up pushed out varlen memory
Key: IMPALA-13955
URL: https://issues.apache.org/jira/browse/IMPALA-13955
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Csaba Ringhofer
{code}
set num_nodes=1; set mt_dop=1;
select l_comment from tpch_parquet.lineitem order by l_orderkey desc limit 2
summary;
Operator      #Hosts  #Inst  Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
--------------------------------------------------------------------------------------------------------------------
F00:ROOT      1       1      230.194us  230.194us                     4.01 MB    4.00 MB
01:TOP-N      1       1      124.464ms  124.464ms  2      2           1.69 MB    181.00 B
00:SCAN HDFS  1       1      32.446ms   32.446ms   6.00M  6.00M       126.96 MB  320.00 MB      tpch_parquet.lineitem
{code}
The issue is with TOP-N: a peak of 1.69 MB is too much for a heap of size 2,
and the estimate (181.00 B) also reflects this.
The cause is that when a new element is smaller than the largest in the heap,
the largest is removed from the heap, but the varlen slots of the old largest
element are not released:
https://github.com/apache/impala/blob/0ed4e869de43532c72dab514850944c3a6036bd1/be/src/exec/topn-node-ir.cc#L66
deepcopy() is called to copy the new tuple over the old tuple, so the old
fixed-length tuple is reused, but its strings and other varlen data are not
released.
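
The effect can be reproduced outside Impala with a small model. The sketch
below is not Impala code: BumpPool, Row, DeepCopyInto and RunTopN are
hypothetical names, and the pool is assumed to behave like an allocator that
only frees memory when it is cleared as a whole. Each replacement reuses the
fixed-length Row but allocates a fresh buffer for the string, so the allocated
bytes grow with the number of replaced rows while the live bytes stay bounded
by the limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a tuple pool: individual allocations are never
// freed, memory is only returned when the whole pool is destroyed.
class BumpPool {
 public:
  char* Allocate(std::size_t n) {
    chunks_.emplace_back(n);
    total_ += n;
    return chunks_.back().data();
  }
  std::size_t total_allocated() const { return total_; }

 private:
  std::vector<std::vector<char>> chunks_;
  std::size_t total_ = 0;
};

// Fixed-length part of a row; `data` points at varlen bytes in the pool.
struct Row {
  long key;
  char* data;
  std::size_t len;
};

// Overwrites the fixed-length part of `dst` in place and allocates new varlen
// storage. The buffer `dst->data` pointed at before is simply abandoned.
void DeepCopyInto(Row* dst, long key, const std::string& s, BumpPool* pool) {
  dst->key = key;
  dst->len = s.size();
  dst->data = pool->Allocate(s.size());
  if (!s.empty()) std::memcpy(dst->data, s.data(), s.size());
}

struct Usage {
  std::size_t allocated;  // bytes handed out by the pool
  std::size_t live;       // bytes still referenced by rows in the heap
};

// Keeps the `limit` smallest keys. Keys arrive in descending order, so every
// row past the limit replaces the current largest element.
Usage RunTopN(std::size_t limit, long num_rows, std::size_t str_len) {
  assert(limit > 0);
  BumpPool pool;
  std::vector<Row> heap;  // max-heap on key: front() is the largest kept key
  auto by_key = [](const Row& a, const Row& b) { return a.key < b.key; };
  for (long key = num_rows; key > 0; --key) {
    const std::string payload(str_len, 'x');
    if (heap.size() < limit) {
      Row r{};
      DeepCopyInto(&r, key, payload, &pool);
      heap.push_back(r);
      std::push_heap(heap.begin(), heap.end(), by_key);
    } else if (key < heap.front().key) {
      std::pop_heap(heap.begin(), heap.end(), by_key);  // largest moves to back()
      DeepCopyInto(&heap.back(), key, payload, &pool);  // reuse its fixed part
      std::push_heap(heap.begin(), heap.end(), by_key);
    }
  }
  std::size_t live = 0;
  for (const Row& r : heap) live += r.len;
  return {pool.total_allocated(), live};
}
```

In this model, RunTopN(2, 1000, 10) leaves 20 live bytes in the heap while the
pool has handed out 10000 bytes, which mirrors the gap between the estimate
and the observed peak memory of the TOP-N node above.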