Csaba Ringhofer created IMPALA-13955:
----------------------------------------

             Summary: TopN node doesn't clean up pushed out varlen memory
                 Key: IMPALA-13955
                 URL: https://issues.apache.org/jira/browse/IMPALA-13955
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Csaba Ringhofer


{code}
set num_nodes=1; set mt_dop=1;
select l_comment from tpch_parquet.lineitem order by l_orderkey desc limit 2;
summary;
Operator       #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
F00:ROOT            1      1  230.194us  230.194us                       4.01 MB        4.00 MB
01:TOP-N            1      1  124.464ms  124.464ms      2           2    1.69 MB       181.00 B
00:SCAN HDFS        1      1   32.446ms   32.446ms  6.00M       6.00M  126.96 MB      320.00 MB  tpch_parquet.lineitem
{code}

The issue is with TOP-N: a peak memory of 1.69 MB is far too much for a heap of only 2 rows, and the estimate (181.00 B) reflects this.
The issue is that when an incoming element is smaller than the largest element in the heap, the largest element is evicted, but the varlen slots of the evicted element are not released:
https://github.com/apache/impala/blob/0ed4e869de43532c72dab514850944c3a6036bd1/be/src/exec/topn-node-ir.cc#L66
deepcopy() is called to copy the new tuple into the old tuple's slot, so the old fixed-length tuple is reused, but the strings and other varlen data it pointed to are not released.
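To illustrate the mechanism, here is a minimal, self-contained C++ sketch. It is not Impala code: ArenaPool, Row, DeepCopyInto, TopN and MaybeCompact are hypothetical names. It assumes an arena-style pool that frees memory only all at once, and it models the worst case where every incoming row evicts the current heap root. It keeps the N smallest keys, matching the wording above; the desc query in the repro is the mirror image. MaybeCompact shows one possible mitigation (copy the live rows into a fresh pool once the pool grows well beyond what they need); this is an assumption, not necessarily how the issue should be fixed.

{code:cpp}
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical arena-style pool: individual allocations are never freed;
// everything is released only when the pool itself is destroyed.
class ArenaPool {
 public:
  char* Allocate(size_t len) {
    chunks_.emplace_back(new char[len]);
    total_bytes_ += len;
    return chunks_.back().get();
  }
  size_t total_bytes() const { return total_bytes_; }

 private:
  std::vector<std::unique_ptr<char[]>> chunks_;
  size_t total_bytes_ = 0;
};

// Hypothetical row: a fixed-length part plus a pointer to varlen data.
struct Row {
  int64_t key;
  const char* str;
  size_t len;
};

// Copies `src` into the existing slot `dst`, allocating fresh varlen storage
// from `pool`. Whatever `dst->str` pointed to before is simply abandoned in
// its pool: this is the leak described above.
void DeepCopyInto(const Row& src, Row* dst, ArenaPool* pool) {
  char* buf = pool->Allocate(src.len);
  if (src.len > 0) std::memcpy(buf, src.str, src.len);
  *dst = Row{src.key, buf, src.len};
}

// Keeps the `limit` rows with the smallest keys in a max-heap (root = largest).
class TopN {
 public:
  explicit TopN(size_t limit)
      : limit_(limit), pool_(std::make_unique<ArenaPool>()) {}

  void Insert(const Row& row) {
    if (limit_ == 0) return;
    if (heap_.size() < limit_) {
      heap_.push_back(Row{});
      DeepCopyInto(row, &heap_.back(), pool_.get());
      std::push_heap(heap_.begin(), heap_.end(), KeyLess);
    } else if (row.key < heap_.front().key) {
      // Evict the largest row and reuse its fixed-length slot.
      std::pop_heap(heap_.begin(), heap_.end(), KeyLess);
      DeepCopyInto(row, &heap_.back(), pool_.get());
      std::push_heap(heap_.begin(), heap_.end(), KeyLess);
    }
  }

  // Hypothetical mitigation: when the pool holds more than `factor` times the
  // bytes the live rows need, copy the live rows into a fresh pool and drop
  // the old pool, releasing all abandoned varlen data at once.
  void MaybeCompact(size_t factor) {
    assert(factor >= 1);
    if (pool_->total_bytes() <= factor * std::max<size_t>(LiveBytes(), 1)) return;
    auto fresh = std::make_unique<ArenaPool>();
    for (Row& r : heap_) DeepCopyInto(r, &r, fresh.get());
    pool_ = std::move(fresh);
  }

  size_t LiveBytes() const {
    size_t bytes = 0;
    for (const Row& r : heap_) bytes += r.len;
    return bytes;
  }
  size_t PoolBytes() const { return pool_->total_bytes(); }

  std::vector<int64_t> SortedKeys() const {
    std::vector<int64_t> keys;
    for (const Row& r : heap_) keys.push_back(r.key);
    std::sort(keys.begin(), keys.end());
    return keys;
  }

 private:
  static bool KeyLess(const Row& a, const Row& b) { return a.key < b.key; }

  size_t limit_;
  std::unique_ptr<ArenaPool> pool_;
  std::vector<Row> heap_;
};
{code}

Feeding 100,000 rows with descending keys and a 100-byte string into a TopN(2), the variant that never compacts ends with a pool of 10,000,000 bytes while its two live rows need only 200 bytes; calling MaybeCompact(4) after every insert keeps the pool at or below 800 bytes. Compacting in bulk trades an occasional copy of the N live rows for bounded memory, which fits an arena allocator that cannot free single allocations.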



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
