[jira] [Updated] (IMPALA-13955) TopN node doesn't clean up pushed out varlen memory

Csaba Ringhofer (Jira) Fri, 11 Apr 2025 01:44:54 -0700


     [ https://issues.apache.org/jira/browse/IMPALA-13955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Csaba Ringhofer updated IMPALA-13955:
-------------------------------------
    Description: 
{code}
set num_nodes=1; set mt_dop=1;
select l_comment from tpch_parquet.lineitem order by l_orderkey desc limit 2;
summary;
Operator       #Hosts  #Inst   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
-----------------------------------------------------------------------------------------------------------------------
F00:ROOT            1      1  230.194us  230.194us                       4.01 MB        4.00 MB
01:TOP-N            1      1  124.464ms  124.464ms      2           2    1.69 MB       181.00 B
00:SCAN HDFS        1      1   32.446ms   32.446ms  6.00M       6.00M  126.96 MB      320.00 MB  tpch_parquet.lineitem
{code}

The issue is with TOP-N: 1.69 MB of peak memory is too much for a heap that 
holds 2 rows, and the planner's estimate (181.00 B) shows how small it should 
be.
When an incoming element is smaller than the largest element in the heap, the 
largest element is removed from the heap, but the varlen slots of that old 
largest element are not released:
https://github.com/apache/impala/blob/0ed4e869de43532c72dab514850944c3a6036bd1/be/src/exec/topn-node-ir.cc#L66
deepcopy() is called to copy the new tuple over the old tuple, so the old 
fixed-length tuple is reused, but its strings and other variable-length data 
are not released.
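
To make the effect concrete, here is a minimal standalone sketch of the pattern described above. It is not Impala code: Arena, HeapEntry and SimulateTopN are hypothetical names, and the arena is only a stand-in for an append-only tuple pool. It keeps the `limit` smallest keys in a max-heap and appends each accepted row's string to the arena without ever freeing the bytes of the row that was pushed out.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an append-only pool: bytes can be added but
// individual allocations can never be freed.
struct Arena {
  std::vector<char> bytes;
  std::size_t Append(const std::string& s) {
    std::size_t offset = bytes.size();
    bytes.insert(bytes.end(), s.begin(), s.end());
    return offset;
  }
};

// Fixed-length part of a row: the sort key plus a reference into the arena.
struct HeapEntry {
  long key;
  std::size_t offset;
  std::size_t len;
};

struct KeyLess {
  bool operator()(const HeapEntry& a, const HeapEntry& b) const {
    return a.key < b.key;  // max-heap: the largest key sits on top
  }
};

struct TopNStats {
  std::size_t live_varlen_bytes;  // string bytes of rows still in the heap
  std::size_t arena_bytes;        // string bytes the arena is holding
};

// Keeps the `limit` smallest keys and reports how much varlen memory is live
// versus how much the arena has accumulated.
TopNStats SimulateTopN(const std::vector<std::pair<long, std::string>>& rows,
                       std::size_t limit) {
  Arena arena;
  std::priority_queue<HeapEntry, std::vector<HeapEntry>, KeyLess> heap;
  for (const auto& row : rows) {
    if (heap.size() < limit) {
      heap.push({row.first, arena.Append(row.second), row.second.size()});
    } else if (!heap.empty() && row.first < heap.top().key) {
      // The old largest row leaves the heap, but its string bytes stay in
      // the arena: this is the growth described in the issue.
      heap.pop();
      heap.push({row.first, arena.Append(row.second), row.second.size()});
    }
  }
  std::size_t live = 0;
  while (!heap.empty()) {
    live += heap.top().len;
    heap.pop();
  }
  assert(live <= arena.bytes.size());
  return {live, arena.bytes.size()};
}
```

Feeding this ten rows with keys 10 down to 1 and 8-byte strings, with a limit of 2, leaves 16 live bytes while the arena holds 80, because every row displaces the current largest one.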

This should be solved by TopNNode::ReclaimTuplePool(), and tuple pool 
reclamations did happen during the query:
         - TuplePoolReclamations: 643 (643)
But it is still strange that more than 1 MB of memory is needed and that the 
planner underestimates the memory needs by so much.
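
One possible explanation for the gap can be sketched with a toy model, under loudly stated assumptions: if reclamation only runs once the pool has grown past some threshold, peak memory is set by that threshold rather than by the size of the live rows. The code below is not Impala's implementation; SimulateTopNWithReclaim, ReclaimStats and reclaim_threshold are hypothetical, and the actual trigger used by TopNNode::ReclaimTuplePool() is not stated in this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct ReclaimStats {
  std::size_t reclamations;      // how many times the pool was rebuilt
  std::size_t peak_pool_bytes;   // largest pool size seen
  std::size_t final_pool_bytes;  // pool size after the last row
};

// Keeps the `limit` smallest keys. Strings live in an append-only pool that
// is rebuilt from the surviving rows once it exceeds `reclaim_threshold`
// bytes (a hypothetical trigger chosen for illustration).
ReclaimStats SimulateTopNWithReclaim(
    const std::vector<std::pair<long, std::string>>& rows,
    std::size_t limit, std::size_t reclaim_threshold) {
  struct Entry {
    long key;
    std::size_t offset;
    std::size_t len;
  };
  auto key_less = [](const Entry& a, const Entry& b) { return a.key < b.key; };

  std::string pool;         // append-only storage for string data
  std::vector<Entry> heap;  // max-heap maintained with push_heap/pop_heap
  ReclaimStats stats{0, 0, 0};

  for (const auto& row : rows) {
    bool accept = heap.size() < limit;
    if (!accept && !heap.empty() && row.first < heap.front().key) {
      // Push out the current largest row; its bytes stay in the pool.
      std::pop_heap(heap.begin(), heap.end(), key_less);
      heap.pop_back();
      accept = true;
    }
    if (!accept) continue;

    heap.push_back({row.first, pool.size(), row.second.size()});
    pool += row.second;
    std::push_heap(heap.begin(), heap.end(), key_less);
    stats.peak_pool_bytes = std::max(stats.peak_pool_bytes, pool.size());

    if (pool.size() > reclaim_threshold) {
      // Reclaim: copy only the live strings into a fresh pool and drop the
      // old one, fixing up each entry's offset.
      std::string fresh;
      for (Entry& e : heap) {
        std::size_t new_offset = fresh.size();
        fresh.append(pool, e.offset, e.len);
        e.offset = new_offset;
      }
      pool.swap(fresh);
      ++stats.reclamations;
    }
  }
  stats.final_pool_bytes = pool.size();
  assert(stats.final_pool_bytes <= stats.peak_pool_bytes);
  return stats;
}
```

With ten rows (keys 10 down to 1, 8-byte strings), a limit of 2 and a 32-byte threshold, this model reclaims twice and peaks at 40 bytes even though only 16 bytes are ever live. In the same way, a large reclaim threshold in a real system would allow a peak far above what a planner would estimate from the row size alone.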


> TopN node doesn't clean up pushed out varlen memory
> ---------------------------------------------------
>
>                 Key: IMPALA-13955
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13955
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Csaba Ringhofer
>            Priority: Critical


