Quanlong Huang created IMPALA-11580:
---------------------------------------

             Summary: Memory leak in legacy catalog mode when applying 
incremental partition updates
                 Key: IMPALA-11580
                 URL: https://issues.apache.org/jira/browse/IMPALA-11580
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 4.1.0, Impala 4.0.0
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


Since IMPALA-3127, catalogd propagates incremental metadata updates at the 
partition level. In the legacy catalog mode, while applying the updates, 
impalad reuses the existing partition objects and moves them to a new HdfsTable 
object. However, the partition objects are immutable, which means their 
references to the old table object remain unchanged. The JVM cannot collect the 
stale table objects since they are still referenced by the live partitions.
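
The reference chain described above can be modelled with a small, self-contained sketch. This is not Impala code: Table and Partition are hypothetical stand-ins for HdfsTable and its partition objects, and applyUpdateRebinding is only one conceivable remedy, assumed here for contrast, not the fix adopted for this issue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class StaleTableSketch {
  /** Hypothetical stand-in for HdfsTable: a version plus its partitions. */
  static final class Table {
    final int version;
    final List<Partition> partitions = new ArrayList<>();
    Table(int version) { this.version = version; }
  }

  /** Hypothetical immutable partition: its table reference is fixed at construction. */
  static final class Partition {
    final int id;
    final Table table;
    Partition(int id, Table table) { this.id = id; this.table = table; }
  }

  /** Leaky update: the new table reuses the old partition objects unchanged. */
  static Table applyUpdateReusing(Table old, int newPartitionId) {
    Table fresh = new Table(old.version + 1);
    fresh.partitions.addAll(old.partitions); // these still point at older tables
    fresh.partitions.add(new Partition(newPartitionId, fresh));
    return fresh;
  }

  /** Assumed remedy for contrast: re-create each partition bound to the new table. */
  static Table applyUpdateRebinding(Table old, int newPartitionId) {
    Table fresh = new Table(old.version + 1);
    for (Partition p : old.partitions) {
      fresh.partitions.add(new Partition(p.id, fresh));
    }
    fresh.partitions.add(new Partition(newPartitionId, fresh));
    return fresh;
  }

  /** Counts distinct Table objects reachable from the current table via partitions. */
  static int reachableTables(Table current) {
    Set<Table> seen = Collections.newSetFromMap(new IdentityHashMap<Table, Boolean>());
    Deque<Table> todo = new ArrayDeque<>();
    todo.push(current);
    while (!todo.isEmpty()) {
      Table t = todo.pop();
      if (!seen.add(t)) continue; // already visited
      for (Partition p : t.partitions) todo.push(p.table);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    Table leaky = new Table(0);
    Table clean = new Table(0);
    for (int i = 1; i <= 5; i++) {
      leaky = applyUpdateReusing(leaky, i);
      clean = applyUpdateRebinding(clean, i);
    }
    System.out.println("reusing partitions:   " + reachableTables(leaky));
    System.out.println("rebinding partitions: " + reachableTables(clean));
  }
}
```

In this model every reusing update leaves one more table version reachable from the current table, so none of them can be garbage collected, while the rebinding variant keeps only the current table alive.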

To reproduce the issue, create a partitioned table and add new partitions to it 
at a rate close to the catalog update frequency (2s by default):
{code:sql}
impala-shell> drop table if exists my_part_tbl;
impala-shell> create external table my_part_tbl (id int) partitioned by (p int) 
stored as textfile;
{code}
Add a partition every 2s:
{code:bash}
for i in `seq 1000`; do impala-shell.sh -q "alter table my_part_tbl add partition (p=$i)"; sleep 2; done
{code}
Then monitor the live table objects in impalad JVM:
{code:bash}
for p in `pidof impalad`; do echo PID=$p; jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'; done
{code}
The HdfsTable instance count stays constant on only one impalad. On the other 
two impalads it keeps growing:
{noformat}
$ for p in `pidof impalad`; do echo PID=$p; jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'; done
PID=27677
 136:            14           3360  org.apache.impala.catalog.HdfsTable
PID=27671
 136:            14           3360  org.apache.impala.catalog.HdfsTable
PID=27668
 474:             1            240  org.apache.impala.catalog.HdfsTable

$ for p in `pidof impalad`; do echo PID=$p; jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'; done
PID=27677
 113:            21           5040  org.apache.impala.catalog.HdfsTable
PID=27671
 113:            21           5040  org.apache.impala.catalog.HdfsTable
PID=27668
 474:             1            240  org.apache.impala.catalog.HdfsTable
{noformat}


