[
https://issues.apache.org/jira/browse/IMPALA-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang updated IMPALA-11580:
------------------------------------
Description:
Since IMPALA-3127, catalogd propagates incremental metadata updates at the
partition level. In the legacy catalog mode, while applying the updates,
impalad reuses the existing partition objects and moves them to a new HdfsTable
object. However, the partition objects are immutable, which means their
references to the old table object remain unchanged. The JVM cannot collect the
stale table objects since the partitions still hold active references to them.
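The retention pattern described above can be illustrated with a minimal,
self-contained Java sketch. It is not Impala's code: the Table and Partition
classes, applyUpdate() and reachableTables() are hypothetical stand-ins, and the
sketch assumes each update adds exactly one new partition while reusing all
existing ones.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class StaleTableLeakSketch {
    // Hypothetical stand-in for a table object (not Impala's real HdfsTable).
    static final class Table {
        final int version;
        final List<Partition> partitions = new ArrayList<>();
        Table(int version) { this.version = version; }
    }

    // Hypothetical immutable partition: the owner reference is fixed at
    // construction time and never updated afterwards.
    static final class Partition {
        final int id;
        final Table owner;
        Partition(int id, Table owner) { this.id = id; this.owner = owner; }
    }

    // Applies one incremental update: creates a new table object, moves the
    // existing partition objects into it as-is, and adds one new partition.
    static Table applyUpdate(Table old, int newPartitionId) {
        Table fresh = new Table(old.version + 1);
        fresh.partitions.addAll(old.partitions);  // reused, still point at old owners
        fresh.partitions.add(new Partition(newPartitionId, fresh));
        return fresh;
    }

    // Counts the distinct table objects kept alive by the current table:
    // itself plus every table referenced by one of its partitions.
    static int reachableTables(Table current) {
        Set<Table> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.add(current);
        for (Partition p : current.partitions) {
            seen.add(p.owner);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Table t = new Table(0);
        for (int i = 1; i <= 5; i++) {
            t = applyUpdate(t, i);
            System.out.println("updates=" + i
                + " reachable tables=" + reachableTables(t));
        }
    }
}
```

In this sketch the number of reachable table objects grows by one with every
update, because each reused partition pins the table it was created under. A
fix along these lines would have to either rebuild the partitions with the new
owner or drop the back-reference; which approach the actual patch takes is not
stated here.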
To reproduce the issue, create a partitioned table and add new partitions to it
at a rate close to the catalog update frequency (2s by default):
{code:sql}
impala-shell> drop table if exists my_part_tbl;
impala-shell> create external table my_part_tbl (id int) partitioned by (p int)
stored as textfile;
{code}
Add a partition every 2s:
{code:bash}
# Add one partition every 2 seconds, 1000 partitions in total.
for i in $(seq 1000); do
  impala-shell.sh -q "alter table my_part_tbl add partition (p=$i)"
  sleep 2
done
{code}
Then monitor the live table objects in each impalad JVM:
{code:bash}
# Count live HdfsTable instances in every impalad process on this host.
for p in $(pidof impalad); do
  echo PID=$p
  jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'
done
{code}
Only one impalad keeps a constant number of live HdfsTable instances (1). The
count in the other 2 impalads keeps growing (from 14 to 21 in the output below).
{noformat}
$ for p in `pidof impalad`; do echo PID=$p; jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'; done
PID=27677
136: 14 3360 org.apache.impala.catalog.HdfsTable
PID=27671
136: 14 3360 org.apache.impala.catalog.HdfsTable
PID=27668
474: 1 240 org.apache.impala.catalog.HdfsTable
$ for p in `pidof impalad`; do echo PID=$p; jmap -histo:live $p | grep 'org.apache.impala.catalog.HdfsTable$'; done
PID=27677
113: 21 5040 org.apache.impala.catalog.HdfsTable
PID=27671
113: 21 5040 org.apache.impala.catalog.HdfsTable
PID=27668
474: 1 240 org.apache.impala.catalog.HdfsTable
{noformat}
This only happens in the legacy catalog mode and doesn't occur in the
local-catalog mode. To work around it, start catalogd with
{{--enable_incremental_metadata_updates=false}} to disable incremental
catalog updates.
> Memory leak in legacy catalog mode when applying incremental partition updates
> ------------------------------------------------------------------------------
>
> Key: IMPALA-11580
> URL: https://issues.apache.org/jira/browse/IMPALA-11580
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 4.0.0, Impala 4.1.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical