zhangqianqiong created IMPALA-13438:
---------------------------------------
Summary: In alterTableRecoverPartitions, we should batch the
addHmsPartitions operations.
Key: IMPALA-13438
URL: https://issues.apache.org/jira/browse/IMPALA-13438
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Affects Versions: Impala 4.4.1, Impala 4.4.0, Impala 4.3.0, Impala 4.1.2,
Impala 4.1.1, Impala 4.2.0, Impala 4.1.0
Reporter: zhangqianqiong
Assignee: zhangqianqiong
After the merge request 'IMPALA-10502: Handle CREATE/DROP events
correctly' was applied, the {{alterTableRecoverPartitions}} method changed from
batching the {{add_partitions}} calls to invoking {{addHmsPartitions}} once for
all partitions. For tables with a huge number of partitions, this creates a
huge temporary object, {{List<Partition>}}, which can lead to an
{{OutOfMemoryError}}.
In my test environment, with the catalogd JVM {{Xmx}} set to 2GB, the
end-to-end test {{custom_cluster/test_wide_table_operations.py}} ran the
{{recover partitions}} operation on a table with 2000 columns and 50,000
partitions, and catalogd failed with a Java heap space {{OutOfMemoryError}}.
Analysis of the memory dump with the MemoryAnalyzer showed that the temporary
object contained a massive number of {{FieldSchema}} objects (2000 columns *
50,000 partitions, i.e. about 100 million), which exhausted the heap.
To resolve this issue, we propose batching the {{addHmsPartitions}} calls so
that the temporary objects can be released after each batch. This change was
tested and verified to resolve the {{OutOfMemoryError}} when handling a large
number of partitions.
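
The batching idea can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Impala patch: the class {{BatchedPartitionAdder}}, the method {{addInBatches}}, and the parameters {{batchSize}}, {{buildPartition}} and {{sendBatch}} are all hypothetical names. It assumes the heavy per-partition objects can be built from lightweight descriptors (for example partition names), so that only one batch of heavy objects is reachable at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class BatchedPartitionAdder {
  /**
   * Builds and submits partitions in fixed-size batches (hypothetical helper).
   *
   * @param specs          lightweight partition descriptors, e.g. partition names
   * @param batchSize      maximum number of partitions per batch (assumed tunable)
   * @param buildPartition creates the heavy per-partition object from a descriptor
   * @param sendBatch      submits one batch, standing in for an addHmsPartitions call
   * @return the number of batches submitted
   */
  public static <S, P> int addInBatches(List<S> specs, int batchSize,
      Function<S, P> buildPartition, Consumer<List<P>> sendBatch) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int batches = 0;
    for (int start = 0; start < specs.size(); start += batchSize) {
      int end = start + Math.min(batchSize, specs.size() - start);
      // The heavy objects live only in this per-batch list. Once the loop
      // iteration ends, the list is unreachable and can be garbage collected.
      List<P> batch = new ArrayList<>(end - start);
      for (S spec : specs.subList(start, end)) {
        batch.add(buildPartition.apply(spec));
      }
      sendBatch.accept(batch);
      batches++;
    }
    return batches;
  }
}
```

With this shape, peak memory for the temporary list scales with {{batchSize}} * number of columns rather than with the total partition count * number of columns, provided {{sendBatch}} does not keep a reference to the batch it receives.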
--
This message was sent by Atlassian Jira
(v8.20.10#820010)