[
https://issues.apache.org/jira/browse/IMPALA-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhangqianqiong reassigned IMPALA-13385:
---------------------------------------
Assignee: zhangqianqiong
> The response of ddl exceeds VM limit when serializing the catalog
> -----------------------------------------------------------------
>
> Key: IMPALA-13385
> URL: https://issues.apache.org/jira/browse/IMPALA-13385
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1,
> Impala 4.1.2, Impala 4.3.0, Impala 4.4.0, Impala 4.4.1
> Reporter: zhangqianqiong
> Assignee: zhangqianqiong
> Priority: Major
> Attachments: 企业微信截图_3c4cd519-c64b-45d1-b0f1-889fff752f62.png
>
>
> At precent, when catalogd responds to DDL operations, it sends the
> entire table object. This can lead to a massive transfer of table catalog
> when dealing with the hive partitioned table. In one of our customer's
> clusters, there is a hive partitioned table with over 4,000 columns, more
> than 20,000 partitions, and involving over 10 million hdfs files. When
> executing an `ALTER TABLE ADD PARTITION` operation on this table, the catalog
> being serialized for the table exceeds the java array size limit, resulting
> in the following exception: `java.long.OutOfMemoryError: Requested array size
> exceeds VM limit`.
> To alleviate the issue, we can use TCompactProtocol instead of
> TBinaryProtocol for thrift serialization. In an experiment with a hive table
> containing 160 partitions, I observed that using TCompactProtocol can reduce
> the serialized data size by 34.4% compared to the previous method.
> Here are potential solutions for addressing this issue:
> DDL operations only: Use TCompactProtocol for serializing table
> catalog during ExecDdl operations. This would involve fewer changes but
> requires adjustments to JniUtil.
> Global replacement with TCompactProtocol: Replace all serialization
> operations within Impala with TCompactProtocol. Although this a larger
> change, the overall code becomes cleaner. In 329 internal benchmark tests, I
> found no significant performance degradation compared to the previous
> implementation, and memory usage was reduce.
> Looking forward to any feedback.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]