[ https://issues.apache.org/jira/browse/IMPALA-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882112#comment-17882112 ]
Riza Suminto commented on IMPALA-13385:
---------------------------------------

Hello [~qqzhang], thank you for reporting this issue. I wonder, do you use local catalog mode or not (see this [docs|https://impala.apache.org/docs/build/html/topics/impala_metadata.html])? We continue to make improvements to reduce Catalog object size. One that I remember is IMPALA-7501, which first landed in Impala 4.1.0. There may be numerous more improvements in the latest Impala 4.4.1. Have you tried them? Considering the large number of columns and partitions, migrating the table to Iceberg format might yield better performance as well.

cc: [~stigahuang], [~boroknagyz]

> The response of ddl exceeds VM limit when serializing the catalog
> -----------------------------------------------------------------
>
> Key: IMPALA-13385
> URL: https://issues.apache.org/jira/browse/IMPALA-13385
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, Impala 4.1.2, Impala 4.3.0, Impala 4.4.0, Impala 4.4.1
> Reporter: zhangqianqiong
> Priority: Major
> Attachments: 企业微信截图_3c4cd519-c64b-45d1-b0f1-889fff752f62.png
>
> At present, when catalogd responds to DDL operations, it sends the entire table object. This can lead to a massive transfer of table catalog metadata when dealing with Hive partitioned tables. In one of our customer's clusters, there is a Hive partitioned table with over 4,000 columns, more than 20,000 partitions, and over 10 million HDFS files. When executing an `ALTER TABLE ADD PARTITION` operation on this table, the catalog being serialized for the table exceeds the Java array size limit, resulting in the following exception: `java.lang.OutOfMemoryError: Requested array size exceeds VM limit`.
> To alleviate the issue, we can use TCompactProtocol instead of TBinaryProtocol for Thrift serialization.
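The saving from TCompactProtocol comes largely from its zigzag-plus-varint integer encoding, where small magnitudes take one or two bytes instead of a fixed four. A minimal stdlib-only sketch of that encoding (an illustration only, not Impala or Thrift library code):

```python
import struct

def zigzag32(n: int) -> int:
    # Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3.
    return (n << 1) ^ (n >> 31)

def varint(n: int) -> bytes:
    # Encode an unsigned int in 7-bit groups; the high bit flags continuation.
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def compact_i32(n: int) -> bytes:
    # Varint of the zigzagged value, as TCompactProtocol encodes i32 fields.
    return varint(zigzag32(n))

def binary_i32(n: int) -> bytes:
    # Fixed 4-byte big-endian, as TBinaryProtocol encodes i32 fields.
    return struct.pack(">i", n)

# Small values (ids, counts, lengths) dominate catalog metadata:
for v in (0, -1, 150, 100_000):
    print(v, len(binary_i32(v)), len(compact_i32(v)))
```

Every i32 costs 4 bytes under the binary-style encoding, while typical small values cost 1-3 bytes under the compact-style one, which is where the bulk of the reduction on column- and partition-heavy tables would come from.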
> In an experiment with a Hive table containing 160 partitions, I observed that using TCompactProtocol reduced the serialized data size by 34.4% compared to the previous method.
> Here are potential solutions for addressing this issue:
> 1. DDL operations only: Use TCompactProtocol for serializing the table catalog during ExecDdl operations. This would involve fewer changes but requires adjustments to JniUtil.
> 2. Global replacement with TCompactProtocol: Replace all serialization operations within Impala with TCompactProtocol. Although this is a larger change, the overall code becomes cleaner. In 329 internal benchmark tests, I found no significant performance degradation compared to the previous implementation, and memory usage was reduced.
> Looking forward to any feedback.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
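A toy model of that size comparison over many partition-like records (stdlib Python only; it omits Thrift field headers and types, so the exact ratio will differ from the 34.4% measured in the experiment above):

```python
import struct

def zigzag32(n: int) -> int:
    # Zigzag mapping so small negative and positive ints both stay small.
    return (n << 1) ^ (n >> 31)

def varint(n: int) -> bytes:
    # 7-bit-group encoding with a continuation bit, as in TCompactProtocol.
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def binary_like(part_id: int, name: str, file_count: int) -> bytes:
    # Fixed-width ints and a 4-byte string-length prefix, TBinaryProtocol-style.
    data = name.encode()
    return (struct.pack(">i", part_id) + struct.pack(">i", len(data))
            + data + struct.pack(">i", file_count))

def compact_like(part_id: int, name: str, file_count: int) -> bytes:
    # Varint ints and a varint string-length prefix, TCompactProtocol-style.
    data = name.encode()
    return (varint(zigzag32(part_id)) + varint(len(data))
            + data + varint(zigzag32(file_count)))

# 20,000 mock partition entries, roughly the scale described in the report.
parts = [(i, f"ds=2024-09-{i % 30 + 1:02d}", 500) for i in range(20_000)]
b = sum(len(binary_like(*p)) for p in parts)
c = sum(len(compact_like(*p)) for p in parts)
print(f"binary-like={b} bytes, compact-like={c} bytes, saving={1 - c / b:.1%}")
```

On this mock data the compact-style encoding comes out meaningfully smaller; the real percentage depends on the schema, which is consistent with the experiment reporting one specific table's result.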