[ https://issues.apache.org/jira/browse/IMPALA-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885658#comment-17885658 ]
zhangqianqiong commented on IMPALA-13385: ----------------------------------------- Thank [~stigahuang] for your feedback. I will extend this approach to updateCatalog and execResetMetadata as per your suggesting. I agree that IMPALA-9937 provides a comprehensive solution to this issue. Moving forward, we will also begin work on enabling the local-catalog mode, which will be a long-term project. > The response of ddl exceeds VM limit when serializing the catalog > ----------------------------------------------------------------- > > Key: IMPALA-13385 > URL: https://issues.apache.org/jira/browse/IMPALA-13385 > Project: IMPALA > Issue Type: Bug > Components: Catalog > Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, > Impala 4.1.2, Impala 4.3.0, Impala 4.4.0, Impala 4.4.1 > Reporter: zhangqianqiong > Assignee: zhangqianqiong > Priority: Major > Attachments: 企业微信截图_3c4cd519-c64b-45d1-b0f1-889fff752f62.png > > > At precent, when catalogd responds to DDL operations, it sends the > entire table object. This can lead to a massive transfer of table catalog > when dealing with the hive partitioned table. In one of our customer's > clusters, there is a hive partitioned table with over 4,000 columns, more > than 20,000 partitions, and involving over 10 million hdfs files. When > executing an `ALTER TABLE ADD PARTITION` operation on this table, the catalog > being serialized for the table exceeds the java array size limit, resulting > in the following exception: `java.long.OutOfMemoryError: Requested array size > exceeds VM limit`. > To alleviate the issue, we can use TCompactProtocol instead of > TBinaryProtocol for thrift serialization. In an experiment with a hive table > containing 160 partitions, I observed that using TCompactProtocol can reduce > the serialized data size by 34.4% compared to the previous method. > Here are potential solutions for addressing this issue: > DDL operations only: Use TCompactProtocol for serializing table > catalog during ExecDdl operations. This would involve fewer changes but > requires adjustments to JniUtil. > Global replacement with TCompactProtocol: Replace all serialization > operations within Impala with TCompactProtocol. Although this a larger > change, the overall code becomes cleaner. In 329 internal benchmark tests, I > found no significant performance degradation compared to the previous > implementation, and memory usage was reduce. > Looking forward to any feedback. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org