[ 
https://issues.apache.org/jira/browse/IMPALA-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885658#comment-17885658
 ] 

zhangqianqiong commented on IMPALA-13385:
-----------------------------------------

Thanks [~stigahuang] for your feedback. I will extend this approach to 
updateCatalog and execResetMetadata as per your suggestion.

I agree that IMPALA-9937 provides a comprehensive solution to this issue. 
Moving forward, we will also begin work on enabling the local-catalog mode,  
which will be a long-term project.
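As background on where the TCompactProtocol savings in the original report below come from: unlike TBinaryProtocol's fixed-width integers, TCompactProtocol encodes integers as zigzag varints, so small values take one byte instead of four. A minimal sketch of that encoding (illustrative only, not Thrift's actual implementation):

```python
def zigzag32(n: int) -> int:
    """Map a signed 32-bit int to unsigned so small magnitudes stay small:
    0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF

def varint(n: int) -> bytes:
    """ULEB128 varint: 7 payload bits per byte, high bit set on all but the last."""
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

# A fixed-width i32 (TBinaryProtocol style) always costs 4 bytes;
# a zigzag varint (TCompactProtocol style) costs 1 byte for small values.
print(len(varint(zigzag32(5))))   # -> 1
print(len(varint(zigzag32(-1))))  # -> 1
```

On metadata full of small integers (partition ids, column counts, file sizes), this per-field saving compounds into the kind of overall reduction measured in the experiment below.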

 

> The response of ddl exceeds VM limit when serializing the catalog
> -----------------------------------------------------------------
>
>                 Key: IMPALA-13385
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13385
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, 
> Impala 4.1.2, Impala 4.3.0, Impala 4.4.0, Impala 4.4.1
>            Reporter: zhangqianqiong
>            Assignee: zhangqianqiong
>            Priority: Major
>         Attachments: 企业微信截图_3c4cd519-c64b-45d1-b0f1-889fff752f62.png
>
>
>          At present, when catalogd responds to DDL operations, it sends the 
> entire table object. This can lead to a massive transfer of table catalog 
> metadata when dealing with Hive partitioned tables. In one of our customer's 
> clusters, there is a Hive partitioned table with over 4,000 columns, more 
> than 20,000 partitions, and over 10 million HDFS files. When 
> executing an `ALTER TABLE ADD PARTITION` operation on this table, the 
> serialized catalog for the table exceeds the Java array size limit, resulting 
> in the following exception: `java.lang.OutOfMemoryError: Requested array size 
> exceeds VM limit`.
>         To alleviate the issue, we can use TCompactProtocol instead of 
> TBinaryProtocol for Thrift serialization. In an experiment with a Hive table 
> containing 160 partitions, I observed that using TCompactProtocol reduced 
> the serialized data size by 34.4% compared to the previous method.
>         Here are two potential solutions for addressing this issue:
>         1. DDL operations only: Use TCompactProtocol for serializing the 
> table catalog during ExecDdl operations. This involves fewer changes but 
> requires adjustments to JniUtil.
>         2. Global replacement with TCompactProtocol: Replace all Thrift 
> serialization within Impala with TCompactProtocol. Although this is a larger 
> change, the overall code becomes cleaner. In 329 internal benchmark tests, I 
> found no significant performance degradation compared to the previous 
> implementation, and memory usage was reduced.
>        Looking forward to any feedback.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
