zhangqianqiong created IMPALA-13385:
---------------------------------------

             Summary: The response of ddl exceeds VM limit when serializing the 
catalog
                 Key: IMPALA-13385
                 URL: https://issues.apache.org/jira/browse/IMPALA-13385
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
    Affects Versions: Impala 4.4.1, Impala 4.4.0, Impala 4.3.0, Impala 4.1.2, 
Impala 4.1.1, Impala 4.2.0, Impala 4.1.0, Impala 4.0.0
            Reporter: zhangqianqiong
         Attachments: 企业微信截图_3c4cd519-c64b-45d1-b0f1-889fff752f62.png

         At present, when catalogd responds to DDL operations, it sends the 
entire table object. This can lead to a massive transfer of table catalog data 
when dealing with Hive partitioned tables. In one of our customer's clusters, 
there is a Hive partitioned table with over 4,000 columns, more than 20,000 
partitions, and over 10 million HDFS files. When executing an `ALTER 
TABLE ADD PARTITION` operation on this table, the serialized catalog object for 
the table exceeds the Java array size limit, resulting in the following 
exception: `java.lang.OutOfMemoryError: Requested array size exceeds VM limit`.
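
As a rough back-of-the-envelope check (a sketch, not a measurement): a Java 
`byte[]` is indexed by `int`, so it can hold at most 2^31 - 1 bytes, and the 
practical JVM limit is typically slightly lower. Dividing that ceiling by the 
10 million files mentioned above gives the average serialized size per file at 
which a single buffer would overflow. The function name is illustrative, and 
column and partition metadata are ignored here.

```python
# Upper bound on the length of a Java byte[] (arrays are indexed by int).
JAVA_MAX_ARRAY_LEN = 2**31 - 1

# File count taken from the report above.
NUM_FILES = 10_000_000


def per_file_budget(max_bytes: int, num_files: int) -> float:
    """Average serialized bytes per file at which the buffer is exactly full."""
    return max_bytes / num_files


if __name__ == "__main__":
    budget = per_file_budget(JAVA_MAX_ARRAY_LEN, NUM_FILES)
    print(f"average budget per file: {budget:.1f} bytes")
```

Under these assumptions the budget is a little under 215 bytes per file, so 
even modest per-file metadata is enough to reach the limit at this scale.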

        To alleviate the issue, we can use TCompactProtocol instead of 
TBinaryProtocol for Thrift serialization. In an experiment with a Hive table 
containing 160 partitions, I observed that TCompactProtocol reduced the 
serialized data size by 34.4% compared to TBinaryProtocol.
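
Part of the size difference comes from integer encoding: TBinaryProtocol 
writes every i64 as 8 fixed bytes, while TCompactProtocol writes it as a 
zigzag-encoded variable-length integer. The following is a minimal pure-Python 
sketch of that one aspect only. It is not the Thrift library's code, the 
function names are illustrative, and the sample values are made up.

```python
import struct


def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one; small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def varint(u: int) -> bytes:
    """Encode an unsigned integer 7 bits at a time, low bits first."""
    out = bytearray()
    while True:
        low7 = u & 0x7F
        u >>= 7
        if u:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def compact_i64(n: int) -> bytes:
    """i64 in the style of the compact protocol: zigzag, then varint."""
    return varint(zigzag(n))


def binary_i64(n: int) -> bytes:
    """i64 in the style of the binary protocol: 8 bytes, big-endian."""
    return struct.pack(">q", n)


if __name__ == "__main__":
    # Made-up values standing in for counts, offsets, and sizes.
    samples = [0, 1, 127, 300, 4096, 134217728]
    compact_total = sum(len(compact_i64(v)) for v in samples)
    binary_total = sum(len(binary_i64(v)) for v in samples)
    print(f"compact: {compact_total} bytes, binary: {binary_total} bytes")
```

For these six sample values the compact encoding takes 13 bytes against 48 for 
the fixed-width encoding. Real savings depend on the data: strings such as 
file names are not shrunk this way, which is consistent with an overall 
reduction well below what integers alone would suggest.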

        Here are potential solutions for addressing this issue:

1. DDL operations only: Use TCompactProtocol for serializing the table catalog 
during ExecDdl operations. This would involve fewer changes but requires 
adjustments to JniUtil.

2. Global replacement with TCompactProtocol: Replace all serialization 
operations within Impala with TCompactProtocol. Although this is a larger 
change, the overall code becomes cleaner. In 329 internal benchmark tests, I 
found no significant performance degradation compared to the previous 
implementation, and memory usage was reduced.

I would appreciate feedback on these two approaches.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
