Quanlong Huang created IMPALA-10686:
---------------------------------------

             Summary: Add threshold for catalogd to give up loading large tables
                 Key: IMPALA-10686
                 URL: https://issues.apache.org/jira/browse/IMPALA-10686
             Project: IMPALA
          Issue Type: New Feature
            Reporter: Quanlong Huang


Catalogd can hit the JVM's 2GB array size limit when serializing a large 
HdfsTable object, which results in an OOM error. Although catalogd has sent 
partition metadata individually, outside the table object, in catalog updates 
since IMPALA-3127, it still sends the whole table object in DDL/DML/Refresh 
responses. IMPALA-9937 aims to fix this, but we tend to address the problem 
via local catalog mode instead, so IMPALA-9937 has low priority.

For this reason, it would be helpful for users who are still on the legacy 
catalog mode to have a configurable threshold that stops catalogd from loading 
the metadata of a large table.

We can provide thresholds on the number of partitions/files or on the 
estimated metadata size of the whole table. Catalogd should give up loading 
the table metadata if any of these exceeds its threshold.
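
As a rough illustration of the check described above, here is a minimal Java 
sketch. Everything in it is hypothetical: the class TableLoadThresholds, its 
fields, and reasonToSkip() are not existing Impala code or flags, and treating 
a non-positive limit as "disabled" is an assumption. It only shows the "give 
up if any limit is exceeded" logic.

```java
import java.util.Optional;

// Hypothetical sketch only; none of these names exist in Impala.
final class TableLoadThresholds {
  // Assumption: a limit <= 0 means the corresponding check is disabled.
  private final long maxPartitions;
  private final long maxFiles;
  private final long maxEstimatedMetadataBytes;

  TableLoadThresholds(long maxPartitions, long maxFiles,
      long maxEstimatedMetadataBytes) {
    this.maxPartitions = maxPartitions;
    this.maxFiles = maxFiles;
    this.maxEstimatedMetadataBytes = maxEstimatedMetadataBytes;
  }

  // Returns a human-readable reason to give up loading the table, or an
  // empty Optional if the table is within every enabled limit.
  Optional<String> reasonToSkip(long numPartitions, long numFiles,
      long estimatedMetadataBytes) {
    if (exceeds(numPartitions, maxPartitions)) {
      return Optional.of("partitions " + numPartitions + " > " + maxPartitions);
    }
    if (exceeds(numFiles, maxFiles)) {
      return Optional.of("files " + numFiles + " > " + maxFiles);
    }
    if (exceeds(estimatedMetadataBytes, maxEstimatedMetadataBytes)) {
      return Optional.of("estimated metadata bytes " + estimatedMetadataBytes
          + " > " + maxEstimatedMetadataBytes);
    }
    return Optional.empty();
  }

  private static boolean exceeds(long value, long limit) {
    return limit > 0 && value > limit;
  }
}
```

A caller would evaluate reasonToSkip() before (or while) loading and, if a 
reason is present, mark the table as failed to load with that reason instead 
of building the full table object. A size limit would presumably be set well 
below 2GB, since the serialized form is what hits the array limit.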

Note that a simpler workaround is to use the {{--blacklisted_dbs}} and 
{{--blacklisted_tables}} flags to exclude such tables directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)