Quanlong Huang created IMPALA-10686:
---------------------------------------
Summary: Add threshold for catalogd to give up loading large tables
Key: IMPALA-10686
URL: https://issues.apache.org/jira/browse/IMPALA-10686
Project: IMPALA
Issue Type: New Feature
Reporter: Quanlong Huang
Catalogd can hit the JVM's 2GB array size limit when serializing a large
HdfsTable object, which throws an OOM error. Since IMPALA-3127, catalogd sends
partition metadata individually, outside the table object, in catalog updates.
However, it still sends the whole table object in DDL/DML/Refresh responses.
IMPALA-9937 aims to fix this, but we prefer to address the problem via local
catalog mode, so IMPALA-9937 is low priority.
Because of this, users who are still on the legacy catalog mode would benefit
from a configurable threshold that prevents catalogd from loading the metadata
of a large table.
We can provide thresholds on the number of partitions, the number of files, or
the estimated metadata size of the whole table. Catalogd should give up loading
the table metadata if any of these exceeds its threshold.
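A minimal sketch of the proposed check, written in Java because catalogd's table loading (e.g. HdfsTable) runs in its embedded JVM. Everything here is hypothetical: the class name `TableLoadThresholds`, its `check` method, and the convention that a non-positive limit disables a check are illustrative assumptions, not existing Impala code or flags. The caller is assumed to supply the partition count, the file count, and an estimated serialized size.

```java
import java.util.Optional;

/**
 * Hypothetical sketch (not Impala code): decides whether catalogd should give
 * up loading a table because one of its metadata limits is exceeded.
 */
final class TableLoadThresholds {
  // Hypothetical limits; a value <= 0 disables that particular check.
  private final long maxPartitions;
  private final long maxFiles;
  private final long maxEstimatedBytes;

  TableLoadThresholds(long maxPartitions, long maxFiles, long maxEstimatedBytes) {
    this.maxPartitions = maxPartitions;
    this.maxFiles = maxFiles;
    this.maxEstimatedBytes = maxEstimatedBytes;
  }

  /**
   * Returns the reason for giving up if ANY enabled limit is exceeded,
   * or Optional.empty() if loading may proceed.
   */
  Optional<String> check(long numPartitions, long numFiles, long estimatedBytes) {
    if (maxPartitions > 0 && numPartitions > maxPartitions) {
      return Optional.of("partitions " + numPartitions + " > " + maxPartitions);
    }
    if (maxFiles > 0 && numFiles > maxFiles) {
      return Optional.of("files " + numFiles + " > " + maxFiles);
    }
    if (maxEstimatedBytes > 0 && estimatedBytes > maxEstimatedBytes) {
      return Optional.of("estimated bytes " + estimatedBytes + " > " + maxEstimatedBytes);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Example limits; the size limit (1 GiB) stays well below the ~2GB
    // maximum Java array length that serialization would otherwise hit.
    TableLoadThresholds t = new TableLoadThresholds(100_000, 1_000_000, 1L << 30);
    System.out.println(t.check(5_000, 200_000, 300L << 20));   // prints "Optional.empty"
    System.out.println(t.check(5_000, 2_000_000, 300L << 20)); // prints "Optional[files 2000000 > 1000000]"
  }
}
```

Returning a reason rather than a plain boolean would let catalogd report why a table was skipped. In practice the counts may only become known partway through loading, so such a check would likely run incrementally (for example after listing partitions, then after listing files) so that the load can be abandoned early.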
Note that a simpler workaround is to use the {{--blacklisted_dbs}} and
{{--blacklisted_tables}} flags to disable such tables directly.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)