Quanlong Huang created IMPALA-11812:
---------------------------------------
Summary: Catalogd OOM due to lots of HMS FieldSchema instances
Key: IMPALA-11812
URL: https://issues.apache.org/jira/browse/IMPALA-11812
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang
Attachments: MAT_dominator_tree.png
For partitioned wide tables that have thousands of columns, catalogd might hit
OOM in routines on them. E.g. when running AlterTableRecoverPartitions for all
their partitions, or when initially loading all partitions of them.
The direct reason is that the heap is full of HMS FieldSchema instances. Here
is a histogram of the issue in a 4GB heap:
{noformat}
Class Name | Objects |
Shallow Heap |
--------------------------------------------------------------------------------------------
org.apache.hadoop.hive.metastore.api.FieldSchema | 111,876,486 |
2,685,035,664 |
java.lang.Object[] | 78,026 |
449,929,656 |
char[] | 91,295 |
6,241,744 |
java.util.ArrayList | 171,126 |
4,107,024 |
java.util.HashMap | 71,135 |
3,414,480 |
java.lang.String | 91,161 |
2,187,864 |
java.util.concurrent.ConcurrentHashMap$Node | 59,614 |
1,907,648 |
java.util.concurrent.atomic.LongAdder | 53,021 |
1,696,672 |
org.apache.hadoop.hive.metastore.api.Partition | 22,374 |
1,610,928 |
com.codahale.metrics.EWMA | 30,780 |
1,477,440 |
com.codahale.metrics.LongAdderProxy$JdkProvider$1 | 53,021 |
1,272,504 |
org.apache.hadoop.hive.metastore.api.StorageDescriptor | 22,376 |
1,253,056 |
java.util.Hashtable$Entry | 36,921 |
1,181,472 |
java.util.concurrent.atomic.AtomicLong | 39,444 |
946,656 |
org.apache.hadoop.hive.metastore.api.SerDeInfo | 22,375 |
895,000 |
byte[] | 1,686 |
668,480 |
java.util.concurrent.ConcurrentHashMap$Node[] | 1,874 |
639,824 |
com.codahale.metrics.ExponentiallyDecayingReservoir | 10,259 |
574,504 |
java.util.HashMap$Node | 17,776 |
568,832 |
org.apache.hadoop.hive.metastore.api.SkewedInfo | 22,375 |
537,000 |
com.codahale.metrics.Meter | 10,260 |
492,480 |
java.util.concurrent.ConcurrentSkipListMap | 10,260 |
492,480 |
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync| 10,259 |
492,432 |
org.apache.impala.catalog.ColumnStats | 5,003 |
400,240 |
Total: 24 of 6,158 entries; 6,130 more | 113,007,927 |
3,174,330,288 | {noformat}
In the above case, these FieldSchema instances come from the list of
hmsPartitions that is created locally by
CatalogOpExecutor#alterTableRecoverPartitions(). The thread is 0x6d051abb8:
Stacktrace:
{noformat}
Thread 0x6d051abb8
at
org.apache.hadoop.hive.metastore.api.StorageDescriptor.<init>(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V
(StorageDescriptor.java:216)
at
org.apache.hadoop.hive.metastore.api.StorageDescriptor.deepCopy()Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;
(StorageDescriptor.java:256)
at
org.apache.impala.service.CatalogOpExecutor.createHmsPartitionFromValues(Ljava/util/List;Lorg/apache/hadoop/hive/metastore/api/Table;Lorg/apache/impala/analysis/TableName;Ljava/lang/String;)Lorg/apache/hadoop/hive/metastore/api/Partition;
(CatalogOpExecutor.java:5787)
at
org.apache.impala.service.CatalogOpExecutor.alterTableRecoverPartitions(Lorg/apache/impala/catalog/Table;Ljava/lang/String;)V
(CatalogOpExecutor.java:5678)
at
org.apache.impala.service.CatalogOpExecutor.alterTable(Lorg/apache/impala/thrift/TAlterTableParams;Ljava/lang/String;ZLorg/apache/impala/thrift/TDdlExecResponse;)V
(CatalogOpExecutor.java:1208)
at
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(Lorg/apache/impala/thrift/TDdlExecRequest;)Lorg/apache/impala/thrift/TDdlExecResponse;
(CatalogOpExecutor.java:419)
at org.apache.impala.service.JniCatalog.execDdl([B)[B
(JniCatalog.java:260){noformat}
*How this happen*
When creating the list of hmsPartitions, we deep copy the StorageDescriptor
which will also deep copy the column list:
{code:java}
alterTableRecoverPartitions()
-> createHmsPartitionFromValues()
-> StorageDescriptor sd = msTbl.getSd().deepCopy();{code}
Impala doesn't respect the partition level schema (by design), we should share
the list of FieldSchema across hmsPartitions.
When loading partition metadata for such a table, we could also hit this issue.
The HMS API "get_partitions_by_names" returns the list of hmsPartitions. Each
of them reference a unique list of FieldSchemas. We should deduplicate them to
share the same column list.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]