[ 
https://issues.apache.org/jira/browse/IMPALA-11812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-11812.
-------------------------------------
    Fix Version/s: Impala 4.3.0
       Resolution: Fixed

Resolving this. Thanks [~amansinha] and [~sql_forever] for the review!

> Catalogd OOM due to lots of HMS FieldSchema instances
> -----------------------------------------------------
>
>                 Key: IMPALA-11812
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11812
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.3.0
>
>         Attachments: MAT_dominator_tree.png, 
> create_ext_tbl_with_5k_cols_50k_parts.sh, 
> wide-table-loading-oom-FieldSchema-path2root.png, 
> wide-table-loading-oom-histogram.png, wide-table-loading-oom-top-consumers.png
>
>
> For partitioned wide tables that have thousands of columns, catalogd might 
> hit OOM in routines that operate on them, e.g. when running 
> AlterTableRecoverPartitions across all of their partitions, or when 
> initially loading all of the partitions. The direct cause is that the heap 
> fills up with HMS FieldSchema instances. Here is a histogram of the issue on 
> a 4GB heap:
> {noformat}
> Class Name                                                    |     Objects |  Shallow Heap |
> ----------------------------------------------------------------------------------------------
> org.apache.hadoop.hive.metastore.api.FieldSchema              | 111,876,486 | 2,685,035,664 |
> java.lang.Object[]                                            |      78,026 |   449,929,656 |
> char[]                                                        |      91,295 |     6,241,744 |
> java.util.ArrayList                                           |     171,126 |     4,107,024 |
> java.util.HashMap                                             |      71,135 |     3,414,480 |
> java.lang.String                                              |      91,161 |     2,187,864 |
> java.util.concurrent.ConcurrentHashMap$Node                   |      59,614 |     1,907,648 |
> java.util.concurrent.atomic.LongAdder                         |      53,021 |     1,696,672 |
> org.apache.hadoop.hive.metastore.api.Partition                |      22,374 |     1,610,928 |
> com.codahale.metrics.EWMA                                     |      30,780 |     1,477,440 |
> com.codahale.metrics.LongAdderProxy$JdkProvider$1             |      53,021 |     1,272,504 |
> org.apache.hadoop.hive.metastore.api.StorageDescriptor        |      22,376 |     1,253,056 |
> java.util.Hashtable$Entry                                     |      36,921 |     1,181,472 |
> java.util.concurrent.atomic.AtomicLong                        |      39,444 |       946,656 |
> org.apache.hadoop.hive.metastore.api.SerDeInfo                |      22,375 |       895,000 |
> byte[]                                                        |       1,686 |       668,480 |
> java.util.concurrent.ConcurrentHashMap$Node[]                 |       1,874 |       639,824 |
> com.codahale.metrics.ExponentiallyDecayingReservoir           |      10,259 |       574,504 |
> java.util.HashMap$Node                                        |      17,776 |       568,832 |
> org.apache.hadoop.hive.metastore.api.SkewedInfo               |      22,375 |       537,000 |
> com.codahale.metrics.Meter                                    |      10,260 |       492,480 |
> java.util.concurrent.ConcurrentSkipListMap                    |      10,260 |       492,480 |
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync |      10,259 |       492,432 |
> org.apache.impala.catalog.ColumnStats                         |       5,003 |       400,240 |
> Total: 24 of 6,158 entries; 6,130 more                        | 113,007,927 | 3,174,330,288 |
> {noformat}
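> The arithmetic lines up with the attached repro script (roughly 5,000 columns): 
> 22,374 Partition objects, each holding its own copy of a ~5,000-entry column 
> list, gives roughly 22,374 * 5,000 ≈ 112 million FieldSchema instances, and at 
> 24 bytes of shallow heap each that is ~2.7 GB, matching the histogram above.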
> In the above case, these FieldSchema instances come from the list of 
> hmsPartitions that is created locally by 
> CatalogOpExecutor#alterTableRecoverPartitions(). The thread is 0x6d051abb8:
> !MAT_dominator_tree.png|width=825,height=308!
> Stacktrace:
> {noformat}
> Thread 0x6d051abb8
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor.<init>(Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor;)V (StorageDescriptor.java:216)
>   at org.apache.hadoop.hive.metastore.api.StorageDescriptor.deepCopy()Lorg/apache/hadoop/hive/metastore/api/StorageDescriptor; (StorageDescriptor.java:256)
>   at org.apache.impala.service.CatalogOpExecutor.createHmsPartitionFromValues(Ljava/util/List;Lorg/apache/hadoop/hive/metastore/api/Table;Lorg/apache/impala/analysis/TableName;Ljava/lang/String;)Lorg/apache/hadoop/hive/metastore/api/Partition; (CatalogOpExecutor.java:5787)
>   at org.apache.impala.service.CatalogOpExecutor.alterTableRecoverPartitions(Lorg/apache/impala/catalog/Table;Ljava/lang/String;)V (CatalogOpExecutor.java:5678)
>   at org.apache.impala.service.CatalogOpExecutor.alterTable(Lorg/apache/impala/thrift/TAlterTableParams;Ljava/lang/String;ZLorg/apache/impala/thrift/TDdlExecResponse;)V (CatalogOpExecutor.java:1208)
>   at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(Lorg/apache/impala/thrift/TDdlExecRequest;)Lorg/apache/impala/thrift/TDdlExecResponse; (CatalogOpExecutor.java:419)
>   at org.apache.impala.service.JniCatalog.execDdl([B)[B (JniCatalog.java:260){noformat}
> *How this happens*
> When creating the list of hmsPartitions, we deep-copy the StorageDescriptor, 
> which also deep-copies its column list:
> {code:java}
> alterTableRecoverPartitions()
> -> createHmsPartitionFromValues()
>    -> StorageDescriptor sd = msTbl.getSd().deepCopy();{code}
> Impala doesn't respect partition-level schemas (by design), so we should 
> share the list of FieldSchema instances across the hmsPartitions instead. A 
> minimal sketch of that idea is shown below.
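> This is only an illustrative sketch, not the actual patch (the helper name and 
> surrounding code are made up here): keep one shared column list and point every 
> generated partition's StorageDescriptor at it, so the per-partition copies of 
> thousands of FieldSchema objects are never retained.
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.hadoop.hive.metastore.api.FieldSchema;
> import org.apache.hadoop.hive.metastore.api.Partition;
> import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
> import org.apache.hadoop.hive.metastore.api.Table;
>
> public class SharedColsSketch {
>   // Illustrative sketch only: build the hmsPartitions list while pointing every
>   // StorageDescriptor back at one shared column list.
>   static List<Partition> buildPartitionsSharingCols(
>       Table msTbl, List<List<String>> partValuesList) {
>     List<FieldSchema> sharedCols = msTbl.getSd().getCols();
>     List<Partition> hmsPartitions = new ArrayList<>();
>     for (List<String> values : partValuesList) {
>       StorageDescriptor sd = msTbl.getSd().deepCopy();
>       sd.setCols(sharedCols);  // drop the private copy, reuse the shared list
>       Partition p = new Partition();
>       p.setSd(sd);
>       p.setValues(values);
>       hmsPartitions.add(p);
>     }
>     return hmsPartitions;
>   }
> }{code}
> With this, the column-list copy created by deepCopy() becomes garbage right 
> away, so only one table-wide FieldSchema list stays live regardless of how many 
> partitions are being created.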
> When loading partition metadata for such a table, we can hit the same issue. 
> The HMS API "get_partitions_by_names" returns the list of hmsPartitions, and 
> each of them references its own list of FieldSchemas. We should deduplicate 
> these so that all partitions share the same column list (see the sketch after 
> the screenshots below). FWIW, I've attached the heap analysis results for the 
> wide-table loading OOM (4GB heap):
> !wide-table-loading-oom-histogram.png|width=623,height=695!
> !wide-table-loading-oom-top-consumers.png|width=1066,height=434!
> The FieldSchema instances come from the metadata loading thread:
> !wide-table-loading-oom-FieldSchema-path2root.png|width=1023,height=213!
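> A similar illustrative sketch for the loading path (again an assumption about 
> the shape of the fix, not the actual change): after fetching the partitions 
> from HMS, replace each partition's private column list with one canonical list 
> whenever the schemas are identical.
> {code:java}
> import java.util.List;
> import org.apache.hadoop.hive.metastore.api.FieldSchema;
> import org.apache.hadoop.hive.metastore.api.Partition;
>
> public class DedupColsSketch {
>   // Illustrative sketch only: deduplicate the column lists of partitions
>   // returned by get_partitions_by_names so they all share one FieldSchema list.
>   static void dedupColumnLists(List<Partition> hmsPartitions) {
>     List<FieldSchema> canonicalCols = null;
>     for (Partition p : hmsPartitions) {
>       List<FieldSchema> cols = p.getSd().getCols();
>       if (canonicalCols == null) {
>         canonicalCols = cols;  // first partition's list becomes the shared one
>       } else if (cols.equals(canonicalCols)) {
>         p.getSd().setCols(canonicalCols);  // identical schema: share the list
>       }
>     }
>   }
> }{code}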



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
