[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17348942#comment-17348942
 ] 

Quanlong Huang commented on IMPALA-7501:
----------------------------------------

To measure the memory benefits of this work, I did a heap analysis on the case I 
mentioned above, i.e. a table with 478 columns and 87,320 partitions (1 non-empty 
file per partition). The total heap usage is 1.1 GB, containing 41M objects. The 
dominator is the column list (i.e. list<FieldSchema>) included in the 
StorageDescriptor of each partition:
{code:java}
Class Name                                            |    Objects |  Shallow Heap | Retained Heap
---------------------------------------------------------------------------------------------------
org.apache.hadoop.hive.metastore.api.FieldSchema      | 38,130,538 |   915,132,912 |
java.lang.Object[]                                    |    164,431 |   157,234,704 |
char[]                                                |    436,556 |    41,490,952 |
byte[]                                                |     81,461 |    13,026,696 |
java.util.HashMap                                     |    240,230 |    11,531,040 |
java.lang.String                                      |    436,420 |    10,474,080 |
java.util.ArrayList                                   |    319,912 |     7,677,888 |
org.apache.hadoop.hive.metastore.api.Partition        |     79,771 |     5,743,512 |
org.apache.hadoop.hive.metastore.api.StorageDescriptor|     79,771 |     4,467,176 |
com.google.common.cache.LocalCache$StrongAccessEntry  |     79,822 |     3,831,456 |
java.nio.HeapByteBuffer                               |     79,777 |     3,829,296 |
Total: 11 of 6,920 entries; 6,909 more                | 41,029,952 | 1,200,677,088 |
---------------------------------------------------------------------------------------------------
{code}
The dominator_tree analysis in MAT confirms that these FieldSchema objects come 
from the partition metadata:
{code:java}
Class Name                                                                                                       | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------------------------------------------------------------------------
class org.apache.impala.service.FeCatalogManager$LocalImpl @ 0x5cce8e308                                         |            8 | 1,187,171,224 |     98.88%
'- org.apache.impala.catalog.local.CatalogdMetaProvider @ 0x5cd26a700                                            |           72 | 1,187,171,216 |     98.88%
   |- com.google.common.cache.LocalCache$LocalManualCache @ 0x5cd26a800                                          |           16 | 1,187,169,424 |     98.87%
   |  '- com.google.common.cache.LocalCache @ 0x5cd26a810                                                        |          128 | 1,187,169,408 |     98.87%
   |     |- com.google.common.cache.LocalCache$Segment[4] @ 0x5cd26a890                                          |           32 | 1,187,169,032 |     98.87%
   |     |  |- com.google.common.cache.LocalCache$Segment @ 0x5cd26aa58                                          |           80 |   296,800,352 |     24.72%
   |     |  |  |- java.util.concurrent.atomic.AtomicReferenceArray @ 0x60f98aad0                                 |           16 |       131,104 |      0.01%
   |     |  |  |- com.google.common.cache.LocalCache$StrongAccessEntry @ 0x610d3b750                             |           48 |        14,968 |      0.00%
   |     |  |  |  |- com.google.common.cache.LocalCache$WeightedStrongValueReference @ 0x610d3b798               |           24 |        14,896 |      0.00%
   |     |  |  |  |  '- org.apache.impala.catalog.local.CatalogdMetaProvider$PartitionMetadataImpl @ 0x610d3b7b0 |           40 |        14,872 |      0.00%
   |     |  |  |  |     |- org.apache.hadoop.hive.metastore.api.Partition @ 0x5fea12cf0                          |           72 |        14,576 |      0.00%
   |     |  |  |  |     |  |- org.apache.hadoop.hive.metastore.api.StorageDescriptor @ 0x5fea12e80               |           56 |        14,072 |      0.00%
   |     |  |  |  |     |  |  |- java.util.ArrayList @ 0x5fea12eb8                                               |           24 |        13,424 |      0.00%
   |     |  |  |  |     |  |  |  '- java.lang.Object[478] @ 0x5fea220f0                                          |        1,928 |        13,400 |      0.00%
   |     |  |  |  |     |  |  |     |- org.apache.hadoop.hive.metastore.api.FieldSchema @ 0x5fea22878            |           24 |            24 |      0.00%
   |     |  |  |  |     |  |  |     |- org.apache.hadoop.hive.metastore.api.FieldSchema @ 0x5fea22890            |           24 |            24 |      0.00%
   |     |  |  |  |     |  |  |     |- org.apache.hadoop.hive.metastore.api.FieldSchema @ 0x5fea228a8            |           24 |            24 |      0.00%
{code}
Each FieldSchema object takes 24 bytes of shallow heap, and there are 38,130,538 
of them, so together they consume 76% of the 1.1 GB heap.
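The 76% figure follows directly from the histogram numbers; a quick arithmetic check:

```java
// Quick sanity check of the shallow-heap figures quoted from the MAT histogram.
public class FieldSchemaFootprint {
    public static void main(String[] args) {
        long objects = 38_130_538L;       // FieldSchema count from the histogram
        long shallowBytes = 24L;          // shallow size of one FieldSchema
        long totalHeap = 1_200_677_088L;  // total retained heap from the histogram
        long fieldSchemaBytes = objects * shallowBytes;
        System.out.println(fieldSchemaBytes);                 // 915132912, matching the table
        System.out.printf("%.0f%%%n", 100.0 * fieldSchemaBytes / totalHeap); // 76%
    }
}
```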

Note that the uncompressed location strings and input/outputFormat strings of 
each partition also take some space.

I plan to work on the following optimizations:
 * Don't cache the msPartition object in CatalogdMetaProvider. Replace it with 
only the fields we actually need, including
 ** HMS parameters
 ** write id
 ** HdfsStorageDescriptor, which replaces the input/outputFormat strings with 
enums and contains sufficient info like lineDelimiter, fieldDelimiter, 
blockSize, etc.
 ** HdfsPartitionLocationCompressor$Location, which prefix-compresses the 
partition location strings.
 * Don't transmit the msPartition object in TPartialPartitionInfo. Replace it 
with the fields mentioned above.
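To illustrate the prefix-compression idea: since most partition locations share the same table directory, storing each common prefix once and keeping only a (prefix index, suffix) pair per partition avoids duplicating the long prefix 87,320 times. This is only a simplified sketch of the idea, not the actual HdfsPartitionLocationCompressor implementation; the class and method names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of prefix compression for partition location strings.
// Each distinct directory prefix is stored once in a pool; a partition keeps
// only an index into the pool plus its own short suffix.
public class LocationCompressorSketch {
    private final List<String> prefixes = new ArrayList<>();
    private final Map<String, Integer> prefixToIndex = new HashMap<>();

    public static final class Location {
        final int prefixIndex;
        final String suffix;
        Location(int prefixIndex, String suffix) {
            this.prefixIndex = prefixIndex;
            this.suffix = suffix;
        }
    }

    public Location compress(String location) {
        // Split at the last path separator: everything up to and including it
        // is the shared directory prefix, the rest is the partition directory.
        int cut = location.lastIndexOf('/') + 1;
        String prefix = location.substring(0, cut);
        int idx = prefixToIndex.computeIfAbsent(prefix, p -> {
            prefixes.add(p);
            return prefixes.size() - 1;
        });
        return new Location(idx, location.substring(cut));
    }

    public String decompress(Location loc) {
        return prefixes.get(loc.prefixIndex) + loc.suffix;
    }
}
```

With 87,320 partitions under one table directory, the prefix pool holds a single entry and each partition retains only a few bytes of suffix.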

These can be done together. The first reduces memory usage in the coordinator. 
The second reduces the size of the thrift objects transferred from catalogd to 
the coordinator, which currently can easily cause OOM errors by exceeding the 
Java array size limit (2 GB).

CC [~vihangk1], [~amansinha]

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>            Reporter: Todd Lipcon
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: catalog-v2
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
