[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317134#comment-17317134
 ] 

Quanlong Huang commented on IMPALA-7501:
----------------------------------------

For the unused fields, I think we should null them out when generating 
TGetPartialCatalogObjectResponse in catalogd. This reduces the memory pressure 
on both sides (catalogd and impalad).
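
A minimal sketch of the nulling-out idea, for illustration only. PartitionStub 
and PartitionSlimmer are hypothetical stand-ins, not the real HMS Thrift classes 
or catalogd code, and the choice of the column list as the field to drop is an 
assumption based on the FieldSchema observation in the issue description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; stand-ins, not the real HMS or Impala classes.
final class PartitionSlimmer {
    /** Stand-in for an HMS partition with a few representative fields. */
    static final class PartitionStub {
        final List<String> values;   // partition key values
        final String location;       // full partition URI
        final List<String> cols;     // per-partition column schemas (FieldSchema in HMS)

        PartitionStub(List<String> values, String location, List<String> cols) {
            this.values = values;
            this.location = location;
            this.cols = cols;
        }
    }

    /** Returns a copy with the column list dropped; the input is left unchanged. */
    static PartitionStub slim(PartitionStub p) {
        return new PartitionStub(p.values, p.location, null);
    }

    public static void main(String[] args) {
        PartitionStub full = new PartitionStub(
            Arrays.asList("2021", "04"),
            "hdfs://nn:8020/warehouse/t/year=2021/month=04",
            Arrays.asList("c1:int", "c2:string"));
        PartitionStub slimmed = slim(full);
        System.out.println(slimmed.location + " cols=" + slimmed.cols);
    }
}
```

The sketch builds a slimmed copy instead of mutating the input. Whether the 
real change should copy or clear fields in place is left open here.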

I did an experiment on a table with 478 columns and 87320 partitions (1 file 
per partition). When fetching all partitions in one GetPartialCatalogObject() 
call, the serialized response size is 1823012484 bytes (1.7GB). However, in the 
legacy catalog mode, when executing REFRESH on the table, the serialized size 
of TResetMetadataResponse, which contains the whole table object, is just 
71390662 bytes (68MB), roughly 25x smaller.

One factor is these unused string fields in HMS partitions. The other factor is 
that partition locations are prefix-compressed in legacy catalog mode, whereas 
in HMS partitions the locations are all full URIs.
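
To illustrate why prefix compression matters with this many partitions, here is 
a minimal sketch of the general technique. LocationPrefixCompressor is a 
hypothetical class written for this example, not Impala's actual 
implementation, and splitting at the last '/' is an assumed, simplified rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Impala's actual location compressor.
final class LocationPrefixCompressor {
    private final List<String> prefixes = new ArrayList<>();
    private final Map<String, Integer> prefixIndex = new HashMap<>();

    /** A location stored as (index into the shared prefix list, suffix). */
    static final class Compressed {
        final int prefixId;
        final String suffix;

        Compressed(int prefixId, String suffix) {
            this.prefixId = prefixId;
            this.suffix = suffix;
        }
    }

    /** Splits at the last '/' and stores each distinct prefix only once. */
    Compressed compress(String location) {
        int slash = location.lastIndexOf('/');
        String prefix = slash < 0 ? "" : location.substring(0, slash + 1);
        String suffix = location.substring(slash + 1);
        Integer id = prefixIndex.get(prefix);
        if (id == null) {
            id = prefixes.size();
            prefixes.add(prefix);
            prefixIndex.put(prefix, id);
        }
        return new Compressed(id, suffix);
    }

    /** Rebuilds the full location from the shared prefix and the suffix. */
    String decompress(Compressed c) {
        return prefixes.get(c.prefixId) + c.suffix;
    }

    int prefixCount() {
        return prefixes.size();
    }
}
```

With this scheme, partitions under the same parent directory share one stored 
prefix, so each partition carries only a small integer and its own suffix 
instead of a full URI.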

cc [~vihangk1]

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>            Reporter: Todd Lipcon
>            Assignee: Quanlong Huang
>            Priority: Minor
>              Labels: catalog-v2
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size)



