[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629621#comment-16629621
 ] 

Paul Rogers commented on IMPALA-7501:
-------------------------------------

So the above was probably looking in the wrong haystack. Todd's comment is the 
key: {{LocalCatalog}}. The local catalog caches the HMS Thrift objects, 
including {{Partition}}.

The chain is:

* {{LocalDb}} contains a map of {{LocalTable}} objects.
* {{LocalTable}} has a subclass {{LocalFsTable}} which contains a map of 
{{LocalPartitionSpec}} objects.
* {{LocalPartitionSpec}} has a relation (need to research) to 
{{LocalFsPartition}}.
* {{LocalFsPartition}} holds onto the Hive {{Partition}}, which holds onto the 
{{FieldSchema}} objects.

Short term, we just need to track down where we cache the {{Partition}}, nuke 
its {{FieldSchema}} objects, then retest.
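
As a rough illustration of the short-term idea, here is a minimal sketch of 
dropping the column list from a partition before it goes into the cache. The 
classes below are simplified, hypothetical stand-ins for the Thrift-generated 
HMS types (only a few fields are modeled), and {{PartitionSlimmer}} is an 
invented helper name, not existing Impala code. Where the real stripping would 
hook into {{LocalCatalog}} still needs to be tracked down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the HMS FieldSchema (column name and type only).
final class FieldSchema {
  final String name;
  final String type;

  FieldSchema(String name, String type) {
    this.name = name;
    this.type = type;
  }
}

// Hypothetical stand-in for the HMS StorageDescriptor. The method names are
// modeled on Thrift-style accessors; treat them as assumptions.
final class StorageDescriptor {
  private List<FieldSchema> cols;
  private final String location;

  StorageDescriptor(List<FieldSchema> cols, String location) {
    this.cols = new ArrayList<>(cols);
    this.location = location;
  }

  List<FieldSchema> getCols() { return cols; }
  String getLocation() { return location; }

  // Releases the reference to the column list so it can be collected.
  void unsetCols() { cols = null; }
}

// Hypothetical stand-in for the HMS Partition.
final class Partition {
  private final List<String> values;
  private final StorageDescriptor sd;

  Partition(List<String> values, StorageDescriptor sd) {
    this.values = new ArrayList<>(values);
    this.sd = sd;
  }

  List<String> getValues() { return values; }
  StorageDescriptor getSd() { return sd; }
}

// Invented helper: strips the per-partition column schemas before caching.
final class PartitionSlimmer {
  private PartitionSlimmer() {}

  // Mutates the partition in place and returns the same instance.
  static Partition stripFieldSchemas(Partition partition) {
    StorageDescriptor sd = partition.getSd();
    if (sd != null) {
      sd.unsetCols();
    }
    return partition;
  }
}
```

The sketch mutates the object in place rather than copying it, on the 
assumption that nothing else needs the partition-level columns once the 
partition is cached. The partition values and storage location are left 
untouched.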

Longer term, the note earlier does apply. While the query-specific metadata 
takes pains to avoid caching HMS objects, {{LocalCatalog}} (and presumably the 
similar version in the {{catalogd}}) does cache HMS objects which, as noted 
earlier, are rather bloated for our needs.

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
