[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629554#comment-16629554
 ] 

Paul Rogers commented on IMPALA-7501:
-------------------------------------

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} 
objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift 
schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift]
 API. It does not contain a list of partitions.

Things are a bit confusing because:

* Hive defines a different 
[{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java]
 class, which contains a {{TableSpec}}.
* Hive's 
[{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java]
 contains a list of {{Partition}} objects.
* Hive's 
[{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java]
 is generated from Thrift and contains a {{StorageDescriptor}}.
* Hive's 
[{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java]
 contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.
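
The Hive-side chain in the list above can be sketched with simplified stand-in classes. These are hypothetical models, not the real Thrift-generated Hive classes (which carry many more fields). The sketch assumes each {{Partition}} owns its own {{StorageDescriptor}} and column list, in which case the number of retained {{FieldSchema}} objects grows as partitions times columns:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartitionRetentionSketch {
    // Hypothetical stand-ins for the Thrift-generated Hive classes.
    static class FieldSchema {
        final String name;
        final String type;
        FieldSchema(String name, String type) { this.name = name; this.type = type; }
    }

    static class StorageDescriptor {
        final List<FieldSchema> cols = new ArrayList<>();
    }

    static class Partition {
        final List<String> values;
        final StorageDescriptor sd = new StorageDescriptor();
        Partition(List<String> values) { this.values = values; }
    }

    // Builds numPartitions partitions, each with its own copy of the column list.
    static List<Partition> buildPartitions(int numPartitions, int numColumns) {
        List<Partition> result = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            Partition part = new Partition(Collections.singletonList("p" + p));
            for (int c = 0; c < numColumns; c++) {
                part.sd.cols.add(new FieldSchema("col" + c, "string"));
            }
            result.add(part);
        }
        return result;
    }

    // Counts every FieldSchema reachable through Partition -> StorageDescriptor.
    static long countFieldSchemas(List<Partition> partitions) {
        long total = 0;
        for (Partition p : partitions) {
            total += p.sd.cols.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Partition> parts = buildPartitions(1000, 50);
        System.out.println(countFieldSchemas(parts)); // prints 50000
    }
}
```

With per-partition column lists, even a modest table (1000 partitions, 50 columns in this made-up example) retains tens of thousands of small objects, which is the shape of the footprint reported in the heap dump.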

The above shows that, yes, Hive {{Partition}} objects do hold a list of 
{{FieldSchema}}, but they are not reachable via the simplest path, the Hive API 
{{Table}} object. Perhaps we cache {{Partition}} objects in the table schema.

Impala loads tables in the background by calling {{HdfsTable.load()}}:

* The {{LocalTable}} wraps a number of subclasses, of which the one of interest 
is {{HdfsTable}}.
* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the 
partitions as a list of Hive {{Partition}} objects.
* {{loadAllPartitions()}} wraps each in a {{HdfsPartition}}, and calls 
{{addPartition}} to put the partition into a couple of maps.
* But, {{HdfsPartition}} goes to extremes to copy data out of Hive’s 
{{Partition}} object without holding onto Hive’s object.
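
The copy-out step in the last bullet can be illustrated roughly as follows. This is a minimal sketch with hypothetical stand-in classes ({{MsPartition}}, {{SlimPartition}}) and made-up sample values, not Impala's actual {{HdfsPartition}} code. The slim object copies only the values it needs and keeps no field of the metastore type:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CopyOutSketch {
    // Hypothetical stand-in for a metastore partition (not the real Hive class).
    static class MsPartition {
        List<String> values = new ArrayList<>();
        String location;
        List<String> cols = new ArrayList<>(); // stands in for the FieldSchema list
    }

    // Hypothetical slim partition: holds copies only, no MsPartition field.
    static final class SlimPartition {
        final List<String> values;
        final String location;

        private SlimPartition(List<String> values, String location) {
            this.values = values;
            this.location = location;
        }

        static SlimPartition copyFrom(MsPartition ms) {
            // Defensive copy so the slim object shares no mutable state with ms.
            List<String> copied =
                Collections.unmodifiableList(new ArrayList<>(ms.values));
            return new SlimPartition(copied, ms.location);
        }
    }

    public static void main(String[] args) {
        MsPartition ms = new MsPartition();
        ms.values.add("2018-09-26");
        ms.location = "/warehouse/t/day=2018-09-26";
        ms.cols.addAll(Arrays.asList("id", "name"));

        SlimPartition slim = SlimPartition.copyFrom(ms);
        ms.values.set(0, "changed");            // mutate the source afterwards
        System.out.println(slim.values.get(0)); // prints 2018-09-26
    }
}
```

Because {{SlimPartition}} has no field of the metastore type, nothing reachable from it can keep the column list alive once the source object is dropped.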

So, we did take steps to avoid holding onto Hive’s {{Partition}} objects. 
Still, there are references, so the question is: where?

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size)


