[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629554#comment-16629554 ]
Paul Rogers commented on IMPALA-7501:
-------------------------------------

Analysis:

* Impala's {{LocalCatalog}} contains a list of {{FeDb}} objects.
* Impala's {{LocalDb}}, which extends {{FeDb}}, contains a map of {{LocalTable}} objects.
* Impala's {{LocalTable}} contains a Hive {{Table}} object.
* The {{Table}} object is defined in [Hive's Thrift schema|https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift] API. It does not contain a list of partitions.

Things are a bit confusing because:

* Hive defines a different [{{Table}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java] class, which contains a {{TableSpec}}.
* Hive's [{{TableSpec}}|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java] contains a list of {{Partition}} objects.
* Hive's [{{Partition}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java] is generated from Thrift. It contains a {{StorageDescriptor}}.
* Hive's [{{StorageDescriptor}}|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java] contains the list of {{FieldSchema}} objects which Todd saw in the heap dump.

The above says that, yes, Hive {{Partition}} objects do hold a list of {{FieldSchema}}, but they are not reached by the simplest path, which would be the Hive API {{Table}} object.

Perhaps we cache {{Partition}} objects in the table schema. Impala loads tables in the background by calling {{HdfsTable.load()}}:

* The {{LocalTable}} wraps a number of subclasses, of which the one of interest is {{HdfsTable}}.
* {{load()}} calls {{loadAllPartitions()}} to do the partition work.
* {{loadAllPartitions()}} calls {{MetaStoreUtil.fetchAllPartitions()}} to get the partitions as a list of Hive {{Partition}} objects.
* {{loadAllPartitions()}} wraps each in an {{HdfsPartition}}, and calls {{addPartition()}} to put the partition into a couple of maps.
* But {{HdfsPartition}} goes to extremes to copy data out of Hive's {{Partition}} object without holding onto Hive's object.

So, we did take steps to avoid holding onto Hive's {{Partition}} objects. Still, there are references, so the question is: where?

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
> Key: IMPALA-7501
> URL: https://issues.apache.org/jira/browse/IMPALA-7501
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Todd Lipcon
> Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit
> after running a production workload simulation for a couple of hours. It had
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected,
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M
> objects are retained by FieldSchema, which, as far as I remember, are ignored
> on the partition level by the Impala planner. So, with a bit of slimming down
> of these objects, we could make a huge dent in effective cache capacity given
> a fixed budget. Reducing object count should also have the effect of improved
> GC performance (old gen GC is more closely tied to object count than size).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
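
As a rough illustration of the "copy data out without holding onto Hive's object" pattern from the analysis, and of why dropping the per-partition {{FieldSchema}} list shrinks the cache, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the nested {{FieldSchema}}, {{StorageDescriptor}} and {{Partition}} classes are hypothetical stand-ins that model only the reference chain described above, not Hive's Thrift-generated classes, and {{SlimPartition}}, {{copyOf()}} and {{samplePartition()}} are invented names, not Impala code.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSlimSketch {
  // Hypothetical stand-ins for Hive's Thrift-generated classes. Only the
  // fields that matter to this discussion are modeled.
  static final class FieldSchema {
    final String name;
    final String type;
    FieldSchema(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  static final class StorageDescriptor {
    final List<FieldSchema> cols;
    final String location;
    StorageDescriptor(List<FieldSchema> cols, String location) {
      this.cols = cols;
      this.location = location;
    }
  }

  static final class Partition {
    final List<String> values;
    final StorageDescriptor sd;
    Partition(List<String> values, StorageDescriptor sd) {
      this.values = values;
      this.sd = sd;
    }
  }

  // Hypothetical slim cache entry: it copies the fields it needs and keeps
  // no reference to the source Partition, its StorageDescriptor, or the
  // FieldSchema list.
  static final class SlimPartition {
    final List<String> values;
    final String location;

    private SlimPartition(List<String> values, String location) {
      this.values = values;
      this.location = location;
    }

    static SlimPartition copyOf(Partition p) {
      // Copy the values list so the slim entry shares no mutable state
      // with the source object.
      return new SlimPartition(new ArrayList<>(p.values), p.sd.location);
    }
  }

  // Builds a sample partition whose descriptor carries numCols column
  // definitions, mimicking the per-partition schema copy seen in the heap dump.
  static Partition samplePartition(String value, int numCols) {
    List<FieldSchema> cols = new ArrayList<>();
    for (int i = 0; i < numCols; i++) {
      cols.add(new FieldSchema("c" + i, "string"));
    }
    List<String> values = new ArrayList<>();
    values.add(value);
    return new Partition(values, new StorageDescriptor(cols, "/warehouse/t/p=" + value));
  }

  public static void main(String[] args) {
    Partition full = samplePartition("2018-09-26", 100);
    SlimPartition slim = SlimPartition.copyOf(full);
    System.out.println(full.sd.cols.size() + " FieldSchema objects in the source partition");
    System.out.println("slim entry keeps values=" + slim.values + " location=" + slim.location);
  }
}
```

If only {{SlimPartition}} instances are placed in the cache, the source {{Partition}} and its column list become unreachable once loading finishes, which is the effect the issue description is after. Which fields the planner really needs at the partition level is not settled here, so the two fields kept above are placeholders.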