[ https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629621#comment-16629621 ]
Paul Rogers commented on IMPALA-7501:
-------------------------------------

So the above was probably looking in the wrong haystack. Todd's comment is the key: {{LocalCatalog}}. The local catalog caches the HMS Thrift objects, including {{Partition}}. The chain is:

* {{LocalDb}} contains a map of {{LocalTable}} objects.
* {{LocalTable}} has a subclass {{LocalFsTable}} which contains a map of {{LocalPartitionSpec}} objects.
* {{LocalPartitionSpec}} has a relation (need to research) to {{LocalFsPartition}}.
* {{LocalFsPartition}} holds onto the Hive {{Partition}}, which holds onto the {{FieldSchema}} objects.

Short term, we just need to track down how we cache the {{Partition}} and nuke the {{FieldSchema}}, then retest.

Longer term, the note earlier does apply. While the query-specific metadata goes to pains to avoid caching HMS objects, {{LocalCatalog}} (and presumably the similar version in {{catalogd}}) does cache HMS objects which, as noted earlier, are rather bloated for our needs.

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit
> after running a production workload simulation for a couple hours. It had
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected,
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M
> objects are retained by FieldSchema, which, as far as I remember, are ignored
> on the partition level by the Impala planner. So, with a bit of slimming down
> of these objects, we could make a huge dent in effective cache capacity given
> a fixed budget.
> Reducing object count should also have the effect of improved
> GC performance (old gen GC is more closely tied to object count than size)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
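
The short-term idea from the comment (cache the {{Partition}} but drop its {{FieldSchema}} list) can be sketched roughly as follows. This is only an illustration under stated assumptions: the classes below are simplified stand-ins, not the real Hive metastore Thrift classes or Impala's {{LocalFsPartition}}, and the names {{CachedPartition}}, {{slim}}, {{makePartition}} and {{countFieldSchemas}} are hypothetical. It also assumes, as the issue description suggests but does not confirm, that nothing downstream reads the partition-level columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SlimPartitionSketch {
    // Simplified stand-ins for the HMS Thrift objects (assumption: the real
    // classes carry many more fields than shown here).
    static class FieldSchema {
        final String name;
        final String type;
        FieldSchema(String name, String type) { this.name = name; this.type = type; }
    }

    static class StorageDescriptor {
        List<FieldSchema> cols;   // per-partition copy of the column list
        String location;
    }

    static class Partition {
        List<String> values;
        StorageDescriptor sd;
    }

    // Hypothetical cache entry playing the role of LocalFsPartition.
    static class CachedPartition {
        final Partition msPartition;
        CachedPartition(Partition msPartition) { this.msPartition = msPartition; }
    }

    // Drop the column list before the partition is cached; keep everything else.
    static Partition slim(Partition p) {
        if (p.sd != null) {
            p.sd.cols = null;
        }
        return p;
    }

    // Build a partition with numCols FieldSchema objects, as HMS would return it.
    static Partition makePartition(String value, int numCols) {
        Partition p = new Partition();
        p.values = new ArrayList<>();
        p.values.add(value);
        p.sd = new StorageDescriptor();
        p.sd.location = "/warehouse/t/part=" + value;
        p.sd.cols = new ArrayList<>();
        for (int i = 0; i < numCols; i++) {
            p.sd.cols.add(new FieldSchema("c" + i, "string"));
        }
        return p;
    }

    // Count FieldSchema objects still reachable from the cache.
    static int countFieldSchemas(Map<String, CachedPartition> cache) {
        int total = 0;
        for (CachedPartition cp : cache.values()) {
            StorageDescriptor sd = cp.msPartition.sd;
            if (sd != null && sd.cols != null) {
                total += sd.cols.size();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, CachedPartition> fat = new HashMap<>();
        Map<String, CachedPartition> slimmed = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            fat.put("p" + i, new CachedPartition(makePartition("p" + i, 50)));
            slimmed.put("p" + i, new CachedPartition(slim(makePartition("p" + i, 50))));
        }
        System.out.println("FieldSchema objects, unslimmed cache: " + countFieldSchemas(fat));
        System.out.println("FieldSchema objects, slimmed cache:   " + countFieldSchemas(slimmed));
    }
}
```

With 1000 partitions of 50 columns each, the unslimmed map retains 50,000 {{FieldSchema}} stand-ins and the slimmed one retains none, which mirrors the shape of the heap-dump finding. Note that {{slim}} mutates its argument in place; if the same object were shared with other callers, a copy would be the safer choice.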