[ 
https://issues.apache.org/jira/browse/IMPALA-7501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629621#comment-16629621
 ] 

Paul Rogers edited comment on IMPALA-7501 at 10/13/18 1:37 AM:
---------------------------------------------------------------

The local cache holds onto HMS {{Partition}} objects via the 
{{PartitionMetadataImpl}} class within the {{CatalogdMetaProvider}} class.

Adding a single line to the constructor of that class should remove the 
unwanted column schemas:

{noformat}
msPartition_.getSd().unsetCols();
{noformat}

Rerunning the {{LocalCatalogTest}} cases showed no issues.
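
To make the effect of that one-line change concrete, here is a minimal, self-contained sketch. It does not use the real Hive or Impala classes: {{FieldSchema}}, {{StorageDescriptor}}, {{Partition}} and {{PartitionMetadataImpl}} below are simplified stand-ins that model only the accessors involved ({{getSd()}}, {{getCols()}}, {{isSetCols()}}, {{unsetCols()}}). The helper {{newPartitionWithCols}}, the accessor {{getHmsPartition}} and the null check on the storage descriptor are assumptions added for illustration, not part of the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

public class SlimPartitionSketch {
  // Stand-in for the HMS FieldSchema (one column's name and type).
  public static class FieldSchema {
    public final String name;
    public final String type;
    public FieldSchema(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  // Stand-in for the HMS StorageDescriptor; only the column list is modeled.
  public static class StorageDescriptor {
    private List<FieldSchema> cols;
    public void setCols(List<FieldSchema> cols) { this.cols = cols; }
    public List<FieldSchema> getCols() { return cols; }
    public boolean isSetCols() { return cols != null; }
    public void unsetCols() { cols = null; }
  }

  // Stand-in for the HMS Partition; only the storage descriptor is modeled.
  public static class Partition {
    private StorageDescriptor sd;
    public void setSd(StorageDescriptor sd) { this.sd = sd; }
    public StorageDescriptor getSd() { return sd; }
  }

  // Stand-in for the cached wrapper: drops the per-partition column
  // schemas as soon as the partition enters the cache.
  public static class PartitionMetadataImpl {
    private final Partition msPartition_;

    public PartitionMetadataImpl(Partition msPartition) {
      msPartition_ = msPartition;
      // Assumption: guard against a missing storage descriptor.
      if (msPartition_.getSd() != null) {
        msPartition_.getSd().unsetCols();
      }
    }

    // Hypothetical accessor, used here only to inspect the result.
    public Partition getHmsPartition() { return msPartition_; }
  }

  // Hypothetical helper: builds a partition carrying numCols column schemas.
  public static Partition newPartitionWithCols(int numCols) {
    List<FieldSchema> cols = new ArrayList<>();
    for (int i = 0; i < numCols; i++) {
      cols.add(new FieldSchema("c" + i, "string"));
    }
    StorageDescriptor sd = new StorageDescriptor();
    sd.setCols(cols);
    Partition p = new Partition();
    p.setSd(sd);
    return p;
  }

  public static void main(String[] args) {
    Partition p = newPartitionWithCols(3);
    System.out.println("cols set before caching: " + p.getSd().isSetCols());
    PartitionMetadataImpl meta = new PartitionMetadataImpl(p);
    System.out.println("cols set after caching: "
        + meta.getHmsPartition().getSd().isSetCols());
  }
}
```

Note that the constructor mutates the {{Partition}} passed to it, so any caller that still needs the column list would have to read it from the table-level schema instead.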


was (Author: paul.rogers):
The path to the HMS {{Partition}} objects appears to be:

* {{HdfsTable}} holds onto a set of {{FeFsPartition}} objects.
* In local catalog mode, the {{FeFsPartition}} is an instance of 
{{LocalFsPartition}}.
* {{LocalFsPartition}} holds onto the HMS {{Partition}} objects.
* {{Partition}} holds onto a {{StorageDescriptor}} which holds onto a list of 
the {{FieldSchema}} objects that Todd noted.

However, there is no obvious path that causes code to hold onto the 
{{LocalFsPartition}} objects; in the local catalog implementation, they are 
converted to Thrift format, then discarded. It is not clear how the 
{{FeFsPartition}} objects are recreated for a query. The available tests don’t 
exercise this path.

Perhaps the code has changed since this issue was reported?

No code in {{LocalFsPartition}} accesses the columns stored in the 
{{StorageDescriptor}} associated with the {{Partition}}, so it is probably safe 
to nuke them. Added the following to the {{LocalFsPartition}} constructor:

{noformat}
msPartition_.getSd().unsetCols();
{noformat}

Rerunning the {{LocalCatalogTest}} cases showed no issues.

Need to talk to Todd to better understand the design, and to determine how to 
run a test case similar to that in the ticket description.

> Slim down metastore Partition objects in LocalCatalog cache
> -----------------------------------------------------------
>
>                 Key: IMPALA-7501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7501
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Priority: Minor
>
> I took a heap dump of an impalad running in LocalCatalog mode with a 2G limit 
> after running a production workload simulation for a couple hours. It had 
> 38.5M objects and 2.02GB heap (the vast majority of the heap is, as expected, 
> in the LocalCatalog cache). Of this total footprint, 1.78GB and 34.6M objects 
> are retained by 'Partition' objects. Drilling into those, 1.29GB and 33.6M 
> objects are retained by FieldSchema, which, as far as I remember, are ignored 
> on the partition level by the Impala planner. So, with a bit of slimming down 
> of these objects, we could make a huge dent in effective cache capacity given 
> a fixed budget. Reducing object count should also have the effect of improved 
> GC performance (old gen GC is more closely tied to object count than size).


