[
https://issues.apache.org/jira/browse/HIVE-22284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944666#comment-16944666
]
Gopal Vijayaraghavan commented on HIVE-22284:
---------------------------------------------
Strictly from a memory use perspective, the CacheTag is better served as an
abstract class with 3 impls - TableCacheTag, PartitionCacheTag and
DeepPartitionsCacheTag (for no partition, 1 partition and >1 partitions).
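
A minimal sketch of what that hierarchy could look like, so each tag only
carries the fields it needs. The three subclass names come from the
suggestion above; the field names, the partitionDescs() accessor and the
of() factory are hypothetical and not taken from the patch.

```java
// Hypothetical sketch of the suggested hierarchy; not the code from the patch.
abstract class CacheTag {
    private final String tableName;

    protected CacheTag(String tableName) {
        this.tableName = tableName;
    }

    String getTableName() {
        return tableName;
    }

    /** Partition descriptors such as "p=1"; empty for an unpartitioned table. */
    abstract String[] partitionDescs();

    /** Hypothetical factory: picks the smallest implementation that fits. */
    static CacheTag of(String tableName, String... partDescs) {
        switch (partDescs.length) {
            case 0:
                return new TableCacheTag(tableName);
            case 1:
                return new PartitionCacheTag(tableName, partDescs[0]);
            default:
                return new DeepPartitionsCacheTag(tableName, partDescs);
        }
    }
}

/** No partition: only the table name is stored. */
final class TableCacheTag extends CacheTag {
    private static final String[] NONE = new String[0];

    TableCacheTag(String tableName) {
        super(tableName);
    }

    @Override
    String[] partitionDescs() {
        return NONE;
    }
}

/** Exactly one partition level: a single extra reference, no array or list. */
final class PartitionCacheTag extends CacheTag {
    private final String partDesc;

    PartitionCacheTag(String tableName, String partDesc) {
        super(tableName);
        this.partDesc = partDesc;
    }

    @Override
    String[] partitionDescs() {
        return new String[] { partDesc };
    }
}

/** More than one partition level: a compact array instead of a LinkedList. */
final class DeepPartitionsCacheTag extends CacheTag {
    private final String[] partDescs;

    DeepPartitionsCacheTag(String tableName, String[] partDescs) {
        super(tableName);
        this.partDescs = partDescs.clone();
    }

    @Override
    String[] partitionDescs() {
        return partDescs.clone();
    }
}
```

With this shape, an unpartitioned table pays for one String reference per
tag, and only the multi-level case pays for an array.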
{code}
+ part.getPartSpec().entrySet().stream()
+ .map(e -> e.getKey() + "=" + e.getValue()).collect(toCollection(LinkedList::new))
{code}
is where the other allocations are hidden: both the String concat and the new
LinkedList.
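
One possible way to avoid both, sketched under the assumption that the
partition spec is an ordered Map<String, String>: copy the existing key and
value references into a flat array sized up front, and only build the
"key=value" text when it is actually displayed. PartSpecEncoder, flatten()
and render() are hypothetical names, not part of Hive.

```java
import java.util.Map;

// Hypothetical helper; illustrates the idea only and is not part of the patch.
final class PartSpecEncoder {
    private PartSpecEncoder() {
    }

    /**
     * Flattens a partition spec into [k0, v0, k1, v1, ...].
     * Reuses the map's String references, so the only allocation is the array.
     */
    static String[] flatten(Map<String, String> partSpec) {
        String[] flat = new String[partSpec.size() * 2];
        int i = 0;
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            flat[i++] = e.getKey();
            flat[i++] = e.getValue();
        }
        return flat;
    }

    /** Builds "k0=v0/k1=v1" on demand, e.g. when statistics are printed. */
    static String render(String[] flat) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < flat.length; i += 2) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(flat[i]).append('=').append(flat[i + 1]);
        }
        return sb.toString();
    }
}
```

This trades the per-entry concatenated String and LinkedList node for a
single array per tag; the concatenation cost moves to the (presumably rare)
display path.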
> Improve LLAP CacheContentsTracker to collect and display correct statistics
> ---------------------------------------------------------------------------
>
> Key: HIVE-22284
> URL: https://issues.apache.org/jira/browse/HIVE-22284
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Ádám Szita
> Assignee: Ádám Szita
> Priority: Major
> Attachments: HIVE-22284.0.patch, HIVE-22284.1.patch,
> HIVE-22284.2.patch
>
>
> When keeping track of which buffers correspond to what Hive objects,
> CacheContentsTracker relies on cache tags.
> Currently a tag is a simple String that ideally holds DB and table name, and
> a partition spec concatenated by . and / . The information here is derived
> from the Path of the file that is getting cached. Needless to say, this
> sometimes produces a wrong tag, especially for external tables.
> Also there's a bug when calculating aggregated stats for a 'parent' tag
> (corresponding to the table of the partition) because the overall maxCount
> and maxSize do not add up to the sum of those in the partitions. This happens
> when buffers get removed from the cache.
>