[ 
https://issues.apache.org/jira/browse/HIVE-22284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946595#comment-16946595
 ] 

Ádám Szita commented on HIVE-22284:
-----------------------------------

Thanks for the comments [~gopalv], [~pvary]. I've refactored CacheTag into 3 
versions as recommended, and also interned table names and partition descs.

Naturally, using such CacheTag objects carries more overhead than the current 
version that uses Strings, but in my opinion the difference isn't substantial 
(+8 bytes per reference [assuming LLAP daemon > -Xmx32G] + ~12 bytes object 
overhead per tag), especially if we're interning the recurring values.
On the other hand, we get correct stats that match Hive constructs exactly.
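
To make the idea concrete, here is a minimal sketch of tag objects with 
interned values. It is not the code from the patch: the class names, the 
intern pool, and the split into "table only / single partition / multiple 
partitions" are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; names and layout are assumptions, not the patch's code. */
abstract class CacheTagSketch {
  // Assumed intern pool: one canonical String per distinct value.
  private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

  /** Returns the canonical instance so that repeated names share memory. */
  static String intern(String value) {
    String existing = POOL.putIfAbsent(value, value);
    return existing != null ? existing : value;
  }

  final String tableName;

  CacheTagSketch(String tableName) {
    this.tableName = intern(tableName);
  }

  /** Picks the smallest variant that can hold the given partition descs. */
  static CacheTagSketch of(String tableName, String... partitionDescs) {
    if (partitionDescs.length == 0) {
      return new TableTag(tableName);
    }
    if (partitionDescs.length == 1) {
      return new SinglePartitionTag(tableName, partitionDescs[0]);
    }
    return new MultiPartitionTag(tableName, partitionDescs);
  }

  abstract String describe();

  /** Unpartitioned table: only the table name is stored. */
  static final class TableTag extends CacheTagSketch {
    TableTag(String tableName) {
      super(tableName);
    }

    @Override
    String describe() {
      return tableName;
    }
  }

  /** One partition level: a single extra reference, no array. */
  static final class SinglePartitionTag extends CacheTagSketch {
    final String partitionDesc;

    SinglePartitionTag(String tableName, String partitionDesc) {
      super(tableName);
      this.partitionDesc = intern(partitionDesc);
    }

    @Override
    String describe() {
      return tableName + "/" + partitionDesc;
    }
  }

  /** Several partition levels: an array of interned descs. */
  static final class MultiPartitionTag extends CacheTagSketch {
    final String[] partitionDescs;

    MultiPartitionTag(String tableName, String[] partitionDescs) {
      super(tableName);
      this.partitionDescs = new String[partitionDescs.length];
      for (int i = 0; i < partitionDescs.length; i++) {
        this.partitionDescs[i] = intern(partitionDescs[i]);
      }
    }

    @Override
    String describe() {
      return tableName + "/" + String.join("/", partitionDescs);
    }
  }

  public static void main(String[] args) {
    CacheTagSketch a = CacheTagSketch.of(new String("db.tbl"), "p=1");
    CacheTagSketch b = CacheTagSketch.of(new String("db.tbl"), "p=2", "q=3");
    // Both tags point at the same interned table name instance.
    System.out.println(a.tableName == b.tableName);
    System.out.println(a.describe() + " " + b.describe());
  }
}
```

With interning, every tag of the same table holds a reference to one shared 
String, so the per-tag cost is roughly the object header plus its references.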

> Improve LLAP CacheContentsTracker to collect and display correct statistics
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-22284
>                 URL: https://issues.apache.org/jira/browse/HIVE-22284
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>         Attachments: HIVE-22284.0.patch, HIVE-22284.1.patch, 
> HIVE-22284.2.patch, HIVE-22284.3.patch, HIVE-22284.4.patch
>
>
> When keeping track of which buffers correspond to which Hive objects, 
> CacheContentsTracker relies on cache tags.
> Currently a tag is a simple String that ideally holds the DB and table name 
> and a partition spec, joined by '.' and '/'. This information is derived 
> from the Path of the file that is being cached. Needless to say, this 
> sometimes produces a wrong tag, especially for external tables.
> There's also a bug when calculating aggregated stats for a 'parent' tag 
> (corresponding to the table of the partition): the overall maxCount and 
> maxSize do not add up to the sum of those in the partitions. This happens 
> when buffers get removed from the cache.
>  
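
The arithmetic behind the maxCount mismatch described above can be sketched 
as follows. This is a hypothetical model, not CacheContentsTracker's actual 
code: it assumes the table-level tag and each partition-level tag track 
their own running maximum independently, which is one possible way for the 
numbers to diverge once buffers are removed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of per-tag stats; not the real tracker implementation. */
class TagStatsSketch {
  static final class Stats {
    long count;
    long maxCount;

    void add() {
      count++;
      maxCount = Math.max(maxCount, count);
    }

    void remove() {
      count--;
    }
  }

  // Assumption: the parent (table) tag keeps its own counters.
  final Stats table = new Stats();
  final Map<String, Stats> partitions = new HashMap<>();

  void bufferAdded(String partition) {
    partitions.computeIfAbsent(partition, p -> new Stats()).add();
    table.add();
  }

  void bufferRemoved(String partition) {
    partitions.get(partition).remove();
    table.remove();
  }

  long sumOfPartitionMaxCounts() {
    long sum = 0;
    for (Stats s : partitions.values()) {
      sum += s.maxCount;
    }
    return sum;
  }

  public static void main(String[] args) {
    TagStatsSketch t = new TagStatsSketch();
    t.bufferAdded("p=1");
    t.bufferAdded("p=1");
    t.bufferRemoved("p=1");
    t.bufferRemoved("p=1");
    t.bufferAdded("p=2");
    t.bufferAdded("p=2");
    // The table never held more than 2 buffers at once, yet each
    // partition peaked at 2, so the partition maxima sum to 4.
    System.out.println(t.table.maxCount + " vs " + t.sumOfPartitionMaxCounts());
  }
}
```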



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
