[ 
https://issues.apache.org/jira/browse/HIVE-22284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943128#comment-16943128
 ] 

Ádám Szita commented on HIVE-22284:
-----------------------------------

[^HIVE-22284.0.patch] is WIP (tests and final touches will come later). The 
approach is as follows:
 * Make use of Map<Path, PartitionDesc> parts available during record reader 
creation
 ** this way information about Hive constructs (DB name, table name, partitions 
and properties) will be available on the LLAP side of the world
 * Replace the String cacheTag with a proper object that holds the aforementioned 
information, and therefore also replace the logic that used the Path as its 
information source.
 * Fix CacheContentsTracker's parent tag generation
 ** aggregation should happen at web UI request time, not when the actual 
caching happens
 *** this will offload some work from the LLAP IO thread pool
 *** and will produce the correct result (as opposed to the current buggy 
behavior mentioned in this Jira's description)
 ** I'm generating not just 1 parent, but all parents up to DB level
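
To make the second and last points above more concrete, here is a minimal sketch of what a structured tag with parent generation up to DB level could look like. The class name CacheTagSketch, its fields and its methods are hypothetical and chosen only for illustration; they are not taken from the attached patch, and table properties are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical structured cache tag (illustrative only, not the patch's class).
final class CacheTagSketch {
  private final String dbName;
  private final String tableName;           // null for a DB-level tag
  private final List<String> partitionSpec; // e.g. ["year=2019", "month=10"]

  CacheTagSketch(String dbName, String tableName, List<String> partitionSpec) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.partitionSpec = Collections.unmodifiableList(new ArrayList<>(partitionSpec));
  }

  /** Returns the tag one level up, or null when this tag is already at DB level. */
  CacheTagSketch parent() {
    if (!partitionSpec.isEmpty()) {
      // Drop the innermost partition level.
      return new CacheTagSketch(dbName, tableName,
          partitionSpec.subList(0, partitionSpec.size() - 1));
    }
    if (tableName != null) {
      // Table level -> DB level.
      return new CacheTagSketch(dbName, null, Collections.emptyList());
    }
    return null;
  }

  /** All ancestors, nearest first, ending with the DB-level tag. */
  List<CacheTagSketch> parents() {
    List<CacheTagSketch> result = new ArrayList<>();
    for (CacheTagSketch p = parent(); p != null; p = p.parent()) {
      result.add(p);
    }
    return result;
  }

  /** Renders the tag as db.table/part1/part2 for display purposes. */
  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder(dbName);
    if (tableName != null) {
      sb.append('.').append(tableName);
    }
    for (String part : partitionSpec) {
      sb.append('/').append(part);
    }
    return sb.toString();
  }
}
```

With this shape, a tag for a two-level partition yields three parents: the outer partition level, the table and the DB. The string form is only needed for display; it is no longer the source of the information.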

[~odraese], [~pvary], [~bslim] can you please share your thoughts?

> Improve LLAP CacheContentsTracker to collect and display correct statistics
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-22284
>                 URL: https://issues.apache.org/jira/browse/HIVE-22284
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>         Attachments: HIVE-22284.0.patch
>
>
> When keeping track of which buffers correspond to what Hive objects, 
> CacheContentsTracker relies on cache tags.
> Currently a tag is a simple String that ideally holds the DB and table name 
> and a partition spec, concatenated by . and / . The information here is 
> derived from the Path of the file that is getting cached. Needless to say, 
> this sometimes produces a wrong tag, especially for external tables.
> Also there's a bug when calculating aggregated stats for a 'parent' tag 
> (corresponding to the table of the partition) because the overall maxCount 
> and maxSize do not add up to the sum of those in the partitions. This happens 
> when buffers get removed from the cache.
>  
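
The parent-tag problem described above can be avoided by updating only the leaf (partition-level) counters on the caching path and summing them into parents when the stats are requested. The following is a rough sketch under stated assumptions: the class RequestTimeAggregationSketch, its methods, and the representation of a tag as a list of levels are all hypothetical; only the names maxCount and maxSize come from the description, and treating them as per-tag peak values is an assumption. The sketch is not thread-safe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker that aggregates parent stats at request time only.
final class RequestTimeAggregationSketch {

  /** Per-tag counters; maxCount/maxSize are assumed to be peak values. */
  static final class Stats {
    long bufferCount;
    long totalSize;
    long maxCount;
    long maxSize;

    void add(long size) {
      bufferCount++;
      totalSize += size;
      maxCount = Math.max(maxCount, bufferCount);
      maxSize = Math.max(maxSize, totalSize);
    }

    void remove(long size) {
      // Peaks are intentionally left untouched on removal.
      bufferCount--;
      totalSize -= size;
    }

    void merge(Stats other) {
      bufferCount += other.bufferCount;
      totalSize += other.totalSize;
      maxCount += other.maxCount;
      maxSize += other.maxSize;
    }
  }

  // Key: tag levels, e.g. [db, table, partition]. Only leaf tags are stored.
  private final Map<List<String>, Stats> leafStats = new LinkedHashMap<>();

  /** Caching path: touches the leaf tag only, no parent bookkeeping. */
  void bufferCached(List<String> tag, long size) {
    leafStats.computeIfAbsent(new ArrayList<>(tag), k -> new Stats()).add(size);
  }

  void bufferEvicted(List<String> tag, long size) {
    Stats stats = leafStats.get(tag);
    if (stats != null) {
      stats.remove(size);
    }
  }

  /** Request path: rolls every leaf up into each of its prefixes (up to DB level). */
  Map<List<String>, Stats> aggregate() {
    Map<List<String>, Stats> result = new LinkedHashMap<>();
    for (Map.Entry<List<String>, Stats> entry : leafStats.entrySet()) {
      List<String> tag = entry.getKey();
      for (int len = 1; len <= tag.size(); len++) {
        List<String> prefix = new ArrayList<>(tag.subList(0, len));
        result.computeIfAbsent(prefix, k -> new Stats()).merge(entry.getValue());
      }
    }
    return result;
  }
}
```

Because parents are computed purely as sums over their partitions, the aggregated maxCount and maxSize of a table always equal the sum of the partition-level values, including after buffers have been removed.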



