[ https://issues.apache.org/jira/browse/HIVE-22284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943128#comment-16943128 ]
Ádám Szita commented on HIVE-22284:
-----------------------------------

[^HIVE-22284.0.patch] is WIP (tests and final touches will come later).

The approach is as follows:
* Make use of the Map<Path, PartitionDesc> parts available during record reader creation
** this way information about Hive constructs (DB name, table name, partitions and properties) will be available on the LLAP side of the world
* Replace the String cacheTag with a proper object that holds the aforementioned information, and therefore also replace the logic that used the Path as its information source.
* Fix CacheContentsTracker's parent tag generation
** aggregation should happen at web UI request time, not when the actual caching happens
*** this will offload some work from the LLAP IO thread pool
*** and will produce the correct result (as opposed to the current buggy behavior mentioned in this Jira's description)
** I'm generating not just 1 parent, but all parents up to DB level

[~odraese], [~pvary], [~bslim] can you please share your thoughts?

> Improve LLAP CacheContentsTracker to collect and display correct statistics
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-22284
>                 URL: https://issues.apache.org/jira/browse/HIVE-22284
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>         Attachments: HIVE-22284.0.patch
>
>
> When keeping track of which buffers correspond to what Hive objects,
> CacheContentsTracker relies on cache tags.
> Currently a tag is a simple String that ideally holds DB and table name, and
> a partition spec concatenated by . and / . The information here is derived
> from the Path of the file that is getting cached. Needless to say, sometimes
> this produces a wrong tag, especially for external tables.
> Also there's a bug when calculating aggregated stats for a 'parent' tag
> (corresponding to the table of the partition) because the overall maxCount
> and maxSize do not add up to the sum of those in the partitions. This happens
> when buffers get removed from the cache.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
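
For illustration only, the approach described in the comment (a structured tag instead of a String, all parents generated up to DB level, and aggregation at request time) might look roughly like the sketch below. It is not taken from HIVE-22284.0.patch: the class name CacheTagSketch, its fields, and the methods parent(), name() and aggregate() are all hypothetical, and it sums a single byte count per leaf tag rather than the real maxCount/maxSize statistics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names are illustrative and not from the actual patch.
final class CacheTagSketch {
    private final String dbName;
    private final String tableName;        // null for a DB-level tag
    private final List<String> partitions; // e.g. ["year=2019", "month=10"]

    CacheTagSketch(String dbName, String tableName, List<String> partitions) {
        this.dbName = dbName;
        this.tableName = tableName;
        this.partitions = Collections.unmodifiableList(new ArrayList<>(partitions));
    }

    // Returns the tag one level up, or null once DB level has been reached.
    CacheTagSketch parent() {
        if (!partitions.isEmpty()) {
            return new CacheTagSketch(dbName, tableName,
                partitions.subList(0, partitions.size() - 1));
        }
        if (tableName != null) {
            return new CacheTagSketch(dbName, null, Collections.<String>emptyList());
        }
        return null;
    }

    // Renders the tag as db.table/part1/part2 for display.
    String name() {
        StringBuilder sb = new StringBuilder(dbName);
        if (tableName != null) {
            sb.append('.').append(tableName);
        }
        for (String p : partitions) {
            sb.append('/').append(p);
        }
        return sb.toString();
    }

    // Computes totals for every leaf and every ancestor when a report is
    // requested, so nothing has to be maintained per parent at caching time.
    // Keys are compared by identity here (equals/hashCode omitted for brevity).
    static Map<String, Long> aggregate(Map<CacheTagSketch, Long> leafBytes) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<CacheTagSketch, Long> e : leafBytes.entrySet()) {
            for (CacheTagSketch t = e.getKey(); t != null; t = t.parent()) {
                totals.merge(t.name(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }
}
```

Because the parent totals are derived from the current per-partition values on each request, they always equal the sum over the partitions, including after buffers have been removed from the cache.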