[ https://issues.apache.org/jira/browse/HIVE-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294004#comment-14294004 ]
Sergey Shelukhin commented on HIVE-9418:
----------------------------------------

[~prasanth_j] - can you review https://github.com/apache/hive/commit/c7242290923fdfdcadf4408a7ba3970fefac8d7c ?
This is the first part of this JIRA, where the "old" ORC path can hypothetically use the cache. Please look at the ORC changes especially.

The idea is that DiskRange now has two subclasses, CacheChunk and the old BufferChunk. We create a LinkedList of DiskRange-s to read, then pass it through the cache and the disk reader, which replace parts of the ranges with CC-s and BC-s that actually hold the data, so in the end the list consists of DR-s that all have some sort of data. The cache has been changed accordingly.

I wonder if we even need the metadata cache then. Whenever ORC goes for DR-s for metadata (see where I pass null as the cache for footer, index, etc.), we could instead also use the cache, and just add some feature to keep these blocks at a much higher priority so they are less likely to be evicted. That way you would have to parse them every time, though, so for now a Java-side cache might be better.

> LLAP: ORC production of encoded data, cache usage
> -------------------------------------------------
>
>                 Key: HIVE-9418
>                 URL: https://issues.apache.org/jira/browse/HIVE-9418
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> ORC needs to be able to read self-contained rowgroups and return them. It
> should use low-level cache in process. In future, we may use high-level cache
> to cache rowgroups instead



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
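
For readers who want a concrete picture of the range-list flow described in the comment above, here is a minimal sketch. It is not the code from the linked commit. The class names DiskRange, CacheChunk and BufferChunk come from the comment, but every field, constructor and method below is an assumption made for illustration, as are the hypothetical RangeListSketch helper, the TreeMap standing in for the low-level cache, and the byte array standing in for the file. It also assumes cached chunks are non-empty, do not overlap each other, and only counts a chunk as a hit if it starts inside the requested range.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-ins for the classes named in the comment.
class DiskRange {
    final long offset; // inclusive start of the range in the file
    final long end;    // exclusive end of the range in the file

    DiskRange(long offset, long end) {
        this.offset = offset;
        this.end = end;
    }

    // A plain DiskRange is only a request; it carries no bytes yet.
    boolean hasData() {
        return false;
    }
}

// A range whose bytes came from the cache (assumed shape).
class CacheChunk extends DiskRange {
    final byte[] data;

    CacheChunk(long offset, byte[] data) {
        super(offset, offset + data.length);
        this.data = data;
    }

    @Override
    boolean hasData() {
        return true;
    }
}

// A range whose bytes were read from disk (assumed shape).
class BufferChunk extends DiskRange {
    final byte[] data;

    BufferChunk(long offset, byte[] data) {
        super(offset, offset + data.length);
        this.data = data;
    }

    @Override
    boolean hasData() {
        return true;
    }
}

final class RangeListSketch {
    // Cache pass: split each plain range into CacheChunks for the parts the
    // cache holds and plain DiskRanges for the gaps that remain.
    static void applyCache(LinkedList<DiskRange> ranges, TreeMap<Long, byte[]> cache) {
        ListIterator<DiskRange> it = ranges.listIterator();
        while (it.hasNext()) {
            DiskRange r = it.next();
            if (r.hasData()) {
                continue;
            }
            it.remove();
            long pos = r.offset;
            // Only chunks that start inside [offset, end) are considered hits.
            for (Map.Entry<Long, byte[]> e : cache.subMap(r.offset, r.end).entrySet()) {
                long start = e.getKey();
                byte[] cached = e.getValue();
                long stop = Math.min(start + cached.length, r.end);
                if (start > pos) {
                    it.add(new DiskRange(pos, start)); // gap still to be read
                }
                // Clip the cached bytes to the requested range.
                it.add(new CacheChunk(start, Arrays.copyOfRange(cached, 0, (int) (stop - start))));
                pos = stop;
            }
            if (pos < r.end) {
                it.add(new DiskRange(pos, r.end)); // trailing gap
            }
        }
    }

    // Disk pass: replace every range that still has no data with a BufferChunk.
    static void readFromDisk(LinkedList<DiskRange> ranges, byte[] file) {
        ListIterator<DiskRange> it = ranges.listIterator();
        while (it.hasNext()) {
            DiskRange r = it.next();
            if (!r.hasData()) {
                byte[] bytes = Arrays.copyOfRange(file, (int) r.offset, (int) r.end);
                it.set(new BufferChunk(r.offset, bytes));
            }
        }
    }
}
```

In this sketch, a request for [0, 10) with a 3-byte chunk cached at offset 4 ends up, after both passes, as BufferChunk [0, 4), CacheChunk [4, 7), BufferChunk [7, 10), so every entry in the list carries data. Passing an empty cache map roughly corresponds to the "null as cache" case mentioned for footer and index reads.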