[ 
https://issues.apache.org/jira/browse/HIVE-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294004#comment-14294004
 ] 

Sergey Shelukhin commented on HIVE-9418:
----------------------------------------

[~prasanth_j] - can you review 
https://github.com/apache/hive/commit/c7242290923fdfdcadf4408a7ba3970fefac8d7c 
? This is the first part of this JIRA, where the "old" ORC path can, 
hypothetically, use the cache.

Especially the ORC changes.
The idea is that DiskRange (DR) now has two subclasses, CacheChunk (CC) and 
the old BufferChunk (BC). We create a LinkedList of DR-s to read, then pass it 
through the cache and the disk reader, which replace parts of the ranges with 
CC-s and BC-s that actually hold the data. In the end, every DR in the list 
has some sort of data.
The cache has been changed accordingly.
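
The flow described above can be illustrated with a minimal, self-contained 
sketch. This is not the code from the commit: only the names DiskRange, 
CacheChunk and BufferChunk come from the comment, while ToyCache, 
fillFromCache, readFromDisk and the byte[] standing in for the file are 
hypothetical, and cached chunks are assumed not to overlap.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.ListIterator;
import java.util.Map;
import java.util.TreeMap;

public class DiskRangeSketch {
    /** A byte range [offset, end) of the file that has no data attached yet. */
    static class DiskRange {
        final long offset, end;
        DiskRange(long offset, long end) { this.offset = offset; this.end = end; }
        boolean hasData() { return false; }
        @Override public String toString() {
            return getClass().getSimpleName() + "[" + offset + "," + end + ")";
        }
    }

    /** A range whose bytes came from the cache. */
    static class CacheChunk extends DiskRange {
        final byte[] data;
        CacheChunk(long offset, byte[] data) {
            super(offset, offset + data.length);
            this.data = data;
        }
        @Override boolean hasData() { return true; }
    }

    /** A range whose bytes were read from disk. */
    static class BufferChunk extends DiskRange {
        final byte[] data;
        BufferChunk(long offset, byte[] data) {
            super(offset, offset + data.length);
            this.data = data;
        }
        @Override boolean hasData() { return true; }
    }

    /** Hypothetical toy cache: non-overlapping chunks keyed by file offset. */
    static class ToyCache {
        private final TreeMap<Long, byte[]> chunks = new TreeMap<>();

        void put(long offset, byte[] data) { chunks.put(offset, data); }

        /** Replaces the cached parts of each plain range with CacheChunks. */
        void fillFromCache(LinkedList<DiskRange> ranges) {
            ListIterator<DiskRange> it = ranges.listIterator();
            while (it.hasNext()) {
                DiskRange r = it.next();
                if (r.hasData()) continue;
                it.remove(); // re-added below as a mix of CacheChunks and gaps
                long pos = r.offset;
                for (Map.Entry<Long, byte[]> e : chunks.subMap(r.offset, r.end).entrySet()) {
                    long start = e.getKey();
                    long stop = start + e.getValue().length;
                    if (stop > r.end) break; // chunk sticks out of the range; leave it to disk
                    if (start > pos) it.add(new DiskRange(pos, start)); // uncached gap
                    it.add(new CacheChunk(start, e.getValue()));
                    pos = stop;
                }
                if (pos < r.end) it.add(new DiskRange(pos, r.end)); // uncached tail
            }
        }
    }

    /** Replaces every range that still has no data with a BufferChunk. */
    static void readFromDisk(byte[] file, LinkedList<DiskRange> ranges) {
        ListIterator<DiskRange> it = ranges.listIterator();
        while (it.hasNext()) {
            DiskRange r = it.next();
            if (r.hasData()) continue;
            it.set(new BufferChunk(r.offset,
                Arrays.copyOfRange(file, (int) r.offset, (int) r.end)));
        }
    }

    public static void main(String[] args) {
        byte[] file = new byte[100];   // stands in for the ORC file
        ToyCache cache = new ToyCache();
        cache.put(10, new byte[10]);   // bytes [10,20) are already cached

        LinkedList<DiskRange> ranges = new LinkedList<>();
        ranges.add(new DiskRange(0, 30));
        ranges.add(new DiskRange(50, 60));

        cache.fillFromCache(ranges);   // [0,30) splits around the cached chunk
        readFromDisk(file, ranges);    // the remaining gaps become BufferChunks
        System.out.println(ranges);
    }
}
```

After both passes every element of the list reports hasData() == true, which 
is the invariant the comment describes.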

I wonder if we even need a metadata cache then. Whenever ORC reads DR-s for 
metadata (see where I pass null as the cache for footer, index, etc.), we 
could use the cache there too, and add a feature that gives these blocks a 
much higher priority so they are less likely to be evicted. That way you'd 
have to parse them every time, though, so for now a Java-side cache might be 
better.
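
To make the "higher priority" idea concrete, here is a small sketch of a 
priority-aware eviction policy. It is purely illustrative and not taken from 
the Hive code: the class PriorityCacheSketch, the Priority enum and the 
string keys are all hypothetical. It is also stricter than "less likely to 
be evicted": metadata blocks are evicted only when no data blocks remain, 
and ties are broken by least recent use.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class PriorityCacheSketch {
    /** Declared lowest-first: DATA blocks are evicted before METADATA blocks. */
    enum Priority { DATA, METADATA }

    static class Entry {
        final String key;
        final Priority priority;
        long lastUsed;
        Entry(String key, Priority priority, long lastUsed) {
            this.key = key;
            this.priority = priority;
            this.lastUsed = lastUsed;
        }
    }

    private final int capacity;
    private final Map<String, Entry> entries = new HashMap<>();
    private long clock = 0; // logical time, bumped on every put/touch

    PriorityCacheSketch(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
    }

    /** Adds or replaces a block, evicting one victim if the cache is full. */
    void put(String key, Priority priority) {
        if (!entries.containsKey(key) && entries.size() >= capacity) evictOne();
        entries.put(key, new Entry(key, priority, ++clock));
    }

    /** Marks a block as recently used; returns false on a cache miss. */
    boolean touch(String key) {
        Entry e = entries.get(key);
        if (e == null) return false;
        e.lastUsed = ++clock;
        return true;
    }

    boolean contains(String key) { return entries.containsKey(key); }

    /** Victim = lowest priority first, then least recently used. */
    private void evictOne() {
        Entry victim = Collections.min(entries.values(),
            Comparator.comparing((Entry e) -> e.priority)
                      .thenComparingLong(e -> e.lastUsed));
        entries.remove(victim.key);
    }

    public static void main(String[] args) {
        PriorityCacheSketch cache = new PriorityCacheSketch(2);
        cache.put("footer", Priority.METADATA);
        cache.put("stripe-1", Priority.DATA);
        cache.put("stripe-2", Priority.DATA); // full: evicts stripe-1, not the older footer
        System.out.println("footer cached:   " + cache.contains("footer"));
        System.out.println("stripe-1 cached: " + cache.contains("stripe-1"));
        System.out.println("stripe-2 cached: " + cache.contains("stripe-2"));
    }
}
```

Note that this only keeps the raw metadata bytes resident. It does not avoid 
the re-parsing cost mentioned above, which is the argument for a Java-side 
cache of parsed objects.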

> LLAP: ORC production of encoded data, cache usage
> -------------------------------------------------
>
>                 Key: HIVE-9418
>                 URL: https://issues.apache.org/jira/browse/HIVE-9418
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> ORC needs to be able to read self-contained rowgroups and return them. It 
> should use low-level cache in process. In future, we may use high-level cache 
> to cache rowgroups instead



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
