[ 
https://issues.apache.org/jira/browse/HIVE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9269:
-----------------------------------
    Fix Version/s: llap

> LLAP: introduce low-level cache for ORC
> ---------------------------------------
>
>                 Key: HIVE-9269
>                 URL: https://issues.apache.org/jira/browse/HIVE-9269
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>             Fix For: llap
>
>
> There are two distinct options for caching encoded data in a row-columnar
> format: caching logical chunks (e.g. for ORC, stripe x column or row group x
> column), or caching physical chunks (e.g. for ORC, compression buffers,
> entire stripes, ...). For highly selective queries, the former will probably
> result in better cache utilization and fewer undesirable priority phenomena.
> It will also be easier to reuse across different formats.
> However, because logical chunks are variable-sized, a cache of them is
> harder to implement. The prototype has a cache of this kind, but it has some
> serious shortcomings in its current form. Additionally, a high-level cache
> would operate above the ACID logic in the file format and would thus require
> cache invalidation, which is, as we know, one of the few hard things in CS.
> A low-level cache for the ORC case, however, is easier to implement because
> the uncompressed size of compression buffers is nearly fixed; at the 256k
> default, these buffers are also sufficiently granular. While it lacks the
> benefit of having ACID deltas already merged, as a high-level cache would,
> it will work with ACID out of the box.
> This JIRA is to implement the low-level cache.
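
To make the low-level option concrete, here is a minimal sketch of a cache of physical chunks. It is not the Hive/LLAP implementation. The class name LowLevelCache, the (file_id, offset) key, and the LRU eviction policy are all assumptions made for illustration; only the 256k default buffer size comes from the description above. Because entries are nearly fixed in size, capacity can be counted in whole slots rather than tracked per variable-sized entry.

```python
from collections import OrderedDict

# Default ORC compression buffer size cited in the description (256k).
BUFFER_SIZE = 256 * 1024


class LowLevelCache:
    """Hypothetical LRU cache of uncompressed ORC compression buffers."""

    def __init__(self, capacity_bytes):
        # Near-fixed entry size: capacity is simply a number of slots.
        self.max_slots = capacity_bytes // BUFFER_SIZE
        # Maps (file_id, offset) -> buffer bytes, least recently used first.
        self._slots = OrderedDict()

    def get(self, file_id, offset):
        """Return the cached buffer, or None on a miss."""
        key = (file_id, offset)
        data = self._slots.get(key)
        if data is not None:
            # Mark the entry as most recently used.
            self._slots.move_to_end(key)
        return data

    def put(self, file_id, offset, data):
        """Cache one buffer, evicting least recently used entries if full."""
        if len(data) > BUFFER_SIZE:
            raise ValueError("buffer exceeds the fixed slot size")
        key = (file_id, offset)
        self._slots[key] = data
        self._slots.move_to_end(key)
        while len(self._slots) > self.max_slots:
            self._slots.popitem(last=False)


# Usage: a two-slot cache; touching an entry protects it from eviction.
cache = LowLevelCache(2 * BUFFER_SIZE)
cache.put("base_file", 0, b"first buffer")
cache.put("base_file", BUFFER_SIZE, b"second buffer")
cache.get("base_file", 0)
cache.put("delta_file", 0, b"third buffer")
```

In this sketch the key identifies a byte range in one physical file, so base and delta files would simply occupy separate entries and any merging would happen above the cache. That is one way to read "work with ACID out of the box", offered here as an assumption rather than a description of the actual design.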



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
