Sergey Shelukhin created HIVE-9269:
--------------------------------------

             Summary: LLAP: introduce low-level cache for ORC
                 Key: HIVE-9269
                 URL: https://issues.apache.org/jira/browse/HIVE-9269
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergey Shelukhin
            Assignee: Sergey Shelukhin


There are two distinct options for caching encoded data in a row-columnar
format: caching logical chunks (e.g. for ORC, stripe x column or row group x
column), or caching physical chunks (e.g. for ORC, compression buffers, entire
stripes, ...). For highly selective queries, the former will probably result in
better cache utilization and fewer undesirable priority effects. It will also be
easier to use for different formats.
However, because logical chunks are variable-sized, such a cache is harder to
implement. The prototype has a cache of this kind, but it has some serious
shortcomings in its current form. Additionally, a high-level cache would
operate above the ACID logic in the file format and would thus require cache
invalidation, which, as we know, is one of the only two hard things in CS.
A low-level cache for the ORC case, however, is easier to implement because the
uncompressed size of compression buffers is nearly fixed; at the 256KB default,
these buffers are also sufficiently granular. While it lacks the benefit a
high-level cache would have of ACID deltas already being merged, it will work
with ACID out of the box.
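To illustrate why nearly fixed-size buffers make a low-level cache simpler, here is a minimal Java sketch under stated assumptions; it is not the LLAP design. It assumes every decompressed ORC buffer fits in one equal-sized slot (with 256KB slots, a 1GB cache holds 1024 * 1024 / 256 = 4096 of them), keys entries by a hypothetical (file ID, offset) pair, and uses LRU eviction through an access-ordered `LinkedHashMap`. The class `FixedSlotBufferCache` and its methods are hypothetical names for illustration only. Because all slots are the same size, allocation needs no size-based bookkeeping and causes no fragmentation: it just takes a free slot or evicts the least recently used entry.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch of a low-level cache for decompressed ORC buffers.
 * Memory is split into equal-sized slots; because every buffer is assumed to
 * fit in one slot, any free slot can hold any buffer.
 */
public class FixedSlotBufferCache {
    /** Hypothetical physical key: a file plus the offset of a compression buffer in it. */
    public static final class BufferKey {
        private final long fileId;
        private final long offset;

        public BufferKey(long fileId, long offset) {
            this.fileId = fileId;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BufferKey)) {
                return false;
            }
            BufferKey other = (BufferKey) o;
            return fileId == other.fileId && offset == other.offset;
        }

        @Override
        public int hashCode() {
            return Objects.hash(fileId, offset);
        }
    }

    private final int slotSize;
    private final byte[][] slots;   // preallocated, all the same size
    private final int[] lengths;    // actual number of bytes stored in each slot
    private final Deque<Integer> freeSlots = new ArrayDeque<>();
    // accessOrder = true: iteration starts at the least recently used key.
    private final LinkedHashMap<BufferKey, Integer> index =
            new LinkedHashMap<>(16, 0.75f, true);

    public FixedSlotBufferCache(int slotCount, int slotSize) {
        if (slotCount <= 0 || slotSize <= 0) {
            throw new IllegalArgumentException("slotCount and slotSize must be positive");
        }
        this.slotSize = slotSize;
        this.slots = new byte[slotCount][slotSize];
        this.lengths = new int[slotCount];
        for (int i = 0; i < slotCount; i++) {
            freeSlots.push(i);
        }
    }

    /** Returns a copy of the cached bytes, or null on a miss; a hit refreshes recency. */
    public byte[] get(BufferKey key) {
        Integer slot = index.get(key);
        return slot == null ? null : Arrays.copyOf(slots[slot], lengths[slot]);
    }

    /** Stores one decompressed buffer, evicting the LRU entry if no slot is free. */
    public void put(BufferKey key, byte[] data) {
        if (data.length > slotSize) {
            throw new IllegalArgumentException("buffer does not fit in a slot");
        }
        Integer slot = index.get(key);
        if (slot == null) {
            if (freeSlots.isEmpty()) {
                Iterator<Map.Entry<BufferKey, Integer>> it = index.entrySet().iterator();
                int evictedSlot = it.next().getValue();
                it.remove(); // drop the least recently used entry
                freeSlots.push(evictedSlot);
            }
            slot = freeSlots.pop();
            index.put(key, slot);
        }
        System.arraycopy(data, 0, slots[slot], 0, data.length);
        lengths[slot] = data.length;
    }
}
```

For example, with two slots, caching buffers A and B, reading A, and then caching C evicts B, the least recently used entry. LRU is only one choice here; any policy that operates on uniform slots would fit the same structure.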

This JIRA is to implement the low-level cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
