[ https://issues.apache.org/jira/browse/HIVE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-9269:
-----------------------------------
    Fix Version/s: llap

> LLAP: introduce low-level cache for ORC
> ---------------------------------------
>
>                 Key: HIVE-9269
>                 URL: https://issues.apache.org/jira/browse/HIVE-9269
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>             Fix For: llap
>
>
> There are two distinct options for caching encoded data in a row-columnar
> format: caching logical chunks (e.g. for ORC, stripe x column or row group x
> column), or caching physical chunks (e.g. for ORC, compression buffers,
> entire stripes, ...). For highly selective queries, the former will probably
> result in better cache utilization and fewer undesirable priority phenomena.
> It will also be easier to use for different formats.
> However, because logical chunks are variable-sized, such a cache is harder to
> implement. The prototype has a cache of this kind, but it has some serious
> shortcomings in its current form. Additionally, a high-level cache would
> operate above the ACID logic in the file format and would thus require cache
> invalidation, which is, as we know, one of the few hard things in CS.
> A low-level cache for the ORC case, however, is easier to implement because
> the uncompressed size of compression buffers is nearly fixed; at the 256k
> default, these buffers are also sufficiently granular. While it does not have
> the benefit of ACID deltas being already merged, as a high-level cache would,
> it will work with ACID out of the box.
> This JIRA is to implement the low-level cache.
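To make the low-level option concrete, here is a minimal sketch of a cache of physical chunks keyed by file and byte offset. It is not the LLAP implementation: the class name, the methods, and the LRU eviction policy are all assumptions made for illustration. It also assumes that a written ORC file is never modified in place, which would be why a cache of physical chunks needs no invalidation under ACID.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: none of these names come from Hive or LLAP.
class LowLevelCacheSketch {
    // A physical chunk is identified by its file and the byte offset
    // at which its compression buffer starts.
    static final class ChunkKey {
        final String file;
        final long offset;

        ChunkKey(String file, long offset) {
            this.file = file;
            this.offset = offset;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ChunkKey)) return false;
            ChunkKey other = (ChunkKey) o;
            return offset == other.offset && file.equals(other.file);
        }

        @Override
        public int hashCode() {
            return Objects.hash(file, offset);
        }
    }

    private final LinkedHashMap<ChunkKey, ByteBuffer> lru;

    // Because buffers are of nearly fixed size, capacity can be counted
    // in buffers rather than bytes (an assumption of this sketch).
    LowLevelCacheSketch(final int capacity) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<ChunkKey, ByteBuffer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<ChunkKey, ByteBuffer> eldest) {
                return size() > capacity;
            }
        };
    }

    synchronized void put(String file, long offset, ByteBuffer data) {
        lru.put(new ChunkKey(file, offset), data);
    }

    // Returns the cached buffer, or null on a miss.
    synchronized ByteBuffer get(String file, long offset) {
        return lru.get(new ChunkKey(file, offset));
    }

    synchronized int size() {
        return lru.size();
    }
}
```

A reader would look up each compression buffer by (file, offset) before going to disk and insert it on a miss. A real cache would also need buffer pinning and an allocator for the fixed-size blocks, both of which this sketch omits.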