[
https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Shelukhin updated HIVE-20380:
------------------------------------
Summary: LLAP cache should cache small buffers more efficiently (was:
explore storing multiple CBs in a single cache buffer in LLAP cache)
> LLAP cache should cache small buffers more efficiently
> ------------------------------------------------------
>
> Key: HIVE-20380
> URL: https://issues.apache.org/jira/browse/HIVE-20380
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
>
> Lately ORC CBs are becoming ridiculously small. First there's the 4KB minimum
> (instead of 256KB); then, after we moved the metadata cache off-heap, the index
> streams, which are all tiny, take up a lot of CBs and waste space.
> Wasted space can require a larger cache and lead to cache OOMs on some
> workloads.
> Reducing min.alloc solves this problem, but then there's a lot of heap (and
> probably compute) overhead to track all these buffers. Arguably even the 4KB
> min.alloc is too small.
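
To make the trade-off above concrete, here is a rough sketch of how much space tiny streams waste under a given min.alloc. It assumes buddy-style rounding (each buffer is rounded up to a power of two, and to at least min.alloc); the class and method names are hypothetical and are not taken from the Hive code base.

```java
// Illustrative only: estimates cache space wasted by small buffers.
// Assumption: every allocation is rounded up to a power of two that is
// no smaller than minAlloc (buddy-allocator style rounding).
final class CacheWasteSketch {

    /** Bytes reserved for a buffer holding dataBytes of payload. */
    static long allocatedSize(long dataBytes, long minAlloc) {
        if (dataBytes <= 0 || minAlloc <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long size = Math.max(dataBytes, minAlloc);
        long pow2 = Long.highestOneBit(size);   // largest power of two <= size
        return pow2 == size ? size : pow2 << 1; // round up if not exact
    }

    /** Total reserved-but-unused bytes when each stream gets its own buffer. */
    static long wastedBytes(long[] streamSizes, long minAlloc) {
        long wasted = 0;
        for (long s : streamSizes) {
            wasted += allocatedSize(s, minAlloc) - s;
        }
        return wasted;
    }

    public static void main(String[] args) {
        long[] tinyIndexStreams = {300, 500, 200}; // made-up sizes in bytes
        System.out.println("waste at 4096: " + wastedBytes(tinyIndexStreams, 4096));
        System.out.println("waste at 256:  " + wastedBytes(tinyIndexStreams, 256));
    }
}
```

A smaller min.alloc shrinks the waste, but every stream still needs its own tracked buffer, which is the heap overhead the description worries about.
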
> We should store contiguous CBs in the same buffer; to start, we can do it for
> ROW_INDEX streams. That probably means reading all ROW_INDEX streams instead
> of doing projection when we see that they are too small.
> We need to investigate what the pattern is for ORC data blocks. One option is
> to increase min.alloc and then consolidate multiple 4-8KB CBs, but only for
> the same stream. However, a larger min.alloc will result in wastage for really
> small streams, so we can also consolidate multiple streams (potentially
> across columns) if needed. This will result in some priority anomalies, but
> they are probably OK.
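
The consolidation idea above can be sketched as a greedy packing of contiguous chunks into shared cache buffers. This is only a sketch under stated assumptions: chunks arrive in file order, a chunk never spans two buffers, and `Slot`, `pack`, and the buffer size are hypothetical names and values rather than anything in LLAP.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: packs contiguous small chunks into shared buffers.
final class ConsolidationSketch {

    /** Where a chunk lives after packing: shared buffer id plus byte offset. */
    static final class Slot {
        final int bufferId;
        final int offset;
        final int length;

        Slot(int bufferId, int offset, int length) {
            this.bufferId = bufferId;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Assigns each chunk (in file order) to a buffer of bufferSize bytes.
     * A new buffer is started whenever the next chunk would not fit.
     */
    static List<Slot> pack(int[] chunkLengths, int bufferSize) {
        List<Slot> slots = new ArrayList<>();
        int bufferId = 0;
        int used = 0;
        for (int len : chunkLengths) {
            if (len <= 0 || len > bufferSize) {
                throw new IllegalArgumentException("chunk does not fit: " + len);
            }
            if (used + len > bufferSize) { // current buffer is full, open a new one
                bufferId++;
                used = 0;
            }
            slots.add(new Slot(bufferId, used, len));
            used += len;
        }
        return slots;
    }

    public static void main(String[] args) {
        for (Slot s : pack(new int[] {1000, 2000, 1500, 500}, 4096)) {
            System.out.println("buffer " + s.bufferId + " offset " + s.offset
                + " length " + s.length);
        }
    }
}
```

Because several chunks now share one buffer, eviction priority applies to the whole group, which is one way the priority anomalies mentioned in the description could arise.
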
> Another consideration is making tracking less object-oriented, in particular
> passing around integer indexes instead of objects and storing state in giant
> arrays somewhere (potentially with some optimizations for less common
> things), instead of every buffer getting its own object.
> cc [~gopalv] [~prasanth_j]
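
The index-based tracking mentioned in the last paragraph of the description could look roughly like the sketch below: buffer state lives in parallel primitive arrays and callers pass around an `int` handle instead of an object. The class name, fields, and methods are all hypothetical, and the sketch is single-threaded; a real cache would need atomic or otherwise synchronized updates.

```java
// Illustrative only: per-buffer state kept in parallel arrays, addressed by
// an int handle, instead of one Java object per buffer. Not thread-safe.
final class BufferTableSketch {
    private final long[] offsets;  // start of each buffer within the arena
    private final int[] lengths;   // payload length, or -1 if the slot is free
    private final int[] refCounts; // number of active users of each buffer
    private final int[] freeList;  // stack of unused handles
    private int freeTop;

    BufferTableSketch(int capacity) {
        offsets = new long[capacity];
        lengths = new int[capacity];
        refCounts = new int[capacity];
        freeList = new int[capacity];
        for (int i = 0; i < capacity; i++) {
            lengths[i] = -1;
            freeList[i] = capacity - 1 - i; // so handle 0 is handed out first
        }
        freeTop = capacity;
    }

    /** Returns a handle for the new buffer, or -1 if the table is full. */
    int register(long offset, int length) {
        if (length < 0) throw new IllegalArgumentException("negative length");
        if (freeTop == 0) return -1;
        int idx = freeList[--freeTop];
        offsets[idx] = offset;
        lengths[idx] = length;
        refCounts[idx] = 0;
        return idx;
    }

    void incRef(int idx) { checkLive(idx); refCounts[idx]++; }

    /** Returns true when the last reference was dropped. */
    boolean decRef(int idx) {
        checkLive(idx);
        if (refCounts[idx] == 0) throw new IllegalStateException("not referenced");
        return --refCounts[idx] == 0;
    }

    /** Frees the handle for reuse; only legal when nothing references it. */
    void release(int idx) {
        checkLive(idx);
        if (refCounts[idx] != 0) throw new IllegalStateException("still referenced");
        lengths[idx] = -1;
        freeList[freeTop++] = idx;
    }

    long offsetOf(int idx) { checkLive(idx); return offsets[idx]; }
    int lengthOf(int idx) { checkLive(idx); return lengths[idx]; }

    private void checkLive(int idx) {
        if (lengths[idx] < 0) throw new IllegalStateException("free handle: " + idx);
    }
}
```

The per-buffer cost here is a few primitive array entries rather than an object header plus fields, which is the kind of heap saving the description is after.
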
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)