Sergey Shelukhin created HIVE-9270:
--------------------------------------
Summary: LLAP: improve high-level cache from prototype
Key: HIVE-9270
URL: https://issues.apache.org/jira/browse/HIVE-9270
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
The cache in the prototype has a number of limitations.
1) Having 16-32-...MB chunks that each hold many logical units of caching can
result in undesirable priority phenomena. Priority tracking is needed for every
such unit, with some form of priority-splitting "compaction".
I have a design for this that never blocks readers...
2) Something like a buddy allocator can also be used instead of fixed-size blocks.
3) Needs tighter integration with file formats, since we abandoned the
intermediate format and are planning to make the unit of caching much smaller
(row group, or RG, rather than stripe); e.g. ORC can decompress data directly
into a large buffer, then pass the logical boundaries on to ChunkPool.
4) For the same reason (having so many cached objects), one might consider
making the cache format-specific and/or hierarchical, since requesting 1000s
of objects may be suboptimal (e.g. a TPCDS stripe has ~430 RGs; with just a few
columns that's a lot of objects to request. It is much easier if RGs are all
sequential and can be returned together when sargs didn't do a lot of filtering).
5) Minor issues, like not reusing allocated buffers after they are evicted and
instead allocating again, etc.
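
To make item 2 concrete, here is a rough sketch of what a buddy allocator over
a single large buffer could look like. This is only an illustration, not the
LLAP design: the class name BuddyAllocatorSketch, the offset-based interface,
and the sizes are all made up for the example, and it is not thread-safe.
Blocks are powers of two; a request is rounded up, larger free blocks are
split in halves, and freed blocks are merged back with their free "buddy".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: hands out offsets into one arena of arenaSize bytes. */
final class BuddyAllocatorSketch {
    private final int minSize;   // smallest block size, a power of two
    private final int arenaSize; // total arena size, a power of two
    // freeLists.get(k) holds offsets of free blocks of size (minSize << k).
    private final List<Deque<Integer>> freeLists = new ArrayList<>();

    BuddyAllocatorSketch(int minSize, int arenaSize) {
        if (Integer.bitCount(minSize) != 1 || Integer.bitCount(arenaSize) != 1
                || minSize > arenaSize) {
            throw new IllegalArgumentException("sizes must be powers of two");
        }
        this.minSize = minSize;
        this.arenaSize = arenaSize;
        int orders = Integer.numberOfTrailingZeros(arenaSize / minSize) + 1;
        for (int i = 0; i < orders; i++) {
            freeLists.add(new ArrayDeque<>());
        }
        freeLists.get(orders - 1).push(0); // initially one block: the whole arena
    }

    /** Order of the smallest block that fits size bytes (size <= arenaSize). */
    private int orderFor(int size) {
        int order = 0;
        while ((minSize << order) < size) {
            order++;
        }
        return order;
    }

    /** Returns the offset of a block that fits size bytes, or -1 if none is free. */
    int allocate(int size) {
        if (size <= 0 || size > arenaSize) {
            return -1;
        }
        int want = orderFor(size);
        int have = want;
        while (have < freeLists.size() && freeLists.get(have).isEmpty()) {
            have++;
        }
        if (have == freeLists.size()) {
            return -1; // a real cache would evict something here
        }
        int offset = freeLists.get(have).pop();
        // Split down to the wanted order: keep the lower half, free the upper half.
        while (have > want) {
            have--;
            freeLists.get(have).push(offset + (minSize << have));
        }
        return offset;
    }

    /** Frees a block obtained from allocate(size), merging with free buddies. */
    void free(int offset, int size) {
        int order = orderFor(size);
        while (order < freeLists.size() - 1) {
            int buddy = offset ^ (minSize << order);
            if (!freeLists.get(order).remove(Integer.valueOf(buddy))) {
                break; // buddy is in use (or split), so stop merging
            }
            offset = Math.min(offset, buddy);
            order++;
        }
        freeLists.get(order).push(offset);
    }
}
```

With minSize = 8 and arenaSize = 64, two allocate(8) calls return offsets 0 and
8; freeing both merges them back into one 16-byte block at offset 0. The same
free lists would also cover the buffer-reuse point in item 5, since an evicted
block goes back on a list instead of being allocated again.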