[
https://issues.apache.org/jira/browse/HIVE-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266810#comment-14266810
]
Sergey Shelukhin commented on HIVE-9270:
----------------------------------------
I have a partial patch, but it's nowhere close to even building, so I will not
post it here for now :)
> LLAP: improve high-level cache from prototype
> ---------------------------------------------
>
> Key: HIVE-9270
> URL: https://issues.apache.org/jira/browse/HIVE-9270
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
>
> The cache in the prototype has a number of limitations.
> 1) Having 16-32-..Mb chunks with many logical units of caching can result in
> undesirable priority phenomena. Priority tracking is needed for every such
> unit, with some form of priority-splitting "compaction".
> I have a design for that that never blocks readers...
> 2) Something like a buddy allocator can also be used instead of fixed-size
> blocks.
> 3) Needs tighter integration with file formats, since we abandoned the
> intermediate format and are planning to make the unit of caching much smaller
> (RG, not stripe) - e.g. ORC can decompress data directly into a large buffer,
> then pass on logical boundaries to ChunkPool.
> 4) For the same reason of having so many cached objects, one might consider
> actually making it format-specific and/or hierarchical, since requesting
> 1000s of objects may be suboptimal (e.g. a TPCDS stripe has ~430 RGs; with just
> a few columns that's a lot of objects to request - much easier if RGs are all
> sequential and can be returned together if sargs didn't do a lot of
> filtering).
> 5) Minor issues, like not reusing allocated buffers after they are evicted
> and instead allocating again, etc.
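
For readers unfamiliar with item 2 of the quoted description, here is a
minimal sketch of what a buddy allocator does. It is not the LLAP design or
anything from the patch; the class name BuddyAllocatorSketch, its methods, and
the choice to track only offsets into an abstract arena are all assumptions
made for illustration. Blocks are powers of two, a larger free block is split
in half on demand, and a freed block is merged with its "buddy" when that
buddy is also free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: buddy allocation over an abstract arena (offsets only). */
final class BuddyAllocatorSketch {
    private final int minSize;   // smallest block size, a power of two
    private final int maxOrder;  // arena size == minSize << maxOrder
    // freeLists.get(k) holds offsets of free blocks of size (minSize << k).
    private final List<Deque<Integer>> freeLists = new ArrayList<>();

    BuddyAllocatorSketch(int minSize, int arenaSize) {
        if (Integer.bitCount(minSize) != 1 || Integer.bitCount(arenaSize) != 1
                || arenaSize < minSize) {
            throw new IllegalArgumentException("sizes must be powers of two");
        }
        this.minSize = minSize;
        this.maxOrder = Integer.numberOfTrailingZeros(arenaSize / minSize);
        for (int i = 0; i <= maxOrder; i++) {
            freeLists.add(new ArrayDeque<>());
        }
        freeLists.get(maxOrder).push(0); // the whole arena starts as one free block
    }

    /** Smallest order whose block size fits the request (may exceed maxOrder). */
    private int orderFor(int size) {
        int order = 0;
        while (order <= maxOrder && ((long) minSize << order) < size) {
            order++;
        }
        return order;
    }

    /** Returns the offset of an allocated block, or -1 if nothing fits. */
    int allocate(int size) {
        if (size <= 0) {
            return -1;
        }
        int order = orderFor(size);
        int from = order;
        while (from <= maxOrder && freeLists.get(from).isEmpty()) {
            from++; // look for a larger free block to split
        }
        if (from > maxOrder) {
            return -1;
        }
        int offset = freeLists.get(from).pop();
        while (from > order) {
            from--;
            // Keep the lower half, return the upper half to the free list.
            freeLists.get(from).push(offset + (minSize << from));
        }
        return offset;
    }

    /** Frees a block; the caller passes the same size it allocated with. */
    void free(int offset, int size) {
        int order = orderFor(size);
        while (order < maxOrder) {
            int buddy = offset ^ (minSize << order); // buddy differs in one bit
            if (!freeLists.get(order).remove(Integer.valueOf(buddy))) {
                break; // buddy is in use (or split), so stop merging
            }
            offset = Math.min(offset, buddy);
            order++;
        }
        freeLists.get(order).push(offset);
    }
}
```

With a 64-byte arena and 16-byte minimum, two 16-byte allocations land at
offsets 0 and 16; freeing both merges them back into one 32-byte block. A real
cache would also need thread safety and a faster buddy lookup than the linear
list scan used here.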