[ 
https://issues.apache.org/jira/browse/HIVE-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266810#comment-14266810
 ] 

Sergey Shelukhin commented on HIVE-9270:
----------------------------------------

I have a partial patch, but it's nowhere close to even building, so I will not 
post it here for now :)

> LLAP: improve high-level cache from prototype
> ---------------------------------------------
>
>                 Key: HIVE-9270
>                 URL: https://issues.apache.org/jira/browse/HIVE-9270
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>
> The cache in the prototype has a number of limitations.
> 1) Having 16-32-..MB chunks with many logical units of caching can result in 
> undesirable priority phenomena. Priority tracking is needed for every such 
> unit, with some form of priority-splitting "compaction".
> I have a design for that that never blocks readers...
> 2) Something like a buddy allocator can also be used instead of fixed-size 
> blocks.
> 3) Needs tighter integration with file formats, since we abandoned the 
> intermediate format and are planning to make the unit of caching much 
> smaller (an RG, i.e. a row group, not a stripe) - e.g. ORC can decompress 
> data directly into a large buffer, then pass on logical boundaries to 
> ChunkPool.
> 4) For the same reason of having so many cached objects, one might consider 
> actually making it format-specific and/or hierarchical, since requesting 
> 1000s of objects may be suboptimal (e.g. a TPCDS stripe has ~430 RGs; with 
> just a few columns that's a lot of objects to request - much easier if RGs 
> are all sequential and can be returned together if sargs didn't do a lot of 
> filtering).
> 5) Minor issues, like not reusing allocated buffers after they are evicted 
> and instead allocating again, etc.
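
For readers unfamiliar with item 2 in the description above, here is a minimal
sketch of a buddy allocator in Java. It is not taken from the Hive/LLAP code
and is not the design referred to in the issue. The class name
BuddyAllocatorSketch, its methods, and the use of integer offsets into a single
power-of-two arena are all assumptions made for illustration. It is
single-threaded and does no validation of freed offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; names and layout are not from Hive.
class BuddyAllocatorSketch {
    private final int minOrder; // log2 of the smallest block size
    private final int maxOrder; // log2 of the whole arena size
    // free.get(order - minOrder) holds offsets of free blocks of size 2^order
    private final List<TreeSet<Integer>> free = new ArrayList<>();

    BuddyAllocatorSketch(int minBlock, int arenaSize) {
        if (Integer.bitCount(minBlock) != 1 || Integer.bitCount(arenaSize) != 1
                || minBlock > arenaSize) {
            throw new IllegalArgumentException("sizes must be powers of two");
        }
        minOrder = Integer.numberOfTrailingZeros(minBlock);
        maxOrder = Integer.numberOfTrailingZeros(arenaSize);
        for (int i = minOrder; i <= maxOrder; i++) {
            free.add(new TreeSet<>());
        }
        free.get(maxOrder - minOrder).add(0); // initially one arena-sized block
    }

    // Smallest order whose block size covers the requested size.
    private int orderFor(int size) {
        int order = minOrder;
        while ((1 << order) < size) {
            order++;
        }
        return order;
    }

    // Returns the offset of the allocated block, or -1 if nothing fits.
    int allocate(int size) {
        if (size <= 0 || size > (1 << maxOrder)) {
            return -1;
        }
        int want = orderFor(size);
        int order = want;
        while (order <= maxOrder && free.get(order - minOrder).isEmpty()) {
            order++;
        }
        if (order > maxOrder) {
            return -1;
        }
        int offset = free.get(order - minOrder).pollFirst();
        // Split down to the wanted size; the upper half of each split stays free.
        while (order > want) {
            order--;
            free.get(order - minOrder).add(offset + (1 << order));
        }
        return offset;
    }

    // Frees a block; the caller passes the same size it allocated with.
    void free(int offset, int size) {
        int order = orderFor(size);
        // Merge with the buddy block for as long as the buddy is also free.
        while (order < maxOrder) {
            int buddy = offset ^ (1 << order);
            if (!free.get(order - minOrder).remove(buddy)) {
                break;
            }
            offset = Math.min(offset, buddy);
            order++;
        }
        free.get(order - minOrder).add(offset);
    }
}
```

In this sketch a request is rounded up to the next power of two, so a 20-byte
request in an arena with 16-byte minimum blocks occupies a 32-byte block. Once
every block carved out of the arena has been freed, the merging loop restores
a single arena-sized free block, which is the property that makes this
approach attractive compared to fixed-size blocks.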



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
