[ https://issues.apache.org/jira/browse/HIVE-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266810#comment-14266810 ]
Sergey Shelukhin commented on HIVE-9270:
----------------------------------------

I have a partial patch, but it's nowhere close to even building, so I will not post it here for now :)

> LLAP: improve high-level cache from prototype
> ---------------------------------------------
>
>                 Key: HIVE-9270
>                 URL: https://issues.apache.org/jira/browse/HIVE-9270
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>
> The cache in the prototype has a number of limitations.
> 1) Having 16-32-..Mb chunks with many logical units of caching can result in
> undesirable priority phenomena. Priority tracking is needed for every such
> unit, with some form of priority-splitting "compaction".
> I have a design for that that never blocks readers...
> 2) Something like a buddy allocator can also be used instead of fixed-size
> blocks.
> 3) Needs tighter integration with file formats, since we abandoned the
> intermediate format and are planning to make the unit of caching much smaller
> (RG, not stripe) - e.g. ORC can decompress data directly into a large buffer,
> then pass on logical boundaries to ChunkPool.
> 4) For the same reason of having so many cached objects, one might consider
> actually making it format-specific and/or hierarchical, since requesting
> 1000s of objects may be suboptimal (e.g. a TPCDS stripe has ~430 RGs; with just
> a few columns that's a lot of objects to request - much easier if RGs are all
> sequential and can be returned together if sargs didn't do a lot of
> filtering).
> 5) Minor issues, like not reusing allocated buffers after they are evicted and
> instead allocating again, etc.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
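
For readers unfamiliar with item 2, here is a minimal sketch of how a buddy allocator hands out variable-size blocks from one arena rather than fixed-size chunks. It is not the patch mentioned above and not Hive's code: the class name BuddyAllocatorSketch, its methods, and the sizes used are all hypothetical, and it tracks offsets only, with no real buffer, no eviction, and no thread safety.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical buddy allocator over a single arena; illustrative only, not Hive's API. */
final class BuddyAllocatorSketch {
    private final int minOrder; // log2 of the smallest block size
    private final int maxOrder; // log2 of the arena size
    // freeLists.get(order - minOrder) holds offsets of free blocks of size 2^order.
    private final List<Deque<Integer>> freeLists = new ArrayList<>();

    BuddyAllocatorSketch(int minBlockSize, int arenaSize) {
        if (Integer.bitCount(minBlockSize) != 1 || Integer.bitCount(arenaSize) != 1
                || minBlockSize > arenaSize) {
            throw new IllegalArgumentException("sizes must be powers of two, min <= arena");
        }
        minOrder = Integer.numberOfTrailingZeros(minBlockSize);
        maxOrder = Integer.numberOfTrailingZeros(arenaSize);
        for (int o = minOrder; o <= maxOrder; o++) {
            freeLists.add(new ArrayDeque<>());
        }
        freeLists.get(maxOrder - minOrder).push(0); // the whole arena starts as one free block
    }

    /** Returns the offset of a block holding at least size bytes, or -1 if none is free. */
    int allocate(int size) {
        int order = orderFor(size);
        int o = order;
        // Find the smallest free block that is large enough.
        while (o <= maxOrder && freeLists.get(o - minOrder).isEmpty()) {
            o++;
        }
        if (o > maxOrder) {
            return -1;
        }
        int offset = freeLists.get(o - minOrder).pop();
        // Split down to the requested order; each upper half goes back on a free list.
        while (o > order) {
            o--;
            freeLists.get(o - minOrder).push(offset + (1 << o));
        }
        return offset;
    }

    /** Frees a block; the caller passes the same size it used when allocating. */
    void free(int offset, int size) {
        int o = orderFor(size);
        // Merge with the buddy block for as long as the buddy is also free.
        while (o < maxOrder) {
            int buddy = offset ^ (1 << o);
            if (!freeLists.get(o - minOrder).removeFirstOccurrence(Integer.valueOf(buddy))) {
                break;
            }
            offset = Math.min(offset, buddy);
            o++;
        }
        freeLists.get(o - minOrder).push(offset);
    }

    /** Rounds size up to a power-of-two order, never below the minimum block order. */
    private int orderFor(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        int order = 32 - Integer.numberOfLeadingZeros(size - 1); // ceil(log2(size))
        return Math.max(order, minOrder);
    }
}
```

In this sketch a request is rounded up to the next power of two, so internal fragmentation is bounded by half a block, and freed neighbours merge back into larger blocks. Whether that trade-off suits RG-sized cache units is exactly the open question in the issue.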