Sergey Shelukhin created HIVE-9270:
--------------------------------------

             Summary: LLAP: improve high-level cache from prototype
                 Key: HIVE-9270
                 URL: https://issues.apache.org/jira/browse/HIVE-9270
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergey Shelukhin


The cache in the prototype has a number of limitations.
1) Having 16-32-..Mb chunks with many logical units of caching can result in 
undesirable priority phenomena. Priority tracking is needed for every such 
unit, with some form of priority-splitting "compaction".
I have a design for this that never blocks readers...
2) Something like a buddy allocator could also be used instead of fixed-size
blocks.
3) Needs tighter integration with file formats, since we abandoned the
intermediate format and are planning to make the unit of caching much smaller
(RG, not stripe) - e.g. ORC can decompress data directly into a large buffer,
then pass on logical boundaries to ChunkPool.
4) For the same reason of having so many cached objects, one might consider
actually making it format-specific and/or hierarchical, since requesting 1000s
of objects may be suboptimal (e.g. a TPCDS stripe has ~430 RGs; with just a few
columns that's a lot of objects to request - much easier if RGs are all
sequential and can be returned together if sargs didn't do a lot of filtering).
5) Minor issues, such as not reusing allocated buffers after they are evicted
and instead allocating again, etc.
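
To illustrate item 2, here is a minimal sketch of a buddy allocator. It is not
LLAP code and not the design mentioned in item 1. The class name
BuddyAllocatorSketch and all of its methods are hypothetical. It hands out
offsets within a single power-of-two arena, is single-threaded, and makes no
attempt at the non-blocking behavior a real cache would need.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of a buddy allocator over one arena; not Hive code. */
final class BuddyAllocatorSketch {
    private final int minOrder; // log2 of the smallest block size
    private final int maxOrder; // log2 of the arena size
    // freeLists.get(order - minOrder) holds offsets of free blocks of 2^order bytes.
    private final List<Deque<Integer>> freeLists = new ArrayList<>();

    BuddyAllocatorSketch(int minBlockSize, int arenaSize) {
        if (Integer.bitCount(minBlockSize) != 1 || Integer.bitCount(arenaSize) != 1
                || minBlockSize > arenaSize) {
            throw new IllegalArgumentException("sizes must be powers of two, min <= arena");
        }
        minOrder = Integer.numberOfTrailingZeros(minBlockSize);
        maxOrder = Integer.numberOfTrailingZeros(arenaSize);
        for (int o = minOrder; o <= maxOrder; o++) {
            freeLists.add(new ArrayDeque<>());
        }
        freeLists.get(maxOrder - minOrder).push(0); // the whole arena starts free
    }

    /** Smallest order whose block size is at least {@code size} bytes. */
    int orderFor(int size) {
        if (size <= 0 || size > (1 << maxOrder)) {
            throw new IllegalArgumentException("bad size: " + size);
        }
        int order = 32 - Integer.numberOfLeadingZeros(size - 1); // ceil(log2(size))
        return Math.max(order, minOrder);
    }

    /** Returns the offset of a block that fits {@code size}, or -1 if none is free. */
    int allocate(int size) {
        int want = orderFor(size);
        int have = want;
        while (have <= maxOrder && freeLists.get(have - minOrder).isEmpty()) {
            have++; // look for a larger free block to split
        }
        if (have > maxOrder) {
            return -1;
        }
        int offset = freeLists.get(have - minOrder).pop();
        while (have > want) {
            have--; // split: keep the lower half, free the upper half
            freeLists.get(have - minOrder).push(offset + (1 << have));
        }
        return offset;
    }

    /** Frees a block and merges it with its buddy for as long as the buddy is free. */
    void free(int offset, int size) {
        int order = orderFor(size);
        while (order < maxOrder) {
            int buddy = offset ^ (1 << order); // buddy differs only in the order bit
            if (!freeLists.get(order - minOrder).remove((Integer) buddy)) {
                break; // buddy is in use (or split), so stop merging
            }
            offset = Math.min(offset, buddy);
            order++;
        }
        freeLists.get(order - minOrder).push(offset);
    }

    /** Number of free blocks of exactly the order that fits {@code blockSize}. */
    int freeBlocks(int blockSize) {
        return freeLists.get(orderFor(blockSize) - minOrder).size();
    }

    public static void main(String[] args) {
        BuddyAllocatorSketch arena = new BuddyAllocatorSketch(4096, 65536);
        int rg = arena.allocate(5000); // rounded up to an 8192-byte block
        System.out.println("allocated at offset " + rg);
        arena.free(rg, 5000);
    }
}
```

The appeal for a cache with small units such as RGs is that each request is
rounded up only to the next power of two rather than to a fixed 16-32Mb chunk,
and freed neighbors merge back into larger blocks. The linear scan in free()
is a simplification; a real allocator would track buddy state in a bitmap or
in block headers.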



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
