[
https://issues.apache.org/jira/browse/HIVE-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571562#comment-14571562
]
Sergey Shelukhin commented on HIVE-10068:
-----------------------------------------
Update from some test runs on TPC-DS and TPC-H queries: we waste around 15%
of allocated memory due to buddy allocator granularity:
{noformat}
$ sed -E "s/.*ALLOCATED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
278162046976
$ sed -E "s/.*ALLOCATED_USED_BYTES=([0-9]+).*/\1/" lrfu1.log | awk '{s+=$1}END{print s}'
238565954908
{noformat}
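That works out to 278,162,046,976 - 238,565,954,908 = 39,596,092,068 bytes
unused, i.e. about 14.2% of allocated memory.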
Some of that is obviously unavoidable, but some could be avoided by
implementing this. However, it's not as bad as I expected (bad results can be
seen on very small datasets, where stripes/RGs are routinely smaller than the
compression block size).
> LLAP: adjust allocation after decompression
> -------------------------------------------
>
> Key: HIVE-10068
> URL: https://issues.apache.org/jira/browse/HIVE-10068
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
>
> We don't know the decompressed size of a compression buffer in ORC; all we know
> is the file-level compression buffer size. For many files, compression
> buffers can be smaller than that because of compact encoding, or because
> the compression block ends for other reasons (different streams, etc.; "present"
> streams, for example, are very small).
> BuddyAllocator should be able to accept back parts of the allocated memory
> (e.g. allocate 256KB with a minimum allocation of 32KB, decompress 45KB, and return
> the last 192KB as 64KB+128KB blocks). For generality (this depends on the implementation),
> we can make an API like "offer", and the allocator can decide to take back
> however much it can.
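To make the arithmetic concrete, below is a minimal sketch of how such an
"offer" could decide which buddy blocks to hand back, assuming the retained
prefix is rounded up to a power of two. The class and method names
(BuddyTrimSketch, reclaimableTail, Block) are illustrative, not the actual
Hive/LLAP API:
{noformat}
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the buddy arithmetic behind an "offer"-style trim call.
 * Hypothetical helper for illustration only; not the actual Hive/LLAP
 * BuddyAllocator API.
 */
public class BuddyTrimSketch {

  /** A block at [offset, offset + size) within the original allocation. */
  record Block(long offset, long size) {}

  /** Smallest power of two >= v (for v > 0). */
  static long nextPow2(long v) {
    long p = 1;
    while (p < v) p <<= 1;
    return p;
  }

  /**
   * @param allocated total allocation size, a power of two (e.g. 256KB)
   * @param used      bytes actually needed after decompression (e.g. 45KB)
   * @param minAlloc  allocator's minimum block size, a power of two (e.g. 32KB)
   * @return buddy-aligned tail blocks that could be offered back
   */
  static List<Block> reclaimableTail(long allocated, long used, long minAlloc) {
    // The retained prefix must be a power of two so the tail splits into
    // valid buddies: one block each of size keep, 2*keep, ..., allocated/2,
    // each located at the offset equal to its own size.
    long keep = Math.max(nextPow2(used), minAlloc);
    List<Block> tail = new ArrayList<>();
    for (long size = keep; size < allocated; size <<= 1) {
      tail.add(new Block(size, size));
    }
    return tail;
  }

  public static void main(String[] args) {
    long kb = 1024;
    // The example from the description: 256KB allocation, 32KB minimum,
    // 45KB actually used. Prints the 64KB and 128KB tail blocks (192KB total).
    for (Block b : reclaimableTail(256 * kb, 45 * kb, 32 * kb)) {
      System.out.println("free " + (b.size() / kb) + "KB at offset "
          + (b.offset() / kb) + "KB");
    }
  }
}
{noformat}
Rounding the kept prefix up to a power of two is what keeps every returned
block a valid buddy, so the allocator can merge the returned blocks back into
its free lists without extra bookkeeping.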
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)