[
https://issues.apache.org/jira/browse/HIVE-16318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946342#comment-15946342
]
Siddharth Seth commented on HIVE-16318:
---------------------------------------
Mostly looks good. +1.
I'd strongly prefer that the on-heap cache fraction not affect per-executor
memory.
LLAP_DAEMON_XMX_HEADROOM gives control over this value. I think overall
configuration and sizing will be a little easier without having to think about
one more parameter.
e.g. ContainerSize=132G, CacheSize=20G, Heap=100G, memPerExecutor=4G,
numExecutors=25.
If the on-heap cache needs to be factored in (10%), change Xmx to a higher
value: 110G.
Executor size remains unchanged. ContainerSize and Heap increase by the amount
that the on-heap cache will use.
OTOH, with an automatic reduction: setting the heap to 110G with a factor of
10%, 11G will be used for the cache, leaving 99G for the 25 executors.
The executor size calculations become a little more complicated (4G executors
are still required).
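
The arithmetic behind the two approaches above can be sketched roughly as
follows. This is only an illustration of the numbers in this comment, not the
daemon's actual sizing code: the function name is made up, and the on-heap
cache is assumed to be a plain amount carved out of Xmx.

```python
def per_executor_gb(heap_gb, num_executors, on_heap_cache_gb=0.0):
    """Heap left for each executor once the on-heap cache is taken out.

    Hypothetical helper for illustration; not part of Hive/LLAP.
    """
    return (heap_gb - on_heap_cache_gb) / num_executors


NUM_EXECUTORS = 25

# Baseline from the example: Heap=100G, no on-heap cache.
baseline = per_executor_gb(100, NUM_EXECUTORS)

# Manual approach: raise Xmx to 110G and give the extra 10G to the cache.
manual = per_executor_gb(110, NUM_EXECUTORS, on_heap_cache_gb=10)

# Automatic approach: the cache takes 10% of whatever Xmx is, i.e. 11G of 110G.
auto_cache_gb = 110 * 10 / 100
automatic = per_executor_gb(110, NUM_EXECUTORS, on_heap_cache_gb=auto_cache_gb)

print(baseline, manual, automatic)
```

With the manual approach the executors keep their 4G; with the automatic
subtraction they end up slightly under 4G even after Xmx has been raised.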
Some of the calculations for executors (io.sort.mb, noconditionaltasksize,
unordered buffers) are absolute values based on the initially specified
container size (or per-executor memory). It would be better to keep those as is.
(There are explicit validations that some of these values fit within executor
memory, which could cause errors.)
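
A rough sketch of why such validations could start failing is below. The
setting names and sizes are invented stand-ins (they are not the real Hive/Tez
keys, units, or defaults), and the check itself is an assumption about what a
"fits within executor memory" validation looks like, not the actual code.

```python
def settings_exceeding(executor_mb, settings_mb):
    """Return the names of settings whose absolute size exceeds executor memory."""
    return sorted(name for name, mb in settings_mb.items() if mb > executor_mb)


# Hypothetical absolute values, computed once against the original 4G executor.
settings_mb = {"sort_buffer_mb": 1024, "join_buffer_mb": 4096}

original_executor_mb = 4 * 1024              # 4G executors, as initially sized
reduced_executor_mb = (110 - 11) * 1024 / 25  # 99G of heap shared by 25 executors

print(settings_exceeding(original_executor_mb, settings_mb))
print(settings_exceeding(reduced_executor_mb, settings_mb))
```

Against the original executor size nothing is flagged; against the reduced
size, the setting that was sized to the full 4G no longer fits.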
Not doing this automatically means we're not accounting for cache usage, and
another parameter would need to be changed. There is a knob available though.
Subtracting it automatically reduces executor memory, but doesn't really fix
the other parameters which have already been computed based on the available
memory.
10G vs 11G when the value is set to 10% will not make that much of a
difference. Without increasing the heap size though, executor memory can be
reduced by quite a bit.
TL;DR - Simpler configuration via containerSize, Xmx OR heap reservation,
instead of automatically subtracting from the heap. I'm also not sure how
accurate this measure is, considering these are Java objects.
> LLAP cache: address some issues in 2.2/2.3
> ------------------------------------------
>
> Key: HIVE-16318
> URL: https://issues.apache.org/jira/browse/HIVE-16318
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HIVE-16318.01.patch, HIVE-16318.02.patch,
> HIVE-16318.03.patch, HIVE-16318.patch
>
>
> We've run into HIVE-16233 and HIVE-15665 and given that 2.2 and 2.3 releases
> are approaching we are going to add workarounds for them, and then commit the
> above patches and revert the workarounds as soon as we can.
> Unfortunately this will result in cache wasting some memory on some datasets,
> but the alternatives, when they are encountered (usually only on large
> datasets), are worse.