[
https://issues.apache.org/jira/browse/DRILL-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142160#comment-16142160
]
Paul Rogers commented on DRILL-5741:
------------------------------------
Not sure this entirely makes sense. It again asks the user to set a new
variable just to check the user's other settings.
In general, Drill should not use the entire memory on a node.
When running under YARN, YARN will assign memory. When running under other
managers (MapR Warden, Mesos, etc.), those systems take care of the total
memory allocations across tasks.
Perhaps we could, on Drillbit start, sum the memory allocations and check
against total OS memory. But, how much should we reserve for the OS? For file
system caching? For ZK? For other apps? Pretty soon we are trying to do
node-level resource management "blind" inside the Drillbit.
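To make the open question concrete, here is a minimal sketch of what such a
startup check might look like. It is not Drill code: the function name, the
os_reserve_bytes parameter and the sample numbers are all assumptions made for
illustration, and the reserve is exactly the value that has no obvious right
answer.

```python
def check_memory_budget(allocations, total_os_bytes, os_reserve_bytes):
    """Hypothetical startup check: do the configured areas fit on the node?

    allocations      -- mapping of memory area name to bytes (heap, direct, ...)
    total_os_bytes   -- physical memory on the node
    os_reserve_bytes -- assumed amount left for the OS, file cache, ZK, other apps
    """
    requested = sum(allocations.values())
    available = total_os_bytes - os_reserve_bytes
    return requested <= available, requested, available


GB = 1024 ** 3

# Illustrative settings only, not Drill defaults.
settings = {"heap": 8 * GB, "direct": 16 * GB, "code_cache": 1 * GB}

# The same settings pass or fail depending only on the guessed reserve.
print(check_memory_budget(settings, 32 * GB, 4 * GB)[0])
print(check_memory_budget(settings, 32 * GB, 8 * GB)[0])
```

The result flips on the reserve alone, which is why a check of this kind ends
up depending on a number the Drillbit cannot know.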
In Drill-on-YARN, we considered the percentage-based allocation suggested
above. But this is not as simple as it seems. Certain memory units are fixed
(such as code cache), while others can be adjusted. And should the ratio
between heap and direct be the same at small levels (2 GB and 4 GB, say) as at
large levels (50 GB and 100 GB)?
Instead, we worked the other way. We summed the memory allocation for code
cache, heap and direct to get the total memory requested from YARN.
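A minimal sketch of that summation, assuming the three settings arrive as
JVM-style size strings such as "4G" or "512m". The helper names and the sample
values are hypothetical, and only the k/m/g suffixes are handled; this is not
the Drill-on-YARN implementation.

```python
import re

# Multipliers for the size suffixes handled by this sketch.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_jvm_size(text):
    """Convert a JVM-style size such as '512m' or '8G' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def total_container_request(heap, direct, code_cache):
    """Sum the three memory areas to get the total to request, in bytes."""
    return sum(parse_jvm_size(v) for v in (heap, direct, code_cache))


# Illustrative values only, not Drill defaults.
total = total_container_request("4G", "8G", "1G")
print(total // 1024 ** 2, "MB")
```

Working in this direction means the user sets each area once and the total
follows from them, so there is no separate limit to keep in sync.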
I think we can consider memory oversubscription as a user error; a bit like
configuring storage plugins wrong, or running too many processes for a node, or
configuring the OS wrong, etc.
> Drillbit during startup should not exceed the available memory on a node
> ------------------------------------------------------------------------
>
> Key: DRILL-5741
> URL: https://issues.apache.org/jira/browse/DRILL-5741
> Project: Apache Drill
> Issue Type: Improvement
> Components: Server
> Affects Versions: 1.11.0
> Reporter: Kunal Khatua
> Fix For: 1.12.0
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Currently, during startup, a Drillbit can be assigned large values for the
> following:
> * Xmx (Heap)
> * XX:MaxDirectMemorySize
> * XX:ReservedCodeCacheSize
> * XX:MaxPermSize
> All of this, potentially, can exceed the available memory on a system when a
> Drillbit is under heavy load. It would be good to have the Drillbit ensure
> during startup itself that the cumulative value of these parameters does not
> exceed a pre-defined upper limit for the Drill process.
> The proposal is to have the
> [runbit|https://github.com/apache/drill/blob/master/distribution/src/resources/runbit]
> script look for an additional environment variable:
> {{DRILLBIT_MAX_PROC_MEM}}
> The parameter can specify the maximum in GB/MB (similar in syntax to how
> Java's max heap is defined), or as a percentage of available memory
> (not to exceed 95%).
> The
> [runbit|https://github.com/apache/drill/blob/master/distribution/src/resources/runbit]
> script will perform the calculation of the sum of memory required by the
> memory spaces (heap, direct, etc) and ensure that it is within the limit
> defined by the {{DRILLBIT_MAX_PROC_MEM}} env variable.
> In the absence of this parameter, there will be no restriction. A node admin
> can then define this variable in the default terminal's environment files
> (e.g. {{/root/.bashrc}}).