[
https://issues.apache.org/jira/browse/DRILL-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147652#comment-16147652
]
ASF GitHub Bot commented on DRILL-5741:
---------------------------------------
Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/922
This may be one of those times when we need to resort to a bit of design
thinking.
The core idea is that the user sets one environment variable to check the
others. The first issue is that, if the user can't do the sums to set the Drill
memory allocation right (with respect to actual memory), it is not clear how
they will get the total memory variable right either.
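Concretely, a check like the one proposed boils down to summing the JVM memory settings and comparing the total against the cap. A minimal sketch, assuming the {{DRILLBIT_MAX_PROC_MEM}} variable from the proposal below; every other variable name here is illustrative, not part of Drill:

```shell
#!/usr/bin/env bash
# Sketch only: sum the JVM memory regions and compare to the cap.
# Handles plain "<n>G" / "<n>M" suffixes; real parsing needs more care.
to_mb() {
  case "$1" in
    *G|*g) echo $(( ${1%?} * 1024 )) ;;
    *M|*m) echo "${1%?}" ;;
    *)     echo "$1" ;;            # assume the value is already in MB
  esac
}

DRILLBIT_MAX_PROC_MEM=${DRILLBIT_MAX_PROC_MEM:-13G}
heap=${DRILL_HEAP:-4G}                   # -Xmx
direct=${DRILL_MAX_DIRECT_MEMORY:-8G}    # -XX:MaxDirectMemorySize
code_cache=${DRILL_CODE_CACHE:-1G}       # -XX:ReservedCodeCacheSize

total_mb=$(( $(to_mb "$heap") + $(to_mb "$direct") + $(to_mb "$code_cache") ))
limit_mb=$(to_mb "$DRILLBIT_MAX_PROC_MEM")

if [ "$total_mb" -gt "$limit_mb" ]; then
  echo "Requested ${total_mb} MB exceeds cap of ${limit_mb} MB" >&2
  exit 1
fi
```

The check itself is trivial; the argument below is about what such a check can and cannot tell you.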
OK, so we get the memory from the system, then do a percentage. That is
better. But, what is the system memory? Is it total memory? Suppose the user
says Drill gets 60%. We can now check. But, Drill is distributed. Newer nodes
in a cluster may have 256GB, older nodes 128GB. Drill demands symmetrical
resources, so the memory given to Drill must be identical on all nodes,
regardless of each node's total memory. So, the percent-of-total-system-memory
idea doesn't work in practice.
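For what it's worth, the per-node percentage check itself is easy; the sketch below (Linux-only, all variable names illustrative) shows exactly why it yields a different absolute allocation on every differently-sized node:

```shell
#!/usr/bin/env bash
# Sketch: compute Drill's share as a percentage of total physical memory.
# Linux-only: reads MemTotal from /proc/meminfo.
DRILL_MEM_PERCENT=${DRILL_MEM_PERCENT:-60}

total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
drill_kb=$(( total_kb * DRILL_MEM_PERCENT / 100 ))

echo "This node: Drill would get ${drill_kb} KB of ${total_kb} KB"
# On a 256GB node that is ~154GB; on a 128GB node, ~77GB -- not the
# identical per-node allocation Drill requires.
```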
So, maybe we express Drill's memory as a percentage of total *free* memory.
Cool. We give Drill 60%. Drill starts and everything is fine. Now, we also give
Spark 60%. Spark starts. It complains in its logs (assuming we make this same
change to the Spark startup scripts). But, Spark uses its memory and causes
Drill to fail. We
check Drill logs. Nada. We have to check Spark's logs. Now, imagine doing this
with five apps; the app that complains may not be the one to fail. And, imagine
doing this across 100 nodes. Won't scale.
Note that the problem is that we checked memory statically at startup. But,
our problem was that things changed later: we launched an over-subscribed
Spark. So, our script must run continuously, constantly checking if any new
apps are launched. Since some apps grow memory over time, we have to check all
other apps for total memory usage against that allocated to Drill.
Now, presumably, all other apps are doing the same: Spark is continually
checking, Storm is doing so, and so on. Now, the admin needs to gather all
these logs (across dozens of nodes) and extract meaning. What we need, then, is
a network endpoint to publish the information and a tool to gather and report
that data. We've just invented monitoring tools.
Taking a step back, what we really want to know is available system memory
vs. that consumed by apps. So, what we want is a Linux-level monitoring of free
memory. And, since we have other things to do, we want alerts when free memory
drops below some point. We've now invented alerting tools.
Now, we got into this mess because we launched apps without concern about
the total memory usage on each node. That is, we didn't plan our app load to
fit into our available memory. So, we turn this around. We've got 128GB (say)
of memory. How do we run only those apps that fit, deferring those that don't?
We've just invented YARN, Mesos, Kubernetes and the like.
Now we get to the reason for the -1. The proposed change adds significant
complexity to the scripts, *but can never solve the actual oversubscription
problem*. For that, we need a global resource manager.
Now, suppose that someone wants to run Drill without such a manager.
Perhaps some distribution does not provide this tool and instead provides a
tool that simply launches processes, leaving it to each process to struggle
with its own resources. In such an environment, the vendor can add a check,
such as this one, that will fire on all nodes and warn the user about potential
oversubscription *on that node*, *at that moment*, *for that app* in *one app's
log file*.
To facilitate this, we can do three things.
1. In the vendor-specific `distrib-env.sh` file, do any memory setting
adjustments that are wanted.
2. Modify `drillbit.sh` to call a `drill-check.sh` script, if it exists,
just prior to launching Drill.
3. In the vendor-specific `distrib-env.sh` file, do the check proposed here.
The only change needed in Apache Drill is step 2. Then each vendor can add
the checks if they don't provide a resource manager. Those vendors (or users)
that use YARN or Mesos or whatever don't need the checks because they have
overall tools that solve the problem for them.
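The step-2 hook itself would be tiny. A sketch of what `drillbit.sh` might do, under the assumptions that the check script lives under `$DRILL_HOME/bin` and that a non-zero exit should abort the launch:

```shell
#!/usr/bin/env bash
# Sketch of the proposed step-2 hook in drillbit.sh. The drill-check.sh
# name comes from the proposal above; the abort-on-failure behavior is
# an assumption.
DRILL_HOME=${DRILL_HOME:-/opt/drill}
check_script="$DRILL_HOME/bin/drill-check.sh"

if [ -f "$check_script" ]; then
  # Source it so the check sees the same environment the Drillbit will.
  . "$check_script" || {
    echo "drill-check.sh failed; not starting Drillbit." >&2
    exit 1
  }
fi
# ... proceed to launch the Drillbit as before ...
```

Vendors that ship no `drill-check.sh` pay nothing; the hook is a no-op.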
Thanks!
> During startup Drill should not exceed the available memory
> -----------------------------------------------------------
>
> Key: DRILL-5741
> URL: https://issues.apache.org/jira/browse/DRILL-5741
> Project: Apache Drill
> Issue Type: Improvement
> Components: Server
> Affects Versions: 1.11.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Fix For: 1.12.0
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Currently, during startup, a Drillbit can be assigned large values for the
> following:
> * Xmx (Heap)
> * XX:MaxDirectMemorySize
> * XX:ReservedCodeCacheSize
> * XX:MaxPermSize
> All of this, potentially, can exceed the available memory on a system when a
> Drillbit is under heavy load. It would be good to have the Drillbit ensure
> during startup itself that the cumulative value of these parameters does not
> exceed a pre-defined upper limit for the Drill process.
> The proposal is to have the
> [runbit|https://github.com/apache/drill/blob/master/distribution/src/resources/runbit]
> script look for an additional environment variable:
> {{DRILLBIT_MAX_PROC_MEM}}
> The parameter can specify the maximum in GB/MB (similar in syntax to how the
> Java's MaxHeap is defined), or in terms of percentage of available memory
> (not to exceed 95%).
> The
> [runbit|https://github.com/apache/drill/blob/master/distribution/src/resources/runbit]
> script will perform the calculation of the sum of memory required by the
> memory spaces (heap, direct, etc) and ensure that it is within the limit
> defined by the {{DRILLBIT_MAX_PROC_MEM}} env variable.
> In the absence of this parameter, there will be no restriction. A node admin
> can then define this variable in the default shell environment file (e.g.
> {{/root/.bashrc}}).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)