GitHub user paul-rogers opened a pull request:
https://github.com/apache/drill/pull/960
DRILL-5815: Option to set query memory as percent of total
This PR provides an alternative way to set the memory per query as a
percent of system memory.
### Background
Drill is an in-memory query engine, optimized for speed. Historically, all
operators used as much memory as needed to perform their work. The sort has
supported spilling for a number of releases. In this release, hash agg now also
supports spilling.
Once an operator can spill, we can define a memory "budget" for that
operator. This has been done by setting a series of options, and making a
number of assumptions:
* Define `planner.memory.max_query_memory_per_node` as the amount of
memory, per node, to give to each query. The default (which most users never
change) is 2 GB.
* Compute the number of buffering operators as:
  * The number of buffering operators (sort and hash agg) across all major
    fragments (as shown in the visualized plan),
  * Multiplied by the slice target, typically 70% of the number of CPUs on
    the node.
* Divide max query memory by the total number of buffering operators to get
the memory per operator.
The problem is, with the default value of 2 GB, it is very easy to have
sufficient cores, or sufficient buffering operators, that each operator gets a
very thin slice of memory (10 or 20 MB).
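To see how quickly the slice thins out, the calculation above can be sketched as follows. This is only an illustration, not Drill code: the function name, the core count, and the operator count are assumptions; only the 2 GB default and the 70% slice target come from the description.

```python
# Illustrative sketch only; not Drill's actual implementation.
MB = 1024 * 1024
GB = 1024 * MB


def memory_per_operator(max_query_memory, buffering_ops_in_plan, slices_per_node):
    """Divide the per-node query budget evenly among buffering operator instances."""
    total_ops = buffering_ops_in_plan * slices_per_node
    return max_query_memory // total_ops


# Assumed example: a 32-core node, a slice target of 70% of the cores,
# and a plan with 4 buffering operators (sorts and hash aggs).
cores = 32
slices = cores * 70 // 100
per_op = memory_per_operator(2 * GB, 4, slices)
print(slices, "slices,", per_op // MB, "MB per operator")
```

Under these assumed numbers each operator ends up with roughly 23 MB, in line with the "10 or 20 MB" range mentioned above.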
To work around this, we introduced a new option,
`planner.memory.min_memory_per_buffered_op`
which sets a floor on the per operator memory. The default is 40 MB. Thus,
even if the above calculations would prefer to give an operator, say, 10 MB of
memory, the floor will force the allocation to 40 MB. The result is that actual
query use will far exceed the expected budget (by 4X in this example), but the
query will run (assuming the necessary memory is, in fact, available.)
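The effect of the floor can be sketched in the same spirit. The helper names and the count of 200 operator instances are assumptions made for illustration; the 40 MB default and the 10 MB computed slice come from the text above.

```python
# Illustrative sketch only; not Drill's actual implementation.
MB = 1024 * 1024


def apply_floor(computed_per_op, min_per_op=40 * MB):
    """Never give a buffering operator less than the configured minimum."""
    return max(computed_per_op, min_per_op)


def effective_query_memory(computed_per_op, total_ops, min_per_op=40 * MB):
    """Memory the query may actually use once the floor is applied."""
    return apply_floor(computed_per_op, min_per_op) * total_ops


# Assumed example: 200 buffering operator instances, each budgeted 10 MB.
total_ops = 200
budget = 10 * MB * total_ops
actual = effective_query_memory(10 * MB, total_ops)
overshoot = actual / budget
print("budget:", budget // MB, "MB; with floor:", actual // MB, "MB")
```

Here the floor raises each operator from 10 MB to 40 MB, so the query as a whole may use four times its nominal budget.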
This work-around is mostly fine because Drill still has a large number of
operators that use unlimited memory, so a bit extra by the limited operators
will be lost in the noise.
The problem now is that even if a machine is generous and gives Drill, say,
128 GB of memory, each query still gets only 2 GB, slices the per-operator
memory too thin, and either runs out of memory or runs slowly.
### Query Memory as a Percent of Total Memory
This PR adds another option, `planner.memory.percent_per_query`, which
provides another way to allocate query memory.
With this option, Drill computes the memory per query per node as:
* `planner.memory.percent_per_query` * the total direct memory, or
* `planner.memory.max_query_memory_per_node`
whichever is greater. For small systems,
`planner.memory.max_query_memory_per_node` dominates. For larger systems,
`planner.memory.percent_per_query` dominates.
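The rule above can be sketched as a single `max`. The function and parameter names are hypothetical stand-ins for the two options; the defaults (5% and 2 GB) are the ones stated in this PR description.

```python
# Illustrative sketch only; not Drill's actual implementation.
GB = 1024 ** 3


def query_memory_per_node(total_direct_memory,
                          percent_per_query=0.05,
                          max_query_memory_per_node=2 * GB):
    """Return the larger of the percentage-based and the fixed allocation."""
    by_percent = int(percent_per_query * total_direct_memory)
    return max(by_percent, max_query_memory_per_node)


small = query_memory_per_node(8 * GB)     # fixed 2 GB setting dominates
large = query_memory_per_node(128 * GB)   # percentage dominates (about 6.4 GB)
print(small / GB, large / GB)
```

With the defaults, the percentage only starts to matter once direct memory exceeds roughly 40 GB, since 5% of 40 GB equals the 2 GB fixed setting.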
### Computation
To compute the proper number for the user's workload:
* Determine the memory ratio needed for non-managed operators. (See above.)
Call this *_u_* (for unmanaged.)
* Determine the target concurrency. Call this *_n_*.
* Set `planner.memory.percent_per_query` to:
```
planner.memory.percent_per_query = (1 - u) / n
```
### Default Value
Let's use the computation rules to determine how we arrive at the default
setting of 0.05 (5%).
* We allow half of the total memory for unmanaged operators. (*_u_* = 0.5)
* We assume a concurrency of 10. (*_n_* = 10)
* The default value is:
```
(1 - 0.5) / 10 = 0.05 = 5%
```
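For users who want to tune the value for their own site, the same formula can be sketched as a small helper. The function name and the input validation are assumptions added for illustration; the second call uses made-up workload numbers.

```python
# Illustrative sketch only; not Drill's actual implementation.
import math


def percent_per_query(unmanaged_ratio, target_concurrency):
    """Apply (1 - u) / n from the computation rules above."""
    if not 0 <= unmanaged_ratio < 1:
        raise ValueError("unmanaged_ratio must be in [0, 1)")
    if target_concurrency < 1:
        raise ValueError("target_concurrency must be at least 1")
    return (1 - unmanaged_ratio) / target_concurrency


default = percent_per_query(0.5, 10)   # the default derivation: u = 0.5, n = 10
custom = percent_per_query(0.6, 4)     # assumed site: u = 0.6, n = 4
print(default, custom)
```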
Why 50% for unmanaged? We have no solid metrics, but most queries do
include hash joins and exchanges, so it seems prudent to give half the memory
to these unlimited operators. (Users may find they need an even larger
allowance since the operators are, after all, unlimited in their memory usage.)
Why concurrency of 10? The out-of-the-box configuration of 8 GB direct, 2
GB per query allows a concurrency of 2-3. The Drill web site talks about
concurrency in the 100s (on a very large cluster.) The (logarithmic) average of
10<sup>0</sup> and 10<sup>2</sup> is 10<sup>1</sup> or 10.
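The "logarithmic average" here is the mean of the two values on a log scale, which is the same as their geometric mean. A minimal sketch, with a hypothetical function name:

```python
# Illustrative sketch only.
import math


def log_average(low, high):
    """Average two positive values on a log10 scale (their geometric mean)."""
    return 10 ** ((math.log10(low) + math.log10(high)) / 2)


concurrency = log_average(1, 100)   # 10^0 and 10^2
print(concurrency)
```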
This is just a default; we expect users to tune the number for their site.
### Queue-Based Memory Assignment
Another PR introduces the idea of using Drill's ZK-based queues to
allocate memory. That mechanism works similarly, except rather than having to
assume a concurrency number, the queueing mechanism enforces that number.
Therefore, the new option has no effect when throttling is enabled.
### Disabling the Feature
Perhaps some users prefer to use only the static memory allocation as in
prior versions of Drill. Setting `planner.memory.percent_per_query` to 0
effectively disables this technique as it will always produce values lower than
`max_query_memory_per_node`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/paul-rogers/drill DRILL-5815
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/960.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #960
----
commit 8dbee604233b86ba18fee2b72b8a2a872e1d49aa
Author: Paul Rogers <[email protected]>
Date: 2017-09-26T00:04:41Z
DRILL-5815: Option to set query memory as percent of total
commit e08356c6e5df1fcb511e64c6aea8bc2ba8047c9a
Author: Paul Rogers <[email protected]>
Date: 2017-09-26T00:40:44Z
Added option definition
commit 9127824afee0e7cf701641147bdbcb842ae6bf07
Author: Paul Rogers <[email protected]>
Date: 2017-09-26T01:16:21Z
Unit test fix
----
---