GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/960

    DRILL-5815: Option to set query memory as percent of total

    This PR provides an alternative way to set the memory per query as a 
percent of system memory.
    
    ### Background
    
    Drill is an in-memory query engine, optimized for speed. Historically, all 
operators used as much memory as needed to perform their work. The sort has 
supported spilling for a number of releases; as of this release, hash agg 
supports spilling as well.
    
    Once an operator can spill, we can define a memory “budget” for that 
operator. This has been done by setting a series of options, and making a 
number of assumptions:
    
    * Define `planner.memory.max_query_memory_per_node` as the amount of 
memory, per node, to give to each query. The default (which most users never 
change) is 2 GB.
    * Compute the total number of buffering operator instances as:
      * The number of buffering operators (sort and hash agg) across all major 
fragments (as shown in the visualized plan),
      * Multiplied by the maximum width per node 
(`planner.width.max_per_node`), typically 70% of the number of CPUs on the node.
    * Divide max query memory by the total number of buffering operators to get 
the memory per operator.
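
    The steps above can be sketched as a small calculation. This is an illustration only, not Drill's actual code: the function name, its parameters, and the rounding are assumptions made for the example.

```python
GB = 1024 ** 3
MB = 1024 ** 2

def memory_per_operator(max_query_memory, buffering_ops_in_plan, cpus,
                        width_fraction=0.7):
    """Hypothetical helper: per-operator budget in bytes.

    max_query_memory      -- planner.memory.max_query_memory_per_node, in bytes
    buffering_ops_in_plan -- sorts and hash aggs counted across all major fragments
    cpus                  -- number of CPUs on the node
    width_fraction        -- assumed per-node parallelism as a fraction of CPUs
    """
    if buffering_ops_in_plan < 1:
        raise ValueError("plan has no buffering operators")
    # Each buffering operator in the plan runs once per parallel fragment.
    width = max(1, int(cpus * width_fraction))
    total_operator_instances = buffering_ops_in_plan * width
    return max_query_memory // total_operator_instances

# Example: default 2 GB budget, 5 buffering operators, 32 CPUs.
per_op = memory_per_operator(2 * GB, 5, 32)
print(f"{per_op / MB:.1f} MB per operator")
```

    With these illustrative inputs the budget per operator falls just under 20 MB, which is the kind of thin slice discussed next.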
    
    The problem is that, with the default value of 2 GB, it is very easy to 
have enough cores, or enough buffering operators, that each operator gets a 
very thin slice of memory (10 or 20 MB).
    
    To work around this, we introduced a new option, 
`planner.memory.min_memory_per_buffered_op`
    which sets a floor on the per operator memory. The default is 40 MB. Thus, 
even if the above calculations would prefer to give an operator, say, 10 MB of 
memory, the floor will force the allocation to 40 MB. The result is that actual 
query use will far exceed the expected budget (by 4X in this example), but the 
query will run (assuming the necessary memory is, in fact, available.)
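
    A minimal sketch of how the floor changes the outcome, using the numbers from the paragraph above. The function name and return shape are hypothetical; only the option name and the 40 MB default come from this description.

```python
MB = 1024 ** 2

def apply_floor(computed_per_op, floor=40 * MB):
    """Hypothetical helper: apply planner.memory.min_memory_per_buffered_op.

    Returns the memory actually granted to the operator and the factor by
    which that grant exceeds the originally computed budget.
    """
    if computed_per_op <= 0:
        raise ValueError("computed budget must be positive")
    granted = max(computed_per_op, floor)
    overshoot = granted / computed_per_op
    return granted, overshoot

granted, overshoot = apply_floor(10 * MB)
print(granted // MB, overshoot)  # 40 MB granted, 4x the computed budget
```

    An operator whose computed budget already exceeds the floor is unaffected, so the overshoot factor is 1 in that case.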
    
    This work-around is mostly fine because Drill still has a large number of 
operators that use unlimited memory, so a bit of extra memory used by the 
limited operators will be lost in the noise.
    
    The problem, now, is that if a machine is generous, and gives Drill 128 GB 
of memory, say, each query still gets only 2 GB, slices the per-operator memory 
too small, and either runs out of memory or runs slowly.
    
    ### Query Memory as a Percent of Total Memory
    
    This PR adds another option, `planner.memory.percent_per_query`, which 
provides another way to allocate query memory.
    
    With this option, Drill computes the memory per query per node as:
    
    * `planner.memory.percent_per_query` * the total direct memory, or
    * `planner.memory.max_query_memory_per_node`
    
    whichever is greater. For small systems, 
`planner.memory.max_query_memory_per_node` dominates. For larger systems, 
`planner.memory.percent_per_query` dominates.
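
    The rule can be illustrated with a short sketch. The function and parameter names are assumptions for the example, and the percent is treated as a fraction (0.05 for 5%); this is not the code in the PR.

```python
GB = 1024 ** 3

def query_memory_per_node(direct_memory, percent_per_query=0.05,
                          max_query_memory_per_node=2 * GB):
    """Hypothetical helper: per-query, per-node memory in bytes.

    Takes the greater of the percentage-based amount and the fixed amount.
    """
    by_percent = int(direct_memory * percent_per_query)
    return max(by_percent, max_query_memory_per_node)

# Small system: 5% of 8 GB is below 2 GB, so the fixed option dominates.
small = query_memory_per_node(8 * GB)
# Large system: 5% of 128 GB is above 2 GB, so the percentage dominates.
large = query_memory_per_node(128 * GB)
print(small / GB, large / GB)
```

    With the values used here (5% and 2 GB), the two options meet at 40 GB of direct memory; above that, the percentage determines the result.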
    
    ### Computation
    
    To compute the proper number for the user’s workload:
    
    * Determine the memory ratio needed for non-managed operators. (See above.) 
Call this *_u_* (for unmanaged.)
    * Determine the target concurrency. Call this *_n_*.
    * Set `planner.memory.percent_per_query` to:
    
    ```
    planner.memory.percent_per_query = (1 - u) / n
    ```
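
    As a sketch, the same formula in code, with basic input checks. The function name and the validation rules are assumptions added for illustration.

```python
def percent_per_query(unmanaged_ratio, concurrency):
    """Hypothetical helper: value for planner.memory.percent_per_query.

    unmanaged_ratio -- u, fraction of memory reserved for unmanaged operators
    concurrency     -- n, number of queries expected to run at once
    """
    if not 0 <= unmanaged_ratio < 1:
        raise ValueError("unmanaged_ratio must be in [0, 1)")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return (1 - unmanaged_ratio) / concurrency

# Example: reserve 60% for unmanaged operators, expect 4 concurrent queries.
print(round(percent_per_query(0.6, 4), 4))  # roughly 0.1, that is, 10%
```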
    
    ### Default Value
    
    Let’s use the computation rules to determine how we arrive at the default 
setting of 0.05 (5%).
    
    * We allow half of the total memory for unmanaged operators. (*_u_* = 0.5)
    * We assume a concurrency of 10. (*_n_* = 10)
    * The default value is:
    
    ```
    (1 - 0.5) / 10 = 0.05 = 5%
    ```
    
    Why 50% for unmanaged? We have no solid metrics, but most queries do 
include hash joins and exchanges, so it seems prudent to give half of memory to 
these unlimited operators. (Users may find they need an even larger allowance 
since the operators are, after all, unlimited in their memory usage.)
    
    Why concurrency of 10? The out-of-the-box configuration of 8 GB direct, 2 
GB per query allows a concurrency of 2-3. The Drill web site talks about 
concurrency in the 100s (on a very large cluster). The logarithmic average 
(that is, the geometric mean) of 10<sup>0</sup> and 10<sup>2</sup> is 
10<sup>1</sup>, or 10.
    
    This is just a default; we expect users to tune the number for their site.
    
    ### Queue-Based Memory Assignment
    
    Another PR introduces the idea of using Drill’s ZK-based queues to 
allocate memory. That mechanism works similarly, except rather than having to 
assume a concurrency number, the queueing mechanism enforces that number.
    
    Therefore, the new option has no effect when throttling is enabled.
    
    ### Disabling the Feature
    
    Perhaps some users prefer to use only the static memory allocation as in 
prior versions of Drill. Setting `planner.memory.percent_per_query` to 0 
effectively disables this technique as it will always produce values lower than 
`max_query_memory_per_node`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-5815

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #960
    
----
commit 8dbee604233b86ba18fee2b72b8a2a872e1d49aa
Author: Paul Rogers <[email protected]>
Date:   2017-09-26T00:04:41Z

    DRILL-5815: Option to set query memory as percent of total

commit e08356c6e5df1fcb511e64c6aea8bc2ba8047c9a
Author: Paul Rogers <[email protected]>
Date:   2017-09-26T00:40:44Z

    Added option definition

commit 9127824afee0e7cf701641147bdbcb842ae6bf07
Author: Paul Rogers <[email protected]>
Date:   2017-09-26T01:16:21Z

    Unit test fix

----

