[ 
https://issues.apache.org/jira/browse/DRILL-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373523#comment-16373523
 ] 

Paul Rogers commented on DRILL-6180:
------------------------------------

First, to [~kkhatua]'s point: it does not make sense to have a per-operator 
output batch size. Why? In Drill, the output of some operator x can be the 
input to any other operator y. Consider y: how can it plan its memory if it 
does not know its input batch size? This is exactly the problem we have today 
that we wish to solve. Since operators don't know their input batch size, they 
have to jump through hoops to figure it out.

So, we should have a single target batch size as [~ppenumarthy] proposes.

It is probably fine to simply deprecate the sort-specific parameter in favor of 
the new one. Since the parameter was introduced only in 1.12, and no one has 
asked about how to use it, it is unlikely that anyone has found a need to 
change the value. So, no harm in removing the old one.

Next, batch size. This is a surprisingly difficult question that requires 
consideration of several issues:
 * The minimum memory per node coming out of the {{MemoryAllocationUtilities}} 
and associated boot parameters. For sort, the minimum memory should be about 3x 
the input batch size. Current minimum memory is (I believe) 40 MB, so an input 
batch size (set as the output size of the upstream operator) of 32 MB will 
cause queries to fail. A similar argument applies to other operators.
 * Larger batch sizes are believed to be good because of the belief that larger 
vectors are more CPU-cache friendly. But this is highly doubtful. First, CPU 
caches are on the order of 256 KB. Second, Drill creates a large number of 
threads, so any one thread runs for a very short time (and gets minimal 
benefit from caching). Third, Drill operates row-by-row, so all vectors in a 
batch cycle through memory. So, there is no CPU-cache advantage to large 
batches.
 * Drill has per-batch overhead. (Anecdotal evidence suggests this overhead is 
large.) So, we prefer some minimum row count to amortize the cost. (Even 100 
rows/batch reduces the overhead per row tremendously.) Tens of thousands of 
rows add little extra value.
 * Drill wants to run large numbers of concurrent queries. The larger the 
per-batch memory, the more memory required. On a 30-core machine, 100 
concurrent queries produce about 30 * 0.7 * 100 = 2100 concurrent threads. At 
32 MB per batch, that is roughly 66 GB of memory just to hold in-flight 
batches, aside from the additional memory needed for sorting, joining, etc.
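The concurrency math in the last bullet can be sketched as follows. All the 
numbers (core count, the 0.7 parallelism factor, query count, batch size) are 
the assumptions from this comment, not Drill constants:

```python
# Back-of-envelope: memory needed just to hold in-flight batches.
# All inputs below are this comment's assumptions, not Drill defaults.
cores = 30
parallelism_factor = 0.7   # assumed fragments-per-core-per-query factor
queries = 100
batch_mb = 32

threads = int(cores * parallelism_factor * queries)   # 2100 threads
inflight_gb = threads * batch_mb / 1024               # ~65.6 GB

print(threads, round(inflight_gb, 1))
```

Even halving the batch size to 16 MB halves this figure, which is the 
motivation for asking how small the batch can reasonably be.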

See [this 
page|https://github.com/paul-rogers/drill/wiki/BH-Conceptual-Overview] for more 
details of the interrelationships between memory numbers.

Drill 1.12 introduced a throttling feature. I had to guess a per-fragment 
overhead for in-flight batches. Once we fix that number, we should update 
those calculations to use it.

If we fix the batch size, we should alter the {{MemoryAllocationUtilities}} to 
consider that number when setting the minimum operator memory size. (Maybe 
deprecate the config added in 1.12 to set this memory size and instead make it, 
say, 3x (or 5x, or whatever) the output batch size.)
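A minimal sketch of that idea. The multiplier and function name are 
illustrative assumptions, not Drill's actual {{MemoryAllocationUtilities}} API:

```python
# Hypothetical sketch: derive the minimum per-operator memory from the
# global batch size instead of a separate config knob. The 3x multiplier
# and the function name are assumptions for illustration only.
BATCH_SIZE_MULTIPLIER = 3  # e.g. sort needs ~3x the input batch size

def min_operator_memory(output_batch_size_bytes):
    return BATCH_SIZE_MULTIPLIER * output_batch_size_bytes

# With a 32 MB batch, the minimum would be 96 MB -- above the current
# ~40 MB minimum, which is why a 32 MB batch can fail queries today.
print(min_operator_memory(32 * 1024 * 1024) // (1024 * 1024))
```

Tying the minimum to the batch size keeps the two numbers consistent by 
construction, instead of relying on operators to guess.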

Note that, once all operators honor the size, it is no longer an "output" batch 
size, it is just a "batch size" (for both input and output batches). (Except 
for special-purpose batches, such as those created in sort spill files.)

My suggestion: make the batch size as small as possible. Can it be 8 MB? 4 MB? 
If we believe that records are small (on the order of, say, 200 bytes), then 4 
MB is still 20K records, which seems plenty.
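A quick check of that closing arithmetic, using the ~200-byte record size 
assumed above:

```python
# Rows per batch at several candidate batch sizes, assuming ~200-byte
# records (the estimate used in this comment, not a measured value).
record_bytes = 200
for batch_mb in (4, 8, 16, 32):
    rows = batch_mb * 1024 * 1024 // record_bytes
    print(batch_mb, "MB ->", rows, "rows")
```

Even the smallest size comfortably clears the ~100-rows-per-batch threshold 
needed to amortize per-batch overhead.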

> Use System Option "output_batch_size" for External Sort
> -------------------------------------------------------
>
>                 Key: DRILL-6180
>                 URL: https://issues.apache.org/jira/browse/DRILL-6180
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>             Fix For: 1.13.0
>
>
> External Sort has boot time configuration for output batch size 
> "drill.exec.sort.external.spill.merge_batch_size" which is defaulted to 16M.
> To make batch sizing configuration uniform across all operators, change this 
> to use new system option that is added 
> "drill.exec.memory.operator.output_batch_size". This option has default value 
> of 32M.
> So, what are the implications if default is changed to 32M for external sort ?
> Instead, should we change the output batch size default to 16M for all 
> operators ?
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
