[ 
https://issues.apache.org/jira/browse/DRILL-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149708#comment-16149708
 ] 

Paul Rogers commented on DRILL-5758:
------------------------------------

The external sort memory manager works by anticipating the allocation size of 
each batch: input, spill, merge, and so on. It does so using the "record 
batch sizer", which determines data sizes by observing the input vectors. The 
sizer then generates a set of allocation "hints" used to allocate properly 
sized vectors for the various batches. If the memory calculations are wrong, a 
batch might become larger than expected, causing OOM errors. One way to check 
whether batches are under-estimated is to check whether the sort code ends up 
needing to double batch sizes. This does, in fact, occur:

{code}
Spilling 42 batches, into spill batches of 11397 rows, to /tmp/drill/spill/...
Initial output batch allocation: 2392064 bytes
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [91176] -> [182352]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [182352] -> [364704]
Took 130655 us to merge 11397 records, consuming 3244032 bytes of memory
{code}

The above tells us that the estimates are off by no more than 50% (otherwise 
the vectors would have doubled more than once). Even so, the estimates are off 
and must be corrected.
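
To double-check the doubling pattern, the reallocation lines can be tallied 
mechanically. The following is a minimal Python sketch, not part of Drill: the 
regular expression and the helper name `parse_reallocations` are assumptions 
made for illustration, and the sample lines are the ones from the log above 
with the e-mail line wraps joined.

```python
import re
from collections import Counter

# Reallocation lines copied from the log above (e-mail line wraps joined).
LOG_LINES = [
    "vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [91176] -> [182352]",
    "vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]",
    "vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]",
    "vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]",
    "vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]",
    "vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]",
    "vector.BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [91176] -> [182352]",
    "vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]",
    "vector.Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [91176] -> [182352]",
    "vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [182352] -> [364704]",
]

# Hypothetical pattern for the message format seen in this log only.
REALLOC = re.compile(
    r"Reallocating vector \[(.+?)\]\. # of bytes: \[(\d+)\] -> \[(\d+)\]"
)


def parse_reallocations(lines):
    """Return (vector_name, old_bytes, new_bytes) for each reallocation line."""
    events = []
    for line in lines:
        match = REALLOC.search(line)
        if match:
            events.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return events


events = parse_reallocations(LOG_LINES)

# Set of distinct growth factors (new size / old size) across all events.
growth_factors = {new / old for _, old, new in events}

# Number of reallocation messages per logged vector name.
per_name = Counter(name for name, _, _ in events)

print("reallocations:", len(events))
print("growth factors:", sorted(growth_factors))
for name, count in per_name.items():
    print(f"{count} x {name}")
```

On this sample every reallocation is an exact doubling. Inner vector names 
such as `$bits$` repeat across columns, so the per-name count does not by 
itself say how many times any single vector doubled.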

* Original estimate for the batch size (from elsewhere in the logs): 1,572,786
* Actual initial allocation size: 2,392,064
* Final actual allocation size: 3,244,032

Both actual sizes exceed the original estimate, which tells us that the 
calculations are wrong somewhere.
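
The gap between the three figures can be put in relative terms. This is a 
small Python sketch of that arithmetic only; the constant names and the 
`overshoot` helper are made up for illustration, and the sizes are the three 
values listed above.

```python
# Sizes in bytes, taken from the list above.
ORIGINAL_ESTIMATE = 1_572_786
INITIAL_ALLOCATION = 2_392_064
FINAL_ALLOCATION = 3_244_032


def overshoot(actual, expected):
    """Fraction by which `actual` exceeds `expected` (0.5 means 50% larger)."""
    return (actual - expected) / expected


report = {
    "initial allocation vs. estimate": overshoot(INITIAL_ALLOCATION, ORIGINAL_ESTIMATE),
    "final allocation vs. estimate": overshoot(FINAL_ALLOCATION, ORIGINAL_ESTIMATE),
    "final vs. initial allocation": overshoot(FINAL_ALLOCATION, INITIAL_ALLOCATION),
}

for label, value in report.items():
    print(f"{label}: {value:+.1%}")
```

By this measure the initial allocation is already roughly 52% above the 
estimate, and the final allocation is roughly twice the estimate. These are 
batch-level totals, so they are not directly comparable to the per-vector 
doubling seen in the log.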

> Rollup of external sort fixes to issues found by QA
> ---------------------------------------------------
>
>                 Key: DRILL-5758
>                 URL: https://issues.apache.org/jira/browse/DRILL-5758
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Tracking JIRA to be used for the PR that combines fixes for various JIRA 
> entries. Bugs fixed in this task are given by the linked issues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
