[ https://issues.apache.org/jira/browse/DRILL-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149708#comment-16149708 ]
Paul Rogers commented on DRILL-5758:
------------------------------------
The external sort memory manager works by anticipating the allocation size of
each batch: input, spill, merge, and so on. It does so using the "record batch
sizer," which determines data sizes by observing input vectors. The sizer then
generates a set of allocation "hints" used to allocate properly sized vectors
for the various batches. If the memory calculations are wrong, a batch might
grow larger than expected, causing OOM errors. One way to detect under-estimated
batches is to check whether the sort code ends up having to double the size of
batch vectors. This does, in fact, occur:
{code}
Spilling 42 batches, into spill batches of 11397 rows, to /tmp/drill/spill/...
Initial output batch allocation: 2392064 bytes
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [91176] -> [182352]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [182352] -> [364704]
Took 130655 us to merge 11397 records, consuming 3244032 bytes of memory
{code}
The above tells us that the estimates are off by no more than 50% (otherwise
the vectors would be doubled more than once). Still, the estimates are off and
must be corrected.
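The log's numbers can be checked with simple arithmetic: each initial allocation matches 11,397 rows exactly (8 bytes per BIGINT/FLOAT8 value, 4 bytes per offset with one extra entry, 1 byte per "bits" entry), and each reallocation exactly doubles the buffer. The sketch below reproduces that arithmetic and counts how many doublings a buffer needs to reach a given size. It assumes simple doubling on overflow, as the log suggests; the class and method names are hypothetical and are not Drill's `RecordBatchSizer` or value-vector API.

```java
// Hypothetical sketch of the sizing arithmetic seen in the log.
// Not Drill code: names and structure are illustrative assumptions.
public class VectorSizing {

    // Fixed-width data (or "bits") vector: rows * bytes-per-value.
    static long dataBytes(int rows, int width) {
        return (long) rows * width;
    }

    // Offset vectors hold rows + 1 entries of 4 bytes each.
    static long offsetBytes(int rows) {
        return (long) (rows + 1) * 4;
    }

    // Number of times a buffer starting at initialBytes must double
    // before it can hold requiredBytes (assumes doubling on overflow).
    static int doublings(long initialBytes, long requiredBytes) {
        if (initialBytes <= 0) {
            throw new IllegalArgumentException("initialBytes must be positive");
        }
        int count = 0;
        long capacity = initialBytes;
        while (capacity < requiredBytes) {
            capacity *= 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int rows = 11397; // spill batch row count from the log
        System.out.println("BIGINT/FLOAT8 data: " + dataBytes(rows, 8)); // 91176
        System.out.println("UINT4 offsets:      " + offsetBytes(rows));  // 45592
        System.out.println("UINT1 bits:         " + dataBytes(rows, 1)); // 11397
        // One doubling covers any need up to 2x the initial size...
        System.out.println(doublings(91176, 182352)); // 1
        // ...while even one byte more forces a second doubling.
        System.out.println(doublings(91176, 182353)); // 2
    }
}
```

This is the reasoning behind the bound: if a vector is doubled only once, the actual need is at most twice the initial allocation, so the initial size covered at least half of it.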
* Original estimate for the batch size (from elsewhere in the logs): 1,572,786 bytes
* Actual initial allocation size: 2,392,064 bytes
* Final actual allocation size: 3,244,032 bytes
This tells us that the calculations are wrong somewhere.
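To put the three figures on a common scale, the following sketch simply divides them: the initial allocation comes out about 1.52x the estimate, and the final allocation about 2.06x. The class name `EstimateRatios` is hypothetical; the inputs are the byte counts listed above.

```java
import java.util.Locale;

// Hypothetical helper: compares the batch-size figures quoted above.
public class EstimateRatios {

    // Ratio of an observed size to the estimated size.
    static double ratio(long actual, long estimate) {
        return (double) actual / estimate;
    }

    public static void main(String[] args) {
        long estimate = 1_572_786L; // original batch size estimate
        long initial = 2_392_064L;  // "Initial output batch allocation"
        long fin = 3_244_032L;      // "consuming ... bytes of memory"
        // Locale.ROOT keeps the decimal separator as '.' on every platform.
        System.out.printf(Locale.ROOT, "initial / estimate = %.2f%n", ratio(initial, estimate)); // 1.52
        System.out.printf(Locale.ROOT, "final / estimate   = %.2f%n", ratio(fin, estimate));     // 2.06
        System.out.printf(Locale.ROOT, "final / initial    = %.2f%n", ratio(fin, initial));      // 1.36
    }
}
```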
> Rollup of external sort fixes to issues found by QA
> ---------------------------------------------------
>
> Key: DRILL-5758
> URL: https://issues.apache.org/jira/browse/DRILL-5758
> Project: Apache Drill
> Issue Type: Task
> Affects Versions: 1.12.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> Tracking JIRA to be used for the PR that combines fixes for various JIRA
> entries. Bugs fixed in this task are given by the linked issues.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)