[ https://issues.apache.org/jira/browse/DRILL-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149708#comment-16149708 ]
Paul Rogers commented on DRILL-5758:
------------------------------------

The external sort memory manager works by anticipating the allocation size of each batch: input, spill, merge, and so on. It does this using the "record batch sizer", which determines data sizes by observing the input vectors. The sizer then generates a set of allocation "hints" used to allocate properly sized vectors for the various batches.

If the memory calculations are wrong, a batch might become larger than expected, causing OOM errors. One way to check whether batches are underestimated is to see whether the sort code ends up needing to double batch sizes. This does, in fact, occur:

{code}
Spilling 42 batches, into spill batches of 11397 rows, to /tmp/drill/spill/...
Initial output batch allocation: 2392064 bytes
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [91176] -> [182352]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [182352] -> [364704]
Took 130655 us to merge 11397 records, consuming 3244032 bytes of memory
{code}

The above tells us that the estimates are off by no more than 50% (else the vectors would be doubled more than once).
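
To repeat this check on other log excerpts, a small scanner along the following lines could count the reallocation messages and total the extra bytes they request. This is only a sketch: the class `ReallocScan` and its methods are hypothetical and not part of Drill, and the regex assumes the log line format shown in the excerpt above.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Drill: scans log lines for vector reallocations.
public class ReallocScan {
    // Assumes the format "Reallocating vector [name]. # of bytes: [old] -> [new]".
    private static final Pattern REALLOC = Pattern.compile(
        "Reallocating vector \\[(.+?)\\]\\. # of bytes: \\[(\\d+)\\] -> \\[(\\d+)\\]");

    /** Returns the number of lines that report a vector reallocation. */
    public static int countReallocs(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (REALLOC.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    /** Returns the sum of (new size - old size) over all reallocation lines. */
    public static long extraBytes(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            Matcher m = REALLOC.matcher(line);
            if (m.find()) {
                total += Long.parseLong(m.group(3)) - Long.parseLong(m.group(2));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Three lines copied from the log excerpt above.
        List<String> lines = List.of(
            "Initial output batch allocation: 2392064 bytes",
            "vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [91176] -> [182352]",
            "vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]");
        System.out.println("reallocations: " + countReallocs(lines));
        System.out.println("extra bytes requested: " + extraBytes(lines));
    }
}
```

Any non-zero count indicates that at least one vector was allocated too small for the batch it ended up holding, which is the symptom described above.
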
But the estimates are off and must be corrected.

* Original estimate for the batch size (from elsewhere in the logs): 1,572,786
* Actual initial allocation size: 2,392,064
* Final actual allocation size: 3,244,032

This tells us that the calculations are wrong somewhere.

> Rollup of external sort fixes to issues found by QA
> ---------------------------------------------------
>
>                 Key: DRILL-5758
>                 URL: https://issues.apache.org/jira/browse/DRILL-5758
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Tracking JIRA to be used for the PR that combines fixes for various JIRA
> entries. Bugs fixed in this task are given by the linked issues.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)