[
https://issues.apache.org/jira/browse/DRILL-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833947#comment-15833947
]
Paul Rogers commented on DRILL-5211:
------------------------------------
For this use case, input is 18 GB. Data arrives at the sort in batches of size
128 MB. The vectors that make up the batch are:
* Offsets, length of 32,768
* RepeatedVarCharVector, length of 67,108,864
Despite this, the total size, when shifted into the sort, is reported as
134,340,608 bytes.
This seems an incredible waste of space and a very large source of internal
fragmentation, perhaps due to the power-of-two allocation rule. (Though
67,108,864 is, itself, a power of two...)
That aside, we see that the incoming vector is far larger than the 16 MB used
in the free chunk list.
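
The figures above can be sanity-checked with a short sketch. It is written in
Python purely for the arithmetic (Drill itself is Java) and rests on two
assumptions that are not stated in this comment: that each offset entry is
4 bytes wide, and that the reported total is a byte count. The helper
next_power_of_two is a hypothetical name, not a Drill API.

```python
MB = 1024 * 1024


def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


data_bytes = 67_108_864        # RepeatedVarCharVector length, from the comment
offset_entries = 32_768        # offsets vector length, from the comment
offset_bytes = offset_entries * 4   # ASSUMPTION: 4-byte offset entries
reported_total = 134_340_608   # size reported once the batch reaches the sort
chunk_bytes = 16 * MB          # free chunk list size, from the comment

payload = data_bytes + offset_bytes

# The data vector is already a power of two, so rounding it up changes nothing.
print(next_power_of_two(data_bytes) == data_bytes)

# Ratio of the reported total to the payload computed from the two vectors.
print(round(reported_total / payload, 3))

# Number of 16 MB chunks needed to cover the data vector alone.
print(data_bytes // chunk_bytes)
```

Under these assumptions the reported total is roughly twice the payload of the
two vectors, and the data vector alone spans four 16 MB chunks.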
> External sort fails to allocate merge memory when plenty is free
> ----------------------------------------------------------------
>
> Key: DRILL-5211
> URL: https://issues.apache.org/jira/browse/DRILL-5211
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.9.0
>
>
> Consider a test of the external sort as follows:
> * Direct memory: 3GB
> * Input file: 18 GB, with one Varchar column of 8K width
> The sort runs, spilling to disk. Once all data arrives, the sort begins to
> merge the results. But, to do that, it must first do an intermediate merge.
> For example, in this sort, there are 190 spill files, but only 19 can be
> merged at a time. (Each merge file contains 128 MB batches, and only 19 can
> fit in memory, giving a total footprint of 2.5 GB, well below the 3 GB limit.)
> Yet, when loading batch xx, Drill fails with an OOM error. At that point,
> total available direct memory is 3,817,865,216. (Obtained from {{maxMemory}}
> in the {{Bits}} class in the JDK.)
> It appears that Drill wants to allocate 58,257,868 bytes, but the
> {{totalCapacity}} (again in {{Bits}}) is already 3,800,769,206, causing an
> OOM.
> The problem is that, at this point, the external sort should not ask the
> system for more memory. The allocator for the external sort is at just
> 1,192,350,366 before the allocation request. Plenty of spare memory should be
> available, released when the in-memory batches were spilled to disk prior to
> merging. Indeed, earlier in the run, the sort had reached a peak memory usage
> of 2,710,716,416 bytes. This memory should be available for reuse during
> merging, and is plenty sufficient to fill the particular request in question.
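
The numbers in the description can be laid out as a short sketch. It is written
in Python purely for the arithmetic (Drill itself is Java). The function fits
is a hypothetical helper that models the limit check as a simple
sum-versus-limit comparison; that is an assumption made for illustration, not a
copy of the JDK's logic.

```python
MB = 1024 * 1024

max_direct = 3_817_865_216      # maxMemory reported by Bits
total_capacity = 3_800_769_206  # totalCapacity reported by Bits
request = 58_257_868            # bytes Drill tries to allocate
sort_allocated = 1_192_350_366  # external sort allocator before the request
sort_peak = 2_710_716_416       # earlier peak usage of the sort

spill_files = 190
merge_width = 19
batch_bytes = 128 * MB


def fits(limit: int, in_use: int, wanted: int) -> bool:
    """ASSUMPTION: a request fits if in_use + wanted does not exceed limit."""
    return in_use + wanted <= limit


# System-wide view: the request pushes direct memory past the limit.
print(fits(max_direct, total_capacity, request))
print(total_capacity + request - max_direct)   # bytes over the limit

# Sort-local view: memory released since the peak would cover the request.
headroom = sort_peak - sort_allocated
print(headroom, request <= headroom)

# Merge plan: groups in the first merge pass, and bytes held by 19 batches.
print(spill_files // merge_width, merge_width * batch_bytes)
```

Seen system-wide the request overshoots the limit by about 41 MB, while the
sort itself sits roughly 1.5 GB below its own earlier peak, which is the
discrepancy the description points at.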
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)