[ 
https://issues.apache.org/jira/browse/DRILL-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15833200#comment-15833200
 ] 

Paul Rogers commented on DRILL-5211:
------------------------------------

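Actually, the problem appears to be related to the cache of allocated memory 
chunks. The allocator dump below lists many 16 MB (16,777,216-byte) chunks 
that are held but almost entirely unused: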

{code}
Chunk(s) at 0~25%:
none
Chunk(s) at 0~50%:
Chunk(12122cd0: 1%, 40960/16777216)
Chunk(432d1a8f: 12%, 1998848/16777216)
Chunk(6bc20246: 0%, 0/16777216)
Chunk(5b40b4e5: 0%, 0/16777216)
Chunk(58b777f1: 0%, 0/16777216)
Chunk(73e5e70a: 0%, 0/16777216)
Chunk(84e1b02: 0%, 0/16777216)
Chunk(56777172: 0%, 0/16777216)
Chunk(359c8cb2: 0%, 0/16777216)
Chunk(699df0bc: 0%, 0/16777216)
Chunk(11f36086: 0%, 0/16777216)
Chunk(7ce26f2b: 0%, 0/16777216)
Chunk(2d4a8519: 0%, 0/16777216)
Chunk(2bd4881c: 0%, 0/16777216)
Chunk(21293ab0: 2%, 237568/16777216)
Chunk(4edd8289: 1%, 8192/16777216)
Chunk(37c6b406: 17%, 2744320/16777216)
Chunk(385d5e8a: 1%, 32768/16777216)
Chunk(50490f8b: 0%, 0/16777216)
Chunk(72a206c1: 0%, 0/16777216)
Chunk(7046ea17: 0%, 0/16777216)
Chunk(22bd539b: 0%, 0/16777216)
Chunk(3a902510: 0%, 0/16777216)
Chunk(5866a88d: 0%, 0/16777216)
Chunk(1fb7f7c4: 0%, 0/16777216)
Chunk(57de5e22: 0%, 0/16777216)
Chunk(6c5d496c: 0%, 0/16777216)
Chunk(192a6aa: 0%, 0/16777216)
Chunk(213b688b: 0%, 0/16777216)
Chunk(4b10dc0: 0%, 0/16777216)
Chunk(2212213: 0%, 0/16777216)
Chunk(1692730b: 0%, 0/16777216)
Chunk(6c173e62: 0%, 0/16777216)
Chunk(60c4f12d: 0%, 0/16777216)
Chunk(s) at 25~75%:
Chunk(6bfe669c: 0%, 0/16777216)
Chunk(6e715ac3: 0%, 0/16777216)
Chunk(3bc09d41: 0%, 0/16777216)
Chunk(7c4a4e8d: 0%, 0/16777216)
Chunk(64981d1e: 0%, 0/16777216)
Chunk(dbe40c: 0%, 0/16777216)
Chunk(3fce5bc3: 0%, 0/16777216)
Chunk(s) at 50~100%:
none
Chunk(s) at 75~100%:
Chunk(115e4491: 0%, 0/16777216)
Chunk(350acb49: 0%, 0/16777216)
Chunk(6a2ea260: 0%, 0/16777216)
Chunk(2773fca5: 0%, 0/16777216)
Chunk(446a4e16: 0%, 0/16777216)
Chunk(27d99551: 0%, 0/16777216)
Chunk(38fb1e68: 0%, 0/16777216)
Chunk(d54b06: 0%, 0/16777216)
Chunk(16d9aff4: 0%, 0/16777216)
Chunk(7dc1c363: 0%, 0/16777216)
Chunk(1da99aed: 0%, 0/16777216)
Chunk(378e6f25: 0%, 0/16777216)
Chunk(6cf3d02f: 0%, 0/16777216)
Chunk(1f5adc09: 0%, 0/16777216)
Chunk(4e7553fd: 0%, 0/16777216)
Chunk(a46ea51: 0%, 0/16777216)
Chunk(78c6219e: 0%, 0/16777216)
Chunk(31b5001b: 0%, 0/16777216)
Chunk(55bb476b: 0%, 0/16777216)
Chunk(68123bef: 0%, 0/16777216)
Chunk(21913da2: 0%, 0/16777216)
Chunk(383d4453: 0%, 0/16777216)
Chunk(3732cc20: 0%, 0/16777216)
Chunk(4e86446a: 0%, 0/16777216)
Chunk(66d21c35: 0%, 0/16777216)
Chunk(349fd360: 0%, 0/16777216)
Chunk(156d4a1f: 0%, 0/16777216)
Chunk(69b4e9cc: 0%, 0/16777216)
Chunk(1f71737b: 0%, 0/16777216)
Chunk(55bfa726: 0%, 0/16777216)
Chunk(2a7d323c: 0%, 0/16777216)
Chunk(64c94436: 0%, 0/16777216)
Chunk(70b7097f: 0%, 0/16777216)
Chunk(581906d8: 0%, 0/16777216)
Chunk(1b362335: 0%, 0/16777216)
Chunk(35f03c91: 0%, 0/16777216)
Chunk(7d4437a1: 0%, 0/16777216)
Chunk(6d7bd117: 0%, 0/16777216)
Chunk(47fe7806: 0%, 0/16777216)
Chunk(735ec0dc: 0%, 0/16777216)
Chunk(2ffb0829: 0%, 0/16777216)
Chunk(1cbb97a8: 0%, 0/16777216)
Chunk(28b1f271: 0%, 0/16777216)
Chunk(2d6c9f9b: 0%, 0/16777216)
Chunk(5a21605f: 0%, 0/16777216)
Chunk(1a67aa64: 0%, 0/16777216)
Chunk(3d62e123: 0%, 0/16777216)
Chunk(74bb2153: 0%, 0/16777216)
Chunk(25498403: 0%, 0/16777216)
Chunk(2da3e44: 0%, 0/16777216)
Chunk(281bbcc5: 0%, 0/16777216)
Chunk(587b12c: 0%, 0/16777216)
Chunk(6c874403: 0%, 0/16777216)
Chunk(3ffc7fc9: 0%, 0/16777216)
Chunk(4af41167: 0%, 0/16777216)
Chunk(72c2d7c4: 0%, 0/16777216)
Chunk(243332c3: 0%, 0/16777216)
Chunk(78ed13bb: 0%, 0/16777216)
Chunk(12f84ae8: 0%, 0/16777216)
Chunk(7660c384: 0%, 0/16777216)
Chunk(4bf852a1: 0%, 0/16777216)
Chunk(5b98f0ae: 0%, 0/16777216)
Chunk(be74e3f: 0%, 0/16777216)
Chunk(7b6bd024: 0%, 0/16777216)
Chunk(720ff8b2: 0%, 0/16777216)
Chunk(6e0e7bdd: 0%, 0/16777216)
Chunk(5fa94695: 0%, 0/16777216)
Chunk(7ae647b4: 0%, 0/16777216)
Chunk(77a1ea32: 0%, 0/16777216)
Chunk(6aecb788: 0%, 0/16777216)
Chunk(7fe4c9ae: 0%, 0/16777216)
Chunk(3777ea01: 0%, 0/16777216)
Chunk(4f7f76a7: 0%, 0/16777216)
Chunk(4020d837: 0%, 0/16777216)
Chunk(1950c024: 0%, 0/16777216)
Chunk(117f16ed: 0%, 0/16777216)
Chunk(2501802b: 0%, 0/16777216)
Chunk(63a605dc: 0%, 0/16777216)
Chunk(7ce8b86c: 0%, 0/16777216)
Chunk(15490162: 0%, 0/16777216)
Chunk(3c60db38: 0%, 0/16777216)
Chunk(6fbbb18d: 0%, 0/16777216)
Chunk(56a94fce: 0%, 0/16777216)
Chunk(bb61668: 0%, 0/16777216)
Chunk(3135b53d: 0%, 0/16777216)
Chunk(3b05d4f: 0%, 0/16777216)
Chunk(1f7ba5c8: 0%, 0/16777216)
Chunk(24c5e519: 0%, 0/16777216)
Chunk(38c520e1: 0%, 0/16777216)
Chunk(399e4893: 0%, 0/16777216)
Chunk(7b89ef8d: 0%, 0/16777216)
Chunk(706f30c8: 0%, 0/16777216)
Chunk(613cc40c: 0%, 0/16777216)
Chunk(2aadc268: 0%, 0/16777216)
Chunk(1eecb537: 0%, 0/16777216)
Chunk(178c3f52: 0%, 0/16777216)
Chunk(1017850b: 0%, 0/16777216)
Chunk(54edabe3: 0%, 0/16777216)
Chunk(2f53f944: 0%, 0/16777216)
Chunk(59532553: 0%, 0/16777216)
Chunk(7540ccaf: 0%, 0/16777216)
Chunk(4c4bc357: 0%, 0/16777216)
Chunk(7c629a43: 0%, 0/16777216)
Chunk(3cdb5121: 0%, 0/16777216)
Chunk(4f8dd7a1: 0%, 0/16777216)
Chunk(5d4ee47c: 0%, 0/16777216)
Chunk(3596dd14: 0%, 0/16777216)
Chunk(53a2d0de: 0%, 0/16777216)
Chunk(s) at 100%:
none
tiny subpages:
1: (2052: 2/512, offset: 32768, length: 8192, elemSize: 16)
2: (2915: 1/256, offset: 7102464, length: 8192, elemSize: 32)
4: (2473: 2/128, offset: 3481600, length: 8192, elemSize: 64)
8: (2049: 2/64, offset: 8192, length: 8192, elemSize: 128)
16: (2053: 2/32, offset: 40960, length: 8192, elemSize: 256)
small subpages:
1: (2048: 1/8, offset: 0, length: 8192, elemSize: 1024)
3: (2096: 1/2, offset: 393216, length: 8192, elemSize: 4096)
{code}
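
To put a number on the dump above, here is a minimal sketch that tallies the chunk lines. The class name {{ChunkDumpTally}} and the regular expression are my own, written against the line format shown in the dump; they are not part of Drill or of the allocator. By my count, the dump lists 149 chunks of 16,777,216 bytes each (2,499,805,184 bytes reserved), of which only 5,062,656 bytes are in use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkDumpTally {
    // Matches dump lines such as "Chunk(432d1a8f: 12%, 1998848/16777216)".
    // Header lines like "Chunk(s) at 0~50%:" do not match and are skipped.
    private static final Pattern CHUNK =
        Pattern.compile("Chunk\\(([0-9a-f]+): (\\d+)%, (\\d+)/(\\d+)\\)");

    /** Returns {chunkCount, usedBytes, reservedBytes} for the given dump lines. */
    public static long[] tally(List<String> lines) {
        long count = 0;
        long used = 0;
        long reserved = 0;
        for (String line : lines) {
            Matcher m = CHUNK.matcher(line.trim());
            if (m.matches()) {
                count++;
                used += Long.parseLong(m.group(3));     // bytes in use in this chunk
                reserved += Long.parseLong(m.group(4)); // total size of this chunk
            }
        }
        return new long[] {count, used, reserved};
    }

    public static void main(String[] args) {
        // A few lines copied from the dump; feed in the whole dump for full totals.
        List<String> sample = Arrays.asList(
            "Chunk(s) at 0~50%:",
            "Chunk(12122cd0: 1%, 40960/16777216)",
            "Chunk(432d1a8f: 12%, 1998848/16777216)",
            "Chunk(6bc20246: 0%, 0/16777216)");
        long[] t = tally(sample);
        System.out.printf("chunks=%d used=%d reserved=%d%n", t[0], t[1], t[2]);
    }
}
```

If those totals hold, roughly 2.5 GB sits in cached chunks while almost none of it is in use, which would be consistent with the cache being the source of the problem.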

> External sort fails to allocate merge memory when plenty is free
> ----------------------------------------------------------------
>
>                 Key: DRILL-5211
>                 URL: https://issues.apache.org/jira/browse/DRILL-5211
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.9.0
>
>
> Consider a test of the external sort as follows:
> * Direct memory: 3 GB
> * Input file: 18 GB, with one Varchar column of 8K width
> The sort runs, spilling to disk. Once all data arrives, the sort begins to 
> merge the results. But, to do that, it must first do an intermediate merge. 
> For example, in this sort, there are 190 spill files, but only 19 can be 
> merged at a time. (Each merge file contains 128 MB batches, and only 19 can 
> fit in memory, giving a total footprint of about 2.5 GB, well below the 3 GB 
> limit.)
> Yet, when loading batch xx, Drill fails with an OOM error. At that point, 
> total available direct memory is 3,817,865,216. (Obtained from {{maxMemory}} 
> in the {{Bits}} class in the JDK.)
> It appears that Drill wants to allocate 58,257,868 bytes, but the 
> {{totalCapacity}} (again in {{Bits}}) is already 3,800,769,206, causing an 
> OOM.
> The problem is that, at this point, the external sort should not ask the 
> system for more memory. The allocator for the external sort is at just 
> 1,192,350,366 before the allocation request. Plenty of spare memory should be 
> available, released when the in-memory batches were spilled to disk prior to 
> merging. Indeed, earlier in the run, the sort had reached a peak memory usage 
> of 2,710,716,416 bytes. This memory should be available for reuse during 
> merging, and is more than sufficient to fill the request in question.
> It may be a coincidence, but in a different run, the OOM occurs once memory 
> hits 1,310,154,570. That value, in hex, is 0x4E175F4A, which still fits in a 
> positive 32-bit int (the sign bit is clear), so this value alone would not 
> go negative. Even so, might some bit of code be using an int when it should 
> use a long?
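
The numbers quoted in the description can be checked with a little arithmetic. The sketch below is only an illustration: {{DirectMemoryCheck}}, {{fits}} and {{negativeAsInt}} are hypothetical names, and {{fits}} is a simplified stand-in for a "reserved plus request must not exceed the maximum" check, not the JDK's actual code. It uses the values reported above for {{maxMemory}}, {{totalCapacity}} and the requested size, then tests whether 1,310,154,570 turns negative when narrowed to a 32-bit int.

```java
public class DirectMemoryCheck {
    /**
     * Simplified limit check: does the request fit in the remaining headroom?
     * This is an assumed form of the check, not copied from the JDK.
     */
    public static boolean fits(long maxMemory, long totalCapacity, long request) {
        return request <= maxMemory - totalCapacity;
    }

    /** True if the value would be negative after narrowing to a 32-bit int. */
    public static boolean negativeAsInt(long value) {
        return (int) value < 0;
    }

    public static void main(String[] args) {
        long maxMemory = 3_817_865_216L;     // reported maxMemory
        long totalCapacity = 3_800_769_206L; // reported totalCapacity
        long request = 58_257_868L;          // reported allocation request

        System.out.println("headroom = " + (maxMemory - totalCapacity));
        System.out.println("fits = " + fits(maxMemory, totalCapacity, request));

        long observed = 1_310_154_570L;      // memory level at the OOM in the other run
        System.out.println("hex = 0x" + Long.toHexString(observed).toUpperCase());
        System.out.println("negative as int = " + negativeAsInt(observed));
    }
}
```

With the reported values the headroom is 17,096,010 bytes, so a 58,257,868-byte request cannot fit at the system level even though the sort's own allocator is well under its limit. The second check indicates that 1,310,154,570 stays positive as a 32-bit int, so a plain sign overflow at that value seems unlikely on its own.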



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
