[ 
https://issues.apache.org/jira/browse/DRILL-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052619#comment-16052619
 ] 

Paul Rogers edited comment on DRILL-5522 at 6/17/17 5:14 AM:
-------------------------------------------------------------

Basics from log:

{code}
Config: memory limit = 30439780, spill file size = 268435456, 
spill batch size = 7609945, merge limit = 2147483647, 
merge batch size = 15219890
...
Actual batch schema & sizes {
  ...
  Records: 1023, Total size: 140696, Gross row width:139, Net row width:95, 
Density:100}
Memory delta: 264192, actual batch size: 143254, Diff: 120938
...
Unable to allocate buffer of size 2097152 due to memory limit. Current 
allocation: 29229064
...
org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe(NullableBigIntVector.java:328)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at 
org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe(RepeatedMapVector.java:360)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at 
org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe(MapVector.java:220)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at 
org.apache.drill.exec.vector.complex.MapVector.copyFromSafe(MapVector.java:82) 
~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at ...
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1214)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
{code}

The error occurs in the merge phase of the query. After the allocation, used 
memory would be 31,326,216, but the limit is 30,439,780. So, OOM.

One problem is that the "batch sizer" is missing some of the data: it reports a 
batch size of 140,696, but the actual memory delta, is 264192. This could throw 
off the rest of the memory calculations and could be due to errors in handling 
maps.


was (Author: paul-rogers):
Basics from log:

{code}
Config: memory limit = 30439780, spill file size = 268435456, 
spill batch size = 7609945, merge limit = 2147483647, 
merge batch size = 15219890
...
Actual batch schema & sizes {
  ...
  Records: 1023, Total size: 140696, Gross row width:139, Net row width:95, 
Density:100}
Memory delta: 264192, actual batch size: 143254, Diff: 120938
...
Unable to allocate buffer of size 2097152 due to memory limit. Current 
allocation: 29229064
...
org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe(NullableBigIntVector.java:328)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at 
org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe(RepeatedMapVector.java:360)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at 
org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe(MapVector.java:220)
 ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at 
org.apache.drill.exec.vector.complex.MapVector.copyFromSafe(MapVector.java:82) 
~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at ...
org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1214)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
{code}

The error occurs in the merge phase of the query. After the allocation, used 
memory would be 31,326,216, but the limit is 30,439,780. So, OOM.

> OOM during the merge and spill process of the managed external sort
> -------------------------------------------------------------------
>
>                 Key: DRILL-5522
>                 URL: https://issues.apache.org/jira/browse/DRILL-5522
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>         Attachments: 26e334aa-1afa-753f-3afe-862f76b80c18.sys.drill, 
> drillbit.log, drillbit.out, drill-env.sh
>
>
> git.commit.id.abbrev=1e0a14c
> The below query fails with an OOM
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.memory.max_query_memory_per_node` = 1552428800;
> create table dfs.drillTestDir.xsort_ctas3_multiple partition by (type, aCol) 
> as select type, rptds, rms, s3.rms.a aCol, uid from (
>   select * from (
>     select s1.type type, flatten(s1.rms.rptd) rptds, s1.rms, s1.uid
>     from (
>       select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid
>     ) s1
>   ) s2
>   order by s2.rms.mapid, s2.rptds.a
> ) s3;
> {code}
> Stack trace
> {code}
> 2017-05-17 15:15:35,027 [26e334aa-1afa-753f-3afe-862f76b80c18:frag:4:2] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes 
> ran out of memory while executing the query. (Unable to allocate buffer of 
> size 2097152 due to memory limit. Current allocation: 29229064)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 2097152 due to memory limit. Current 
> allocation: 29229064
> [Error Id: 619e2e34-704c-4964-a354-1348fb33ce8a ]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_111]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_111]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 2097152 due to memory limit. Current allocation: 
> 29229064
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) 
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) 
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.vector.BigIntVector.reAlloc(BigIntVector.java:212) 
> ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.vector.BigIntVector.copyFromSafe(BigIntVector.java:324) 
> ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe(NullableBigIntVector.java:367)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe(NullableBigIntVector.java:328)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe(RepeatedMapVector.java:360)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe(MapVector.java:220)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.vector.complex.MapVector.copyFromSafe(MapVector.java:82)
>  ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.doCopy(PriorityQueueCopierTemplate.java:34)
>  ~[na:na]
>         at 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen49.next(PriorityQueueCopierTemplate.java:76)
>  ~[na:na]
>         at 
> org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next(CopierHolder.java:234)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns(ExternalSortBatch.java:1214)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:689)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:92)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
>  ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) 
> ~[na:1.7.0_111]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_111]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>  ~[hadoop-common-2.7.0-mapr-1607.jar:na]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         ... 4 common frames omitted
> {code}
> I attached the logs and the profile files. The data set is too large to 
> attach here



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to