[ 
https://issues.apache.org/jira/browse/DRILL-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053536#comment-16053536
 ] 

Paul Rogers commented on DRILL-5513:
------------------------------------

The revisions to the external sort in DRILL-5325 (still pending commit) 
partially addressed this issue. Rather than assume we can use the full 5 MB in 
this case, the code assumes internal vector fragmentation and assumes that, on 
average, only 3/4 of any vector is usable. (Explanation: every vector will be 
half full, else it's size would not have doubled. It will then fill a random 
amount of the upper half: half of the upper half on average, for an average 
fill rate of 3/4.)

Changes in DRILL-5522 handled some issues that arise with deserialized rows.

This bug required additional adjustments. The data uses a repeated map vector, 
which has an offset vector associated with the list of maps. That data was not 
accounted for in the earlier changes, so added that value. (It would be so much 
easier if batches knew their memory usage!)

> Managed External Sort : OOM error during the merge phase
> --------------------------------------------------------
>
>                 Key: DRILL-5513
>                 URL: https://issues.apache.org/jira/browse/DRILL-5513
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>         Attachments: 26e5f7b8-71e8-afca-e72e-fad7be2b2416.sys.drill, 
> drillbit.log
>
>
> git.commit.id.abbrev=1e0a14c
> No of nodes in cluster : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> The below query fails with an OOM
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_query` = 100;
> alter session set `planner.memory.max_query_memory_per_node` = 652428800;
> select count(*) from (select s1.type type, flatten(s1.rms.rptd) rptds from 
> (select d.type type, d.uid uid, flatten(d.map.rm) rms from 
> dfs.`/drill/testdata/resource-manager/nested-large.json` d order by d.uid) s1 
> order by s1.rms.mapid);
> {code}
> Exception from the logs
> {code}
> 2017-05-15 12:58:46,646 [BitServer-4] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 26e5f7b8-71e8-afca-e72e-fad7be2b2416: 
> State change requested RUNNING --> FAILED
> org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One 
> or more nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 2097152 due to memory limit. Current 
> allocation: 19791880
> Fragment 5:2
> [Error Id: bb67176f-a780-400d-88c9-06fea131ea64 on qa-node190.qa.lab:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) Unable to allocate 
> buffer of size 2097152 due to memory limit. Current allocation: 19791880
>     org.apache.drill.exec.memory.BaseAllocator.buffer():220
>     org.apache.drill.exec.memory.BaseAllocator.buffer():195
>     org.apache.drill.exec.vector.BigIntVector.reAlloc():212
>     org.apache.drill.exec.vector.BigIntVector.copyFromSafe():324
>     org.apache.drill.exec.vector.NullableBigIntVector.copyFromSafe():367
>     
> org.apache.drill.exec.vector.NullableBigIntVector$TransferImpl.copyValueSafe():328
>     
> org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.copyValueSafe():360
>     
> org.apache.drill.exec.vector.complex.MapVector$MapTransferPair.copyValueSafe():220
>     org.apache.drill.exec.vector.complex.MapVector.copyFromSafe():82
>     
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.doCopy():34
>     org.apache.drill.exec.test.generated.PriorityQueueCopierGen4494.next():76
>     
> org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next():234
>     
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeSpilledRuns():1214
>     
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load():689
>     
> org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext():559
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next():215
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
>         at 
> org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:537)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:71)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:94)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.batch.ControlMessageHandler.handle(ControlMessageHandler.java:55)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.BasicServer.handle(BasicServer.java:159) 
> [drill-rpc-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.rpc.BasicServer.handle(BasicServer.java:53) 
> [drill-rpc-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) 
> [drill-rpc-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) 
> [drill-rpc-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:150)
>  [netty-handler-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) 
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) 
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) 
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> {code}
> Attached the log and profile files. The dataset is not attached here as it is 
> larger than the permitted size



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to