[
https://issues.apache.org/jira/browse/DRILL-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017661#comment-16017661
]
Khurram Faraaz commented on DRILL-5499:
---------------------------------------
I haven't tried this on 1.10.0; this was a new test, and I will verify whether the
problem is also seen on Drill 1.10.0.
We hit this after deciding to test with a query that uses a large amount of main
memory and then running that same query concurrently, to observe memory usage
behavior, as one of the failure injection tests. The aim of the test was to reduce
the total main memory available to the Drillbit while the concurrent queries were
executing.
Even before we reduced the total main memory available to the Drillbit, we saw
the RESOURCE ERROR message.
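For reference, a minimal sketch of the kind of concurrent driver used in this test:
N threads each open a JDBC connection to the Drillbit, run the same query, and drain
the result set. The JDBC URL and the placeholder query here are assumptions; the
actual program is ConcurrentQuery.java, linked in the description below, and the
actual query was TPC-DS query 4.
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: submits the same query from 8 threads against a single Drillbit.
public class ConcurrentQuerySketch {
  private static final String JDBC_URL = "jdbc:drill:drillbit=localhost:31010"; // assumed single-node Drillbit
  private static final String QUERY = "SELECT * FROM sys.version";              // substitute the TPC-DS query 4 text
  private static final int THREADS = 8;

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(THREADS);
    for (int i = 0; i < THREADS; i++) {
      pool.submit(() -> {
        try (Connection conn = DriverManager.getConnection(JDBC_URL);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY)) {
          int rows = 0;
          while (rs.next()) {
            rows++; // drain the result set so all fragments stay active
          }
          System.out.println(Thread.currentThread().getName() + " fetched " + rows + " rows");
        } catch (Exception e) {
          e.printStackTrace(); // the RESOURCE ERROR / OOM surfaces here
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}
{code}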
> TPC-DS Query 4 concurrent execution RESOURCE ERROR: One or more nodes ran out
> of memory while executing the query.
> ------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-5499
> URL: https://issues.apache.org/jira/browse/DRILL-5499
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.11.0
> Environment: single node CentOS
> Reporter: Khurram Faraaz
>
> Running TPC-DS query 4 sequentially does not result in an error/OOM; on Drill
> 1.11.0 the query takes about 7 GB of memory on average.
> However, there is an OOM and a RESOURCE ERROR when we run TPC-DS query 4
> concurrently using 8 threads on a single node that has 24 processing units
> available.
> {noformat}
> 2017-05-09 13:42:30,547 [BitServer-5] ERROR
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.
> Connection: /10.10.100.201:31012 <--> /10.10.100.201:59746 (data server).
> Closing connection.
> io.netty.handler.codec.DecoderException:
> org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate
> buffer of size 25165824 due to memory limit. Current allocation: 21037056
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
> ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to
> allocate buffer of size 25165824 due to memory limit. Current allocation:
> 21037056
> at
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> io.netty.buffer.ExpandableByteBuf.capacity(ExpandableByteBuf.java:43)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
> ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227)
> ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> ... 13 common frames omitted
> ...
> 2017-05-09 13:42:30,547 [26ee38c2-75c8-05cd-0cc0-9263fefba955:frag:34:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 26ee38c2-75c8-05cd-0cc0-9263fefba955:34:0: State change requested
> AWAITING_ALLOCATION --> RUNNING
> 2017-05-09 13:42:30,547 [26ee38c2-75c8-05cd-0cc0-9263fefba955:frag:34:0] INFO
> o.a.d.e.w.f.FragmentStatusReporter -
> 26ee38c2-75c8-05cd-0cc0-9263fefba955:34:0: State to report: RUNNING
> 2017-05-09 13:42:30,547 [BitServer-5] INFO
> o.a.d.exec.rpc.ProtobufLengthDecoder - Channel is closed, discarding
> remaining 3022732 byte(s) in buffer.
> 2017-05-09 13:42:30,548 [BitClient-1] WARN
> o.a.d.exec.rpc.RpcExceptionHandler - Exception occurred with closed channel.
> Connection: /10.10.100.201:59746 <--> centos-01.qa.lab/10.10.100.201:31012
> (data client)
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_91]
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> ~[na:1.8.0_91]
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> ~[na:1.8.0_91]
> at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_91]
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> ~[na:1.8.0_91]
> at
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:407)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:30)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
> at io.netty.buffer.DrillBuf.setBytes(DrillBuf.java:770)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
> at
> io.netty.buffer.MutableWrappedByteBuf.setBytes(MutableWrappedByteBuf.java:280)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
> at
> io.netty.buffer.ExpandableByteBuf.setBytes(ExpandableByteBuf.java:26)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:241)
> ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> ~[netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> 2017-05-09 13:42:30,548 [26ee38c2-7f3c-b492-5d47-97553d90ab1f:frag:72:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred: One or more nodes
> ran out of memory while executing the query. (Unable to allocate buffer of
> size 262144 due to memory limit. Current allocation: 3690496)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more
> nodes ran out of memory while executing the query.
> Unable to allocate buffer of size 262144 due to memory limit. Current
> allocation: 3690496
> [Error Id: 181248c2-bf8e-401e-9936-9b4cb7c2d213 ]
> at
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
> ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:244)
> [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_91]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_91]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to
> allocate buffer of size 262144 due to memory limit. Current allocation:
> 3690496
> at
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195)
> ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.vector.UInt4Vector.reAlloc(UInt4Vector.java:212)
> ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.vector.UInt4Vector$Mutator.setSafe(UInt4Vector.java:409)
> ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:605)
> ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:560)
> ~[vector-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.test.generated.ProjectorGen12364.doEval(ProjectorTemplate.java:73)
> ~[na:na]
> at
> org.apache.drill.exec.test.generated.ProjectorGen12364.projectRecords(ProjectorTemplate.java:67)
> ~[na:na]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:199)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext(BroadcastSenderRootExec.java:95)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
> ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method)
> ~[na:1.8.0_91]
> at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_91]
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
> ~[hadoop-common-2.7.0-mapr-1607.jar:na]
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
> [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
> ... 4 common frames omitted
> 2017-05-09 13:42:30,548 [26ee38c2-7f3c-b492-5d47-97553d90ab1f:frag:72:0] INFO
> o.a.d.e.w.fragment.FragmentExecutor -
> 26ee38c2-7f3c-b492-5d47-97553d90ab1f:72:0: State change requested RUNNING -->
> FAILED
> {noformat}
> nproc --all returned 24 on the centos-01 node.
> Memory utilization (increasing) while the concurrent program was running, i.e.
> while it was executing TPC-DS query 4 from many threads:
> [root@centos-01 ~]# ps_mem --total -p 11371
> 3068859904
> [root@centos-01 ~]# ps_mem --total -p 11371
> 5967798784
> [root@centos-01 ~]# ps_mem --total -p 11371
> 7694078464
> [root@centos-01 ~]# ps_mem --total -p 11371
> 10252224512
> [root@centos-01 ~]# ps_mem --total -p 11371
> 10254476288
> The concurrent Java program used in the test is available here; the issue is
> reproducible with it:
> https://github.com/kfaraaz/tests/blob/master/src/ConcurrentQuery.java
> The ps_mem tool is available here: https://github.com/pixelb/ps_mem
> ps_mem is a utility to accurately report the core memory usage of a program.
>
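The memory snapshots above were taken by invoking ps_mem repeatedly against the
Drillbit pid. Below is a minimal sketch of how that polling could be automated; the
pid, sample count, and interval are assumptions, and the helper simply shells out
to ps_mem and timestamps its output.
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical helper (not part of the original test): polls
// `ps_mem --total -p <pid>` at a fixed interval so the Drillbit's resident
// memory growth can be recorded while the concurrent queries run.
public class PollDrillbitMemory {
  public static void main(String[] args) throws Exception {
    String pid = args.length > 0 ? args[0] : "11371";   // Drillbit pid from the report above (assumption)
    for (int sample = 0; sample < 10; sample++) {       // 10 samples, 30 s apart (assumption)
      Process p = new ProcessBuilder("ps_mem", "--total", "-p", pid)
          .redirectErrorStream(true)
          .start();
      try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
        String line;
        while ((line = r.readLine()) != null) {
          System.out.println(System.currentTimeMillis() + " " + line); // total bytes reported by ps_mem
        }
      }
      p.waitFor();
      Thread.sleep(30_000L);
    }
  }
}
{code}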