[ 
https://issues.apache.org/jira/browse/DRILL-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541283#comment-14541283
 ] 

Chris Westin commented on DRILL-2750:
-------------------------------------

Without anything more specific to go on, I looked for memory leaks in the TPCH 
queries: if a query leaks, any queries that run after it may be unable to run 
because there isn't enough memory left.
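
As a rough illustration of why a leak can starve later queries, here is a toy 
model of a fixed-size memory pool. ToyAllocator, its methods, and the pool 
capacity are hypothetical and are not Drill's allocator API; only the 6000000 
byte reservation is borrowed from the error message quoted below.

```python
class ToyAllocator:
    """Hypothetical stand-in for a direct-memory pool; not Drill's allocator API."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def reserve(self, nbytes):
        # Refuse the reservation if it does not fit in what is left.
        available = self.capacity - self.used
        if nbytes > available:
            raise MemoryError(
                f"requested {nbytes} but only {available} bytes available"
            )
        self.used += nbytes

    def release(self, nbytes):
        self.used -= nbytes


# Illustrative capacity only; chosen so the numbers are easy to follow.
pool = ToyAllocator(capacity=8_000_000)

# The first query reserves memory, fails, and never releases it: the leak.
pool.reserve(7_000_000)

# A later query that would normally fit cannot get its initial reservation.
try:
    pool.reserve(6_000_000)
    later_query_ran = True
except MemoryError as err:
    later_query_ran = False
    print(err)

# Once the leaked memory is released, the same request succeeds.
pool.release(7_000_000)
pool.reserve(6_000_000)
```

In this model the second request fails only because the first one was never 
released, which matches the symptom in the report.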

To run the TPCH queries, I worked with these unit tests:
TestTpchDistributed
TestTpchDistributedStreaming
TestTpchLimit0

For each, I started with unlimited memory, then gradually ratcheted the limit 
down, starting at 1G and halving the amount of available memory each time 
(using -Ddrill.exec.memory.top.max=X, where X is the limit on direct memory 
usage in bytes). Depending on the test, queries started to fail at 32M or 16M. 
As I continued, queries failed and memory leaks were reported. [~jnadeau] 
helped me track these through the operators, because I'm not familiar with 
that code. We found and fixed several leaks this way. We got all of these 
tests to run all the way through, with almost all queries failing due to 
out-of-memory conditions, but without any leaks reported at the end. Then we 
tried the same thing with TestExampleQueries (to cover another range of query 
types), but we didn't get any additional failures under low-memory conditions.
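
The halving schedule described above can be sketched as a short script that 
prints the JVM flag for each step. The property name comes from this comment. 
The helper names, the reading of 1G as 2^30 bytes, and the 16M floor (the 
lowest level at which failures are mentioned here) are assumptions made for 
illustration.

```python
GIB = 1024 ** 3  # assumes "1G" means 2**30 bytes
MIB = 1024 ** 2


def halving_limits(start_bytes, floor_bytes):
    """Yield limits from start_bytes down to floor_bytes, halving each step."""
    limit = start_bytes
    while limit >= floor_bytes:
        yield limit
        limit //= 2


def jvm_flag(limit_bytes):
    # Property name as given in the comment; the value is in bytes.
    return f"-Ddrill.exec.memory.top.max={limit_bytes}"


# One flag per test run: 1G, 512M, 256M, 128M, 64M, 32M, 16M.
flags = [jvm_flag(n) for n in halving_limits(GIB, 16 * MIB)]
for flag in flags:
    print(flag)
```

Each printed flag corresponds to one pass over the test classes listed above. 
How the flag is passed to the test JVM depends on the build setup and is not 
shown here.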


> Running 1 or more queries against Drillbits having insufficient DirectMem 
> renders the Drillbits in an unusable state
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2750
>                 URL: https://issues.apache.org/jira/browse/DRILL-2750
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 0.9.0
>         Environment: RHEL 6.4
>            Reporter: Kunal Khatua
>            Assignee: Chris Westin
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> When running queries against a Drill cluster with limited DirectMem, if one 
> or more queries fail due to insufficient memory, then even queries that 
> should easily run within the allocated memory fail.
> The initial failure when queries with large memory requirements fail: 
> 2015-04-10 09:57:55 [pip0] ERROR PipSQuawkling fetchRows - [ 1 / 16_par1000 ] 
> Failure while executing query.
> java.sql.SQLException: Failure while executing query.
>         at org.apache.drill.jdbc.DrillCursor.next(DrillCursor.java:144)
>         at 
> net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
>         at org.apache.drill.jdbc.DrillResultSet.next(DrillResultSet.java:85)
>         at PipSQuawkling.fetchRows(PipSQuawkling.java:319)
>         at PipSQuawkling.executeTest(PipSQuawkling.java:154)
>         at PipSQuawkling.run(PipSQuawkling.java:76)
> Caused by: org.apache.drill.exec.rpc.RpcException: RemoteRpcException: 
> Failure while running fragment.[ e8c657a7-93a9-415a-8641-a4fbd4836a65 on 
> ucs-node5.perf.lab:31010 ]
> [ e8c657a7-93a9-415a-8641-a4fbd4836a65 on ucs-node5.perf.lab:31010 ]
>         at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:111)
>         at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:100)
>         at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:52)
>         at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:34)
>         at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:57)
>         at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:194)
>         at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:173)
>         at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:161)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>         at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>         at java.lang.Thread.run(Thread.java:744)
> After that, subsequent queries that should run fail with the following:
> 2015-04-10 09:59:29 [pip0] ERROR PipSQuawkling executeQuery - [ 2 / 
> rerun_06_par1000 ] exception while executing query: Failure while executing 
> query.
> java.sql.SQLException: exception while executing query: Failure while 
> executing query.
>         at net.hydromatic.avatica.Helper.createException(Helper.java:40)
>         at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:406)
>         at 
> net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>         at 
> net.hydromatic.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:78)
>         at PipSQuawkling.executeQuery(PipSQuawkling.java:284)
>         at PipSQuawkling.executeTest(PipSQuawkling.java:144)
>         at PipSQuawkling.run(PipSQuawkling.java:76)
> Caused by: java.sql.SQLException: Failure while executing query.
>         at org.apache.drill.jdbc.DrillCursor.next(DrillCursor.java:144)
>         at 
> org.apache.drill.jdbc.DrillResultSet.execute(DrillResultSet.java:105)
>         at 
> org.apache.drill.jdbc.DrillResultSet.execute(DrillResultSet.java:44)
>         at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>         ... 5 more
> Caused by: org.apache.drill.exec.rpc.RpcException: RemoteRpcException: 
> Failure while trying to start remote fragment, You attempted to create a new 
> child allocator with initial reservation 6000000 but only 110395 bytes of 
> memory were available. [ 689006cb-d703-42c3-860d
> -bfecc0a66312 on ucs-node10.perf.lab:31010 ]
>         at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:111)
>         at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:100)
>         at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:52)
>         at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:34)
>         at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:57)
>         at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:194)
>         at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:173)
>         at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:161)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>         at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>         at java.lang.Thread.run(Thread.java:744)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)