[
https://issues.apache.org/jira/browse/DRILL-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209962#comment-15209962
]
Deneche A. Hakim edited comment on DRILL-3714 at 3/24/16 11:22 AM:
-------------------------------------------------------------------
Here is my understanding of how Netty handles this:
In general you can send a bunch of messages, add a listener to the last
message's ChannelFuture and you this listener to close the connection. Netty
guarantees that the client will receive all messages sent before the channel's
CloseFuture's listener is called on the client side
In this particular case the only difference is that when an exception is caught
in the RPC layer on the data-server side, RpcExceptionHandler closes the
connection immediately. So we may lose some messages that were meant to be sent
from the server to the client but Netty should still guarantee that any message
properly sent will reach the client before the CloseFuture's listener is called.
If my understanding is correct then we should have no problem calling
CoordinationQueue.channelClosed() on the client side as well
was (Author: adeneche):
[~hgunes] can you shed some light on this ? thanks
> Query runs out of memory and remains in CANCELLATION_REQUESTED state until
> drillbit is restarted
> ------------------------------------------------------------------------------------------------
>
> Key: DRILL-3714
> URL: https://issues.apache.org/jira/browse/DRILL-3714
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.2.0
> Reporter: Victoria Markman
> Assignee: Deneche A. Hakim
> Priority: Critical
> Fix For: 1.7.0
>
> Attachments: Screen Shot 2015-08-26 at 10.36.33 AM.png, drillbit.log,
> jstack.txt, query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
>
>
> This is a variation of DRILL-3705 with the difference of drill behavior when
> hitting OOM condition.
> Query runs out of memory during execution and remains in
> "CANCELLATION_REQUESTED" state until drillbit is bounced.
> Client (sqlline in this case) never gets a response from the server.
> Reproduction details:
> Single node drillbit installation.
> DRILL_MAX_DIRECT_MEMORY="8G"
> DRILL_HEAP="4G"
> Run this query on TPCDS SF100 data set
> {code}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS
> TotalSpend FROM store_sales ss WHERE ss.ss_store_sk IS NOT NULL ORDER BY 1
> LIMIT 10;
> {code}
> drillbit.log
> {code}
> 2015-08-26 16:54:58,469 [2a2210a7-7a78-c774-d54c-c863d0b77bb0:frag:3:22] INFO
> o.a.d.e.w.f.FragmentStatusReporter -
> 2a2210a7-7a78-c774-d54c-c863d0b77bb0:3:22: State to report: RUNNING
> 2015-08-26 16:55:50,498 [BitServer-5] WARN
> o.a.drill.exec.rpc.data.DataServer - Message of mode REQUEST of rpc type 3
> took longer than 500ms. Actual duration was 2569ms.
> 2015-08-26 16:56:31,086 [BitServer-5] ERROR
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54554 (data server).
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct
> buffer memory
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
> ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71]
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> ~[na:1.7.0_71]
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
> ~[na:1.7.0_71]
> at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:168)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PoolArena.reallocate(PoolArena.java:280)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:110)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.buffer.WrappedByteBuf.writeBytes(WrappedByteBuf.java:600)
> ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.buffer.UnsafeDirectLittleEndian.writeBytes(UnsafeDirectLittleEndian.java:28)
> ~[drill-java-exec-1.2.0-SNAPSHOT.jar:4.0.27.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
> ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227)
> ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> ... 11 common frames omitted
> 2015-08-26 16:56:31,087 [BitServer-5] INFO
> o.a.d.exec.rpc.ProtobufLengthDecoder - Channel is closed, discarding
> remaining 124958 byte(s) in buffer.
> 2015-08-26 16:56:31,087 [BitClient-1] ERROR
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.
> Connection: /10.10.88.133:54554 <--> /10.10.88.133:31012 (data client).
> Closing connection.
> java.io.IOException: syscall:read(...)() failed: Connection reset by peer
> 2015-08-26 16:56:31,088 [BitClient-1] INFO
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.133:54554 <-->
> /10.10.88.133:31012.
> 2015-08-26 16:56:35,325 [BitServer-6] ERROR
> o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.
> Connection: /10.10.88.133:31012 <--> /10.10.88.133:54555 (data server).
> Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct
> buffer memory
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233)
> ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at
> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
> [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> {code}
> Attached:
> query_profile_2a2210a7-7a78-c774-d54c-c863d0b77bb0.json
> drillbit.log
> jstack.txt
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)