[ https://issues.apache.org/jira/browse/DRILL-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177914#comment-16177914 ]

Lenin K Subramanian commented on DRILL-4708:
--------------------------------------------

While this exception happens on the client, I checked the server logs:

{code:java}
2017-09-23 00:48:42,177 [263a50a1-8356-5d3e-c6e3-919eb9b55274:frag:3:3] INFO  o.a.d.e.w.fragment.FragmentExecutor - 263a50a1-8356-5d3e-c6e3-919eb9b55274:3:3: State change requested RUNNING --> FINISHED
2017-09-23 00:48:42,177 [263a50a1-8356-5d3e-c6e3-919eb9b55274:frag:3:3] INFO  o.a.d.e.w.f.FragmentStatusReporter - 263a50a1-8356-5d3e-c6e3-919eb9b55274:3:3: State to report: FINISHED
2017-09-23 00:48:42,218 [263a50a1-8356-5d3e-c6e3-919eb9b55274:frag:3:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 263a50a1-8356-5d3e-c6e3-919eb9b55274:3:0: State change requested RUNNING --> FINISHED
2017-09-23 00:48:42,218 [263a50a1-8356-5d3e-c6e3-919eb9b55274:frag:3:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 263a50a1-8356-5d3e-c6e3-919eb9b55274:3:0: State to report: FINISHED
2017-09-23 00:48:42,232 [263a50a1-8356-5d3e-c6e3-919eb9b55274:frag:2:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 263a50a1-8356-5d3e-c6e3-919eb9b55274:2:0: State change requested RUNNING --> FINISHED
2017-09-23 00:48:42,232 [263a50a1-8356-5d3e-c6e3-919eb9b55274:frag:2:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 263a50a1-8356-5d3e-c6e3-919eb9b55274:2:0: State to report: FINISHED
2017-09-23 00:48:49,663 [UserServer-1] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication.  Connection: /10.244.4.225:31010 <--> /10.244.6.101:48476 (user server).  Closing connection.
java.io.IOException: syscall:read(...)() failed: Connection reset by peer
2017-09-23 00:48:49,664 [UserServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 263a50a4-a34a-7947-cf70-d5da9229d3d0:0:0: State change requested RUNNING --> FAILED
2017-09-23 00:48:49,665 [263a50a4-a34a-7947-cf70-d5da9229d3d0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 263a50a4-a34a-7947-cf70-d5da9229d3d0:0:0: State change requested FAILED --> FAILED
2017-09-23 00:48:49,665 [263a50a4-a34a-7947-cf70-d5da9229d3d0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 263a50a4-a34a-7947-cf70-d5da9229d3d0:0:0: State change requested FAILED --> FAILED
2017-09-23 00:48:49,665 [263a50a4-a34a-7947-cf70-d5da9229d3d0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 263a50a4-a34a-7947-cf70-d5da9229d3d0:0:0: State change requested FAILED --> FAILED
2017-09-23 00:48:49,665 [263a50a4-a34a-7947-cf70-d5da9229d3d0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 263a50a4-a34a-7947-cf70-d5da9229d3d0:0:0: State change requested FAILED --> FINISHED
2017-09-23 00:48:49,668 [263a50a4-a34a-7947-cf70-d5da9229d3d0:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: ChannelClosedException: Channel closed /10.244.4.225:31010 <--> /10.244.6.102:35926.

Fragment 0:0

[Error Id: c66ca014-436e-44a1-9d84-32326f5a52de on falkonry-drill-01:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ChannelClosedException: Channel closed /10.244.4.225:31010 <--> /10.244.6.102:35926.

Fragment 0:0

[Error Id: c66ca014-436e-44a1-9d84-32326f5a52de on falkonry-drill-01:31010]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550) ~[drill-common-1.11.0.jar:1.11.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295) [drill-java-exec-1.11.0.jar:1.11.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.11.0.jar:1.11.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264) [drill-java-exec-1.11.0.jar:1.11.0]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0.jar:1.11.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed /10.244.4.225:31010 <--> /10.244.6.102:35926.
        at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:164) ~[drill-rpc-1.11.0.jar:1.11.0]
        at org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:144) ~[drill-rpc-1.11.0.jar:1.11.0]
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603) ~[netty-common-4.0.27.Final.jar:4.0.27.Final]

{code}

Based on my observation, the client appears to hit a read timeout and then closes the connection, which cancels the query on the server, as we can see in the server log above. Because of the read timeout, the client assumes the Drillbit might be down, hence the exception.
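
To illustrate the mechanism I am describing, here is a minimal Java sketch (not Drill code) of a client that enforces a socket read timeout and closes the connection when no data arrives in time. The endpoint and the 5-second value are placeholders, not actual Drill settings; the point is only that a client-side close during a running query can show up on the server as "Connection reset by peer".

{code:java}
// Minimal sketch, assuming a plain blocking socket client; not Drill's actual client code.
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutClientSketch {
    public static void main(String[] args) throws IOException {
        // Placeholder endpoint; not a real Drillbit address.
        try (Socket socket = new Socket("drillbit-host", 31010)) {
            socket.setSoTimeout(5_000); // read timeout in milliseconds (hypothetical value)
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[8192];
            try {
                // Blocks for at most 5 s; a long-running query that sends no data
                // within that window makes read() throw SocketTimeoutException.
                int n = in.read(buf);
                System.out.println("read " + n + " bytes");
            } catch (SocketTimeoutException e) {
                // Giving up here and closing the socket (via try-with-resources) is the
                // kind of client-side close the server can log as "Connection reset by peer".
                System.err.println("no data within the timeout, closing the connection");
            }
        }
    }
}
{code}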

Is there a way to increase the read timeout? As I can see from the client logs, it seems to be using an oadd.io.netty channel to communicate with the server:


{code:java}
oadd.org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
oadd.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
oadd.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
oadd.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
oadd.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
oadd.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
oadd.io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
oadd.io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
oadd.io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
oadd.io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099)
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
oadd.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
oadd.io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
oadd.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
oadd.io.netty.channel.ChannelDuplexHandler.close(ChannelDuplexHandler.java:73)
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
oadd.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
oadd.io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466)
oadd.org.apache.drill.exec.rpc.RpcExceptionHandler.exceptionCaught(RpcExceptionHandler.java:39)
{code}
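
Since the client trace goes through a (shaded) Netty channel, a read timeout in a Netty-based client is typically enforced by a pipeline handler such as io.netty.handler.timeout.ReadTimeoutHandler. The sketch below is a generic Netty illustration of where such a timeout lives and what raising it would mean; it is not Drill's actual client pipeline, and whether the Drill client exposes a property for this is exactly my question. The host, port, and 30-second value are placeholders.

{code:java}
// Generic Netty sketch, assuming a ReadTimeoutHandler-style timeout; not Drill's pipeline.
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.timeout.ReadTimeoutException;
import io.netty.handler.timeout.ReadTimeoutHandler;

public class NettyReadTimeoutSketch {
    public static void main(String[] args) throws InterruptedException {
        NioEventLoopGroup group = new NioEventLoopGroup();
        try {
            Bootstrap bootstrap = new Bootstrap()
                .group(group)
                .channel(NioSocketChannel.class)
                .handler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        // Raise a ReadTimeoutException if nothing is read for 30 s (placeholder value).
                        ch.pipeline().addLast(new ReadTimeoutHandler(30));
                        ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                            @Override
                            public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
                                if (cause instanceof ReadTimeoutException) {
                                    // Closing here is what the peer then sees as a ChannelClosedException.
                                    System.err.println("read timed out, closing the channel");
                                }
                                ctx.close();
                            }
                        });
                    }
                });
            // Placeholder endpoint; not a real Drillbit address.
            bootstrap.connect("drillbit-host", 31010).sync().channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
{code}

If the client's timeout is indeed enforced this way, increasing it would mean changing the value passed to that handler (or the configuration that feeds it), rather than anything on the server side.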


> connection closed unexpectedly
> ------------------------------
>
>                 Key: DRILL-4708
>                 URL: https://issues.apache.org/jira/browse/DRILL-4708
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - RPC
>    Affects Versions: 1.7.0
>            Reporter: Chun Chang
>            Priority: Critical
>
> Running DRILL functional automation, we often see queries fail randomly due
> to the following unexpected connection-close error.
> {noformat}
> Execution Failures:
> /root/drillAutomation/framework/framework/resources/Functional/ctas/ctas_flatten/100000rows/filter5.q
> Query: 
> select * from dfs.ctas_flatten.`filter5_100000rows_ctas`
> Failed with exception
> java.sql.SQLException: CONNECTION ERROR: Connection /10.10.100.171:36185 <--> 
> drillats4.qa.lab/10.10.100.174:31010 (user client) closed unexpectedly. 
> Drillbit down?
> [Error Id: 3d5dad8e-80d0-4c7f-9012-013bf01ce2b7 ]
>       at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
>       at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:321)
>       at 
> oadd.net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
>       at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:172)
>       at 
> org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:210)
>       at 
> org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:99)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> Caused by: oadd.org.apache.drill.common.exceptions.UserException: CONNECTION 
> ERROR: Connection /10.10.100.171:36185 <--> 
> drillats4.qa.lab/10.10.100.174:31010 (user client) closed unexpectedly. 
> Drillbit down?
> [Error Id: 3d5dad8e-80d0-4c7f-9012-013bf01ce2b7 ]
>       at 
> oadd.org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>       at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler$ChannelClosedHandler$1.operationComplete(QueryResultHandler.java:373)
>       at 
> oadd.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>       at 
> oadd.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
>       at 
> oadd.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
>       at 
> oadd.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
>       at 
> oadd.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
>       at 
> oadd.io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
>       at 
> oadd.io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
>       at 
> oadd.io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
>       at 
> oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.closeOnRead(AbstractNioByteChannel.java:71)
>       at 
> oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:89)
>       at 
> oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:162)
>       at 
> oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>       at 
> oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>       at 
> oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>       at oadd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>       at 
> oadd.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>       ... 1 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
