Robert Hou created DRILL-5902:
---------------------------------
Summary: Regression: Queries encounter random failure due to RPC
connection timed out
Key: DRILL-5902
URL: https://issues.apache.org/jira/browse/DRILL-5902
Project: Apache Drill
Issue Type: Bug
Components: Execution - RPC
Affects Versions: 1.11.0
Reporter: Robert Hou
Priority: Critical
Multiple random failures (25) occurred with the latest
Functional-Baseline-88.193 run. Here is a sample query:
{noformat}
-- Kitchen sink
-- Use all supported functions
select
rank() over W,
dense_rank() over W,
percent_rank() over W,
cume_dist() over W,
avg(c_integer + c_integer) over W,
sum(c_integer/100) over W,
count(*) over W,
min(c_integer) over W,
max(c_integer) over W,
row_number() over W
from
j7
where
c_boolean is not null
window W as (partition by c_bigint, c_date, c_time, c_boolean order by
c_integer)
{noformat}
>From the logs:
{noformat}
2017-10-23 04:14:36,536 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler -
Dropping request for early fragment termination for path
261230e8-d03e-9ca9-91bf-c1039deecde2:1:1 ->
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler -
Dropping request for early fragment termination for path
261230e8-d03e-9ca9-91bf-c1039deecde2:1:5 ->
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler -
Dropping request for early fragment termination for path
261230e8-d03e-9ca9-91bf-c1039deecde2:1:9 ->
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler -
Dropping request for early fragment termination for path
261230e8-d03e-9ca9-91bf-c1039deecde2:1:13 ->
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,537 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler -
Dropping request for early fragment termination for path
261230e8-d03e-9ca9-91bf-c1039deecde2:1:17 ->
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,538 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler -
Dropping request for early fragment termination for path
261230e8-d03e-9ca9-91bf-c1039deecde2:1:21 ->
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
2017-10-23 04:14:36,538 [BitServer-7] WARN o.a.d.e.w.b.ControlMessageHandler -
Dropping request for early fragment termination for path
261230e8-d03e-9ca9-91bf-c1039deecde2:1:25 ->
261230e8-d03e-9ca9-91bf-c1039deecde2:0:0 as path to executor unavailable.
{noformat}
{noformat}
2017-10-23 04:14:53,941 [UserServer-1] INFO o.a.drill.exec.rpc.user.UserServer
- RPC connection /10.10.88.196:31010 <--> /10.10.88.193:38281 (user server)
timed out. Timeout was set to 30 seconds. Closing connection.
2017-10-23 04:14:53,952 [UserServer-1] INFO
o.a.d.e.w.fragment.FragmentExecutor - 261230f8-2698-15b2-952f-d4ade8d6b180:0:0:
State change requested RUNNING --> FAILED
2017-10-23 04:14:53,952 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] INFO
o.a.d.e.w.fragment.FragmentExecutor - 261230f8-2698-15b2-952f-d4ade8d6b180:0:0:
State change requested FAILED --> FINISHED
2017-10-23 04:14:53,956 [UserServer-1] WARN
o.apache.drill.exec.rpc.RequestIdMap - Failure while attempting to fail rpc
response.
java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043) ~[na:1.7.0_45]
at
org.apache.drill.common.DeferredException.addException(DeferredException.java:88)
~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:97)
~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:413)
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.access$700(FragmentExecutor.java:55)
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.fail(FragmentExecutor.java:427)
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.ops.FragmentContext.fail(FragmentContext.java:213)
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:95)
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.ops.FragmentContext$1.accept(FragmentContext.java:92)
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.ops.StatusHandler.failed(StatusHandler.java:42)
~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.rpc.RequestIdMap$RpcListener.setException(RequestIdMap.java:139)
~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:76)
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.rpc.RequestIdMap$SetExceptionProcedure.apply(RequestIdMap.java:66)
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
com.carrotsearch.hppc.IntObjectHashMap.forEach(IntObjectHashMap.java:692)
[hppc-0.7.1.jar:na]
at
org.apache.drill.exec.rpc.RequestIdMap.channelClosed(RequestIdMap.java:62)
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.rpc.AbstractRemoteConnection.channelClosed(AbstractRemoteConnection.java:192)
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.rpc.AbstractServerConnection.channelClosed(AbstractServerConnection.java:165)
[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:167)
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:144)
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1003)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:632)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:610)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:553)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1236)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:618)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:602)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:618)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:602)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:459)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:99)
[netty-handler-4.0.48.Final.jar:4.0.48.Final]
at
org.apache.drill.exec.rpc.BasicServer$LoggingReadTimeoutHandler.readTimedOut(BasicServer.java:147)
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
io.netty.handler.timeout.ReadTimeoutHandler.channelIdle(ReadTimeoutHandler.java:90)
[netty-handler-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.handler.timeout.IdleStateHandler$ReaderIdleTimeoutTask.run(IdleStateHandler.java:494)
[netty-handler-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.handler.timeout.IdleStateHandler$AbstractIdleTask.run(IdleStateHandler.java:466)
[netty-handler-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
[netty-common-4.0.48.Final.jar:4.0.48.Final]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed
/10.10.88.196:31010 <--> /10.10.88.193:38343.
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:164)
[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
... 29 common frames omitted
2017-10-23 04:14:53,963 [UserServer-1] INFO
o.a.drill.exec.rpc.user.UserSession - Deleted session temporary location
[/tmp/bec73426-40b7-44b6-8819-7958634614e9] from file system [maprfs:///]
2017-10-23 04:14:53,966 [261230f8-2698-15b2-952f-d4ade8d6b180:frag:0:0] ERROR
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: ChannelClosedException:
Channel closed /10.10.88.196:31010 <--> /10.10.88.193:38343.
Fragment 0:0
[Error Id: d3c59456-284f-4c35-988f-f052b7f6c4b0 on atsqa4-196.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
ChannelClosedException: Channel closed /10.10.88.196:31010 <-->
/10.10.88.193:38343.
Fragment 0:0
[Error Id: d3c59456-284f-4c35-988f-f052b7f6c4b0 on atsqa4-196.qa.lab:31010]
at
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586)
~[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:298)
[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267)
[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
[drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_45]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed
/10.10.88.196:31010 <--> /10.10.88.193:38343.
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:164)
~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:144)
~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:1003)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:632)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:610)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:553)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1236)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:618)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:602)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:618)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:602)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:459)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:99)
~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
at
org.apache.drill.exec.rpc.BasicServer$LoggingReadTimeoutHandler.readTimedOut(BasicServer.java:147)
~[drill-rpc-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at
io.netty.handler.timeout.ReadTimeoutHandler.channelIdle(ReadTimeoutHandler.java:90)
~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.handler.timeout.IdleStateHandler$ReaderIdleTimeoutTask.run(IdleStateHandler.java:494)
~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.handler.timeout.IdleStateHandler$AbstractIdleTask.run(IdleStateHandler.java:466)
~[netty-handler-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
~[netty-transport-4.0.48.Final.jar:4.0.48.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
~[netty-common-4.0.48.Final.jar:4.0.48.Final]
... 1 common frames omitted
2017-10-23 04:14:54,089 [UserServer-1] INFO
o.a.drill.exec.work.foreman.Foreman - Failure while trying communicate query
result to initiating client. This would happen if a client is disconnected
before response notice can be sent.
org.apache.drill.exec.rpc.RpcException: Failure sending message.
{noformat}
sys.version is:
1.12.0-SNAPSHOT f1d1945b3772bb782039fd6811e34a7de66441c8 DRILL-5582: C++
Client: [Threat Modeling] Drillbit may be spoofed by an attacker and this may
lead to data being written to the attacker's target instead of Drillbit
19.10.2017 @ 17:13:05 PDT Unknown 19.10.2017 @ 18:37:13 PDT
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)