[
https://issues.apache.org/jira/browse/DRILL-7000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sorabh Hamirwasia updated DRILL-7000:
-------------------------------------
Description:
The query does not return even though Fragment 0:0 reports a state change from
RUNNING -> FINISHED
Following is the jstack output of the Frag0:0.
{code:java}
"23b85137-b102-39a9-70d9-72381c5fb93b:frag:0:0" #16037 daemon prio=10 os_prio=0 tid=0x00007f5f48d415d0 nid=0x1a61 waiting on condition [0x00007f61b32b2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.close(RuntimeFilterSink.java:116)
    at org.apache.drill.exec.work.filter.RuntimeFilterRouter.waitForComplete(RuntimeFilterRouter.java:113)
    at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:738)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.wrapUpCompletion(QueryStateProcessor.java:315)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.running(QueryStateProcessor.java:276)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:92)
    - locked <0x000000055f9a7468> (a org.apache.drill.exec.work.foreman.QueryStateProcessor)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.processEvent(QueryStateProcessor.java:349)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.processEvent(QueryStateProcessor.java:342)
    at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
    at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.addEvent(QueryStateProcessor.java:344)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.addToEventQueue(QueryStateProcessor.java:155)
    at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:213)
    at org.apache.drill.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:519)
    at org.apache.drill.exec.work.foreman.QueryManager.access$100(QueryManager.java:65)
    at org.apache.drill.exec.work.foreman.QueryManager$NodeTracker.fragmentComplete(QueryManager.java:483)
    at org.apache.drill.exec.work.foreman.QueryManager.fragmentDone(QueryManager.java:155)
    at org.apache.drill.exec.work.foreman.QueryManager.access$400(QueryManager.java:65)
    at org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:546)
    at org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:63)
    at org.apache.drill.exec.work.batch.ControlMessageHandler.requestFragmentStatus(ControlMessageHandler.java:253)
    at org.apache.drill.exec.rpc.control.LocalControlConnectionManager.runCommand(LocalControlConnectionManager.java:130)
    at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragmentStatus(ControlTunnel.java:89)
    at org.apache.drill.exec.work.fragment.FragmentStatusReporter.sendStatus(FragmentStatusReporter.java:122)
    at org.apache.drill.exec.work.fragment.FragmentStatusReporter.stateChanged(FragmentStatusReporter.java:91)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:367)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
- From the code, it seems that RuntimeFilterSink.close is stuck at
{code:java}
while (!asyncAggregateWorker.over.get()) {
  try {
    Thread.sleep(100);
  } catch (InterruptedException e) {
    logger.error("interrupted while sleeping to wait for the aggregating worker thread to exit", e);
  }
}
{code}
This is because AsyncAggregateWorker exits due to the following exception
before asyncAggregateWorker.over is set to true, so the wait loop never ends.
{code:java}
2019-01-22 16:01:18,773 [drill-executor-1301] ERROR o.a.d.e.w.filter.RuntimeFilterSink - Failed to aggregate or route the RFW
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.drill.exec.work.filter.RuntimeFilterWritable.unwrap(RuntimeFilterWritable.java:67) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterWritable.aggregate(RuntimeFilterWritable.java:78) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.aggregate(RuntimeFilterSink.java:140) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.access$600(RuntimeFilterSink.java:52) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink$AsyncAggregateWorker.run(RuntimeFilterSink.java:246) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_151]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
{code}
was:
The query does not return even though Fragment 0:0 reports a state change from
RUNNING -> FINISHED
Following is the jstack output of the Frag0:0.
{code:java}
"23b85137-b102-39a9-70d9-72381c5fb93b:frag:0:0" #16037 daemon prio=10 os_prio=0 tid=0x00007f5f48d415d0 nid=0x1a61 waiting on condition [0x00007f61b32b2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.close(RuntimeFilterSink.java:116)
    at org.apache.drill.exec.work.filter.RuntimeFilterRouter.waitForComplete(RuntimeFilterRouter.java:113)
    at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:738)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.wrapUpCompletion(QueryStateProcessor.java:315)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.running(QueryStateProcessor.java:276)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:92)
    - locked <0x000000055f9a7468> (a org.apache.drill.exec.work.foreman.QueryStateProcessor)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.processEvent(QueryStateProcessor.java:349)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.processEvent(QueryStateProcessor.java:342)
    at org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107)
    at org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor$StateSwitch.addEvent(QueryStateProcessor.java:344)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.addToEventQueue(QueryStateProcessor.java:155)
    at org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:213)
    at org.apache.drill.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:519)
    at org.apache.drill.exec.work.foreman.QueryManager.access$100(QueryManager.java:65)
    at org.apache.drill.exec.work.foreman.QueryManager$NodeTracker.fragmentComplete(QueryManager.java:483)
    at org.apache.drill.exec.work.foreman.QueryManager.fragmentDone(QueryManager.java:155)
    at org.apache.drill.exec.work.foreman.QueryManager.access$400(QueryManager.java:65)
    at org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:546)
    at org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:63)
    at org.apache.drill.exec.work.batch.ControlMessageHandler.requestFragmentStatus(ControlMessageHandler.java:253)
    at org.apache.drill.exec.rpc.control.LocalControlConnectionManager.runCommand(LocalControlConnectionManager.java:130)
    at org.apache.drill.exec.rpc.control.ControlTunnel.sendFragmentStatus(ControlTunnel.java:89)
    at org.apache.drill.exec.work.fragment.FragmentStatusReporter.sendStatus(FragmentStatusReporter.java:122)
    at org.apache.drill.exec.work.fragment.FragmentStatusReporter.stateChanged(FragmentStatusReporter.java:91)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:367)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:219)
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:330)
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
- From the code, it seems that RuntimeFilterSink.close is stuck at
{code:java}
while (!asyncAggregateWorker.over.get()) {
  try {
    Thread.sleep(100);
  } catch (InterruptedException e) {
    logger.error("interrupted while sleeping to wait for the aggregating worker thread to exit", e);
  }
}
{code}
This is because AsyncAggregateWorker exits due to the following exception
before asyncAggregateWorker.over is set to true, so the wait loop never ends.
{code:java}
2019-01-22 16:01:18,773 [drill-executor-1301] ERROR o.a.d.e.w.filter.RuntimeFilterSink - Failed to aggregate or route the RFW
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.drill.exec.work.filter.RuntimeFilterWritable.unwrap(RuntimeFilterWritable.java:67) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterWritable.aggregate(RuntimeFilterWritable.java:78) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.aggregate(RuntimeFilterSink.java:140) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink.access$600(RuntimeFilterSink.java:52) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at org.apache.drill.exec.work.filter.RuntimeFilterSink$AsyncAggregateWorker.run(RuntimeFilterSink.java:246) ~[drill-java-exec-1.16.0-SNAPSHOT.jar:1.16.0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_151]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
{code}
A simple fix would be to add over.set(true) to the finally block in
AsyncAggregateWorker.run.
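The suggested fix can be illustrated with a minimal, self-contained sketch. It is not the Drill code: the class OverFlagSketch, the Worker class, and the methods waitForWorker and runOnce are hypothetical stand-ins, and the deadline in the wait loop is added only so that the sketch always terminates. The point it shows is that setting the flag in a finally block lets the waiting thread proceed whether the worker exits normally or through an exception.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class OverFlagSketch {

    /** Hypothetical stand-in for AsyncAggregateWorker; only the flag handling matters. */
    static class Worker implements Runnable {
        final AtomicBoolean over = new AtomicBoolean(false);
        private final boolean fail;

        Worker(boolean fail) {
            this.fail = fail;
        }

        @Override
        public void run() {
            try {
                if (fail) {
                    // Simulates the failure reported in the log.
                    throw new ArrayIndexOutOfBoundsException(1);
                }
            } catch (RuntimeException e) {
                System.err.println("Failed to aggregate or route the RFW: " + e);
            } finally {
                // Runs on every exit path, so the waiter is always released.
                over.set(true);
            }
        }
    }

    /**
     * Polls the flag the way the loop in close() does. The deadline is an
     * assumption of this sketch; it returns false if the flag was never set.
     */
    static boolean waitForWorker(Worker worker, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!worker.over.get()) {
            if (System.nanoTime() - deadline > 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Restore the interrupt status instead of swallowing it.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    /** Runs one worker on its own thread and reports whether the waiter was released. */
    static boolean runOnce(boolean fail) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Worker worker = new Worker(fail);
            executor.submit(worker);
            return waitForWorker(worker, 2000);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("released after normal exit: " + runOnce(false));
        System.out.println("released after failed exit: " + runOnce(true));
    }
}
```

If over.set(true) is moved out of the finally block to the end of the try body, the failing run no longer sets the flag, and a wait loop without a deadline would spin indefinitely, which matches the hang described above.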
> Queries failing with "Failed to aggregate or route the RFW" do not complete
> ---------------------------------------------------------------------------
>
> Key: DRILL-7000
> URL: https://issues.apache.org/jira/browse/DRILL-7000
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Affects Versions: 1.16.0
> Reporter: Sorabh Hamirwasia
> Assignee: Sorabh Hamirwasia
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.16.0
>