Is Drill terminating threads correctly?
Running jstack on the JVM of a dev. test run that hung after about
three test timeout errors, I see that there are 409 threads.
Although 138 of those are ShutdownHook threads, which is not unexpected
(since many tests are run in one VM), there are also:
- 138 "WorkManager.StatusThread" threads (hmm... 138 again)
- 7 "Client-1" threads
- 4 "UserServer-1" threads
- 21 "BitClient-1" threads
- 4 "BitClient-2" threads
- 3 "BitClient-3" threads
- 8 "BitServer-1" threads
- 8 "BitServer-2" threads
- 7 "BitServer-3" threads
- 7 "BitServer-4" threads
- 7 "BitServer-5" threads
- 6 "BitServer-6" threads
- 6 "BitServer-7" threads
- 6 "BitServer-8" threads
- 5 "BitServer-9" threads
- 5 "BitServer-10" threads
(Other thread names have only 1 or 2 occurrences.)
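For anyone who wants to reproduce this kind of tally from their own
jstack output, here is a minimal sketch of one way to do it. It is not
necessarily how the counts above were produced. It assumes that each
thread's header line in the dump starts with the thread name in double
quotes; the class name ThreadNameCounter and the sample lines in main
are made up for illustration and are not taken from the actual dump.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts how many threads in a jstack dump share each name.
class ThreadNameCounter {
    // Assumption: a thread header line begins with the quoted thread name,
    // e.g.  "BitServer-1" daemon prio=10 tid=... nid=... runnable
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]*)\"");

    static Map<String, Integer> countByName(List<String> dumpLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                // Increment the count for this thread name (starting at 1).
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With a file argument, read a saved dump; otherwise use made-up sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "\"BitServer-1\" daemon prio=10 tid=0x01 nid=0x11 runnable",
                        "   java.lang.Thread.State: RUNNABLE",
                        "\"BitServer-1\" daemon prio=10 tid=0x02 nid=0x12 runnable",
                        "\"Client-1\" daemon prio=10 tid=0x03 nid=0x13 runnable");
        for (Map.Entry<String, Integer> e : countByName(lines).entrySet()) {
            System.out.println("- " + e.getValue() + " \"" + e.getKey() + "\" threads");
        }
    }
}
```

Saving the dump first (jstack <pid> > dump.txt) and passing the file
name as the only argument prints one line per distinct thread name.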
Regarding the count of 4 "UserServer-1" threads: three test methods
had timeout failures, plus one hung.
Here's the tail end of the output from the test run, including all the
timeout errors and the hang (except for repeated query-results data
lines):
dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install
<TRIMMED>
Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
Running
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun
Running
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun
Running
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun
Running
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec - in
org.apache.drill.exec.physical.impl.TestDistributedFragmentRun
Running org.apache.drill.exec.physical.impl.TestBroadcastExchange
Running
org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans
00:44:34.017 [globalEventExecutor-1-523] ERROR
o.a.z.server.NIOServerCnxnFactory - Thread
Thread[globalEventExecutor-1-523,5,main] died
java.lang.AssertionError: null
at
io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
~[netty-common-4.0.27.Final.jar:4.0.27.Final]
at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72]
Running
org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange
10000
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599 sec <<<
FAILURE! - in org.apache.drill.exec.physical.impl.TestBroadcastExchange
TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
Time elapsed: 50.063 sec <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at
io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at
org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at
org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
at
org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62)
TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange)
Time elapsed: 50.014 sec <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at
io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
at org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at
org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
at
org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestMultipleSendLocationBroadcastExchange(TestBroadcastExchange.java:88)
Running org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
Running
org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testPartitionSenderCostToThreads
Running
org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testAlgorithm
ok summary
true planner.slice_target updated.
Total rows returned : 1. Returned in 38ms.
Jul 10, 2015 12:47:20 AM org.apache.calcite.sql.validate.SqlValidatorException
<init>
SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table
'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799'
not found
Jul 10, 2015 12:47:20 AM org.apache.calcite.runtime.CalciteException <init>
SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, column
64 to line 1, column 66: Table
'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799'
not found
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.904 sec <<<
FAILURE! - in org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender
testPartitionSenderCostToThreads(org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender)
Time elapsed: 50.023 sec <<< ERROR!
java.lang.Exception: test timed out after 50000 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at
io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254)
at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32)
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31)
at org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218)
at org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at
org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87)
at com.google.common.io.Closeables.close(Closeables.java:77)
at com.google.common.io.Closeables.closeQuietly(Closeables.java:108)
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288)
at org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:239)
at
org.apache.drill.BaseTestQuery.updateTestCluster(BaseTestQuery.java:126)
at
org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender.testPartitionSenderCostToThreads(TestPartitionSender.java:154)
Running org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec - in
org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort
Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover
Running
org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSelectionVectorRemoval
blue red green
2147483647 9223372036854775807 2147483647
2147483647 9223372036854775807 2147483647
<TRIMMED>
2147483647 9223372036854775807 2147483647
Total rows returned : 50. Returned in 181ms.
Running
org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSVRWithNoFilter
blue red green
true -9223372036854775808 -2147483648
false 9223372036854775807 null
<TRIMMED>
true -9223372036854775808 -2147483648
false 9223372036854775807 null
Total rows returned : 100. Returned in 54ms.
C-c C-c
real 708m38.962s
user 11m3.332s
sys 1m8.068s
[130]dbarclay@dev-linux2 ~/work/git/incubator-drill $
Daniel
--
Daniel Barclay
MapR Technologies