Is there any way to re-produce this at a smaller scale? Have you tried failing a couple of tests and dumping threads?
-Hanifi Thanks On Fri, Jul 10, 2015 at 1:10 PM, Daniel Barclay <[email protected]> wrote: > Is Drill terminating threads correctly? > > In running jstack on a JVM running a dev. test run that ended up hung > after getting about three test timeout errors, I see that there are > 409 threads. > > Although 138 of those are not-unexpected ShutdownHook threads (since > many tests are run in one VM), there are: > - 138 "WorkManager.StatusThread" threads (hmm.... 138 again) > - 7 "Client-1" threads > - 4 "UserServer-1" threads > - 21 "BitClient-1" threads > - 4 "BitClient-2" threads > - 3 "BitClient-3" threads > - 8 "BitServer-1" threads > - 8 "BitServer-2" threads > - 7 "BitServer-3" threads > - 7 "BitServer-4" threads > - 7 "BitServer-5" threads > - 6 "BitServer-6" threads > - 6 "BitServer-7" threads > - 6 "BitServer-8" threads > - 5 "BitServer-9" threads > - 5 "BitServer-10" threads > (Other thread names have only 1 or 2 occurrences.) > > Regarding the 4 for the number of "UserServer-1" threads: Three test > methods had timeout failures plus one got hung. > > > Here's the tail end of the output from the test running, including > all the timeout errors and including the hang (except for repeated > query-results data lines). > > > > dbarclay@dev-linux2 ~/work/git/incubator-drill $ time mvn install > > <TRIMMED> > > Running org.apache.drill.exec.physical.impl.TestDistributedFragmentRun > Running > org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeOneEntryRun > Running > org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#twoBitOneExchangeTwoEntryRun > Running > org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRun > Running > org.apache.drill.exec.physical.impl.TestDistributedFragmentRun#oneBitOneExchangeTwoEntryRunLogical > Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 48.117 sec > - in org.apache.drill.exec.physical.impl.TestDistributedFragmentRun > Running org.apache.drill.exec.physical.impl.TestBroadcastExchange > Running > org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestSingleBroadcastExchangeWithTwoScans > 00:44:34.017 [globalEventExecutor-1-523] ERROR > o.a.z.server.NIOServerCnxnFactory - Thread > Thread[globalEventExecutor-1-523,5,main] died > java.lang.AssertionError: null > at > io.netty.util.concurrent.AbstractScheduledEventExecutor.pollScheduledTask(AbstractScheduledEventExecutor.java:83) > ~[netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.GlobalEventExecutor.fetchFromScheduledTaskQueue(GlobalEventExecutor.java:110) > ~[netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.GlobalEventExecutor.takeTask(GlobalEventExecutor.java:95) > ~[netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:226) > ~[netty-common-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) > ~[netty-common-4.0.27.Final.jar:4.0.27.Final] > at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_72] > Running > org.apache.drill.exec.physical.impl.TestBroadcastExchange#TestMultipleSendLocationBroadcastExchange > 10000 > Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 111.599 > sec <<< FAILURE! - in > org.apache.drill.exec.physical.impl.TestBroadcastExchange > TestSingleBroadcastExchangeWithTwoScans(org.apache.drill.exec.physical.impl.TestBroadcastExchange) > Time elapsed: 50.063 sec <<< ERROR! > java.lang.Exception: test timed out after 50000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at > io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254) > at > io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32) > at > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31) > at > org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218) > at com.google.common.io.Closeables.close(Closeables.java:77) > at com.google.common.io > .Closeables.closeQuietly(Closeables.java:108) > at > org.apache.drill.exec.rpc.data.DataConnectionCreator.close(DataConnectionCreator.java:70) > at com.google.common.io.Closeables.close(Closeables.java:77) > at com.google.common.io > .Closeables.closeQuietly(Closeables.java:108) > at > org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:88) > at com.google.common.io.Closeables.close(Closeables.java:77) > at com.google.common.io > .Closeables.closeQuietly(Closeables.java:108) > at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288) > at > org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestSingleBroadcastExchangeWithTwoScans(TestBroadcastExchange.java:62) > > TestMultipleSendLocationBroadcastExchange(org.apache.drill.exec.physical.impl.TestBroadcastExchange) > Time elapsed: 50.014 sec <<< ERROR! > java.lang.Exception: test timed out after 50000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at > io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254) > at > io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32) > at > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31) > at > org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218) > at > org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283) > at com.google.common.io.Closeables.close(Closeables.java:77) > at com.google.common.io > .Closeables.closeQuietly(Closeables.java:108) > at > org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87) > at com.google.common.io.Closeables.close(Closeables.java:77) > at com.google.common.io > .Closeables.closeQuietly(Closeables.java:108) > at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288) > at > org.apache.drill.exec.physical.impl.TestBroadcastExchange.TestMultipleSendLocationBroadcastExchange(TestBroadcastExchange.java:88) > > Running > org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender > Running > org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testPartitionSenderCostToThreads > Running > org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender#testAlgorithm > ok summary > true planner.slice_target updated. > Total rows returned : 1. Returned in 38ms. > Jul 10, 2015 12:47:20 AM > org.apache.calcite.sql.validate.SqlValidatorException <init> > SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Table > 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799' > not found > Jul 10, 2015 12:47:20 AM org.apache.calcite.runtime.CalciteException <init> > SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1, > column 64 to line 1, column 66: Table > 'dfs./home/dbarclay/work/git/incubator-drill/exec/java-exec/target/junit5218680774082947123/junit6147831434075535799' > not found > Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.904 sec > <<< FAILURE! - in > org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender > testPartitionSenderCostToThreads(org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender) > Time elapsed: 50.023 sec <<< ERROR! > java.lang.Exception: test timed out after 50000 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at > io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:254) > at > io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:32) > at > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:31) > at > org.apache.drill.exec.rpc.BasicServer.close(BasicServer.java:218) > at > org.apache.drill.exec.rpc.user.UserServer.close(UserServer.java:283) > at com.google.common.io.Closeables.close(Closeables.java:77) > at com.google.common.io > .Closeables.closeQuietly(Closeables.java:108) > at > org.apache.drill.exec.service.ServiceEngine.close(ServiceEngine.java:87) > at com.google.common.io.Closeables.close(Closeables.java:77) > at com.google.common.io > .Closeables.closeQuietly(Closeables.java:108) > at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:288) > at > org.apache.drill.BaseTestQuery.closeClient(BaseTestQuery.java:239) > at > org.apache.drill.BaseTestQuery.updateTestCluster(BaseTestQuery.java:126) > at > org.apache.drill.exec.physical.impl.partitionsender.TestPartitionSender.testPartitionSenderCostToThreads(TestPartitionSender.java:154) > > Running org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort > Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0 sec - in > org.apache.drill.exec.physical.impl.xsort.TestSimpleExternalSort > Running org.apache.drill.exec.physical.impl.svremover.TestSVRemover > Running > org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSelectionVectorRemoval > blue red green > 2147483647 9223372036854775807 2147483647 > 2147483647 9223372036854775807 2147483647 > > <TRIMMED> > > 2147483647 9223372036854775807 2147483647 > Total rows returned : 50. Returned in 181ms. > Running > org.apache.drill.exec.physical.impl.svremover.TestSVRemover#testSVRWithNoFilter > blue red green > true -9223372036854775808 -2147483648 > false 9223372036854775807 null > > <TRIMMED> > > true -9223372036854775808 -2147483648 > false 9223372036854775807 null > Total rows returned : 100. Returned in 54ms. > C-c C-c > real 708m38.962s > user 11m3.332s > sys 1m8.068s > [130]dbarclay@dev-linux2 ~/work/git/incubator-drill $ > > > > > > Daniel > -- > Daniel Barclay > MapR Technologies >
