But the status thread is a daemon. So the Drillbit doesn't have to stop it, right?
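
To make the question concrete, here is a minimal sketch of the distinction, assuming a hypothetical periodic status thread (the class, method, and thread names are made up for illustration and are not Drill's actual WorkManager code). A daemon thread does not keep the JVM from exiting, but inside a JVM that keeps running, such as a test run that starts and stops several Drillbits, it stays alive and holds a native thread until something stops it.

```java
class DaemonStatusThreadSketch {

    // Hypothetical stand-in for a periodic status thread; not Drill's actual code.
    static Thread startStatusThread(String name) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(50); // pretend to report status periodically
                }
            } catch (InterruptedException e) {
                // Interrupted on shutdown: fall through and let the thread end.
            }
        }, name);
        t.setDaemon(true); // daemon: does not keep the JVM alive on its own
        t.start();
        return t;
    }

    // Explicit shutdown: interrupt the thread and wait for it to finish.
    static void stop(Thread t) {
        t.interrupt();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Counts live threads whose name starts with the given prefix.
    static long countLive(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Simulate several start/"shutdown" cycles that never stop the thread.
        for (int i = 0; i < 5; i++) {
            startStatusThread("leaked-status-" + i);
        }
        System.out.println("live, never stopped: " + countLive("leaked-status-"));

        // Same thread, but stopped explicitly on shutdown.
        Thread t = startStatusThread("stopped-status-0");
        stop(t);
        System.out.println("live, after stop(): " + countLive("stopped-status-"));
    }
}
```

If the unit tests bring Drillbits up and down many times in one JVM, the first pattern could add up over the run, which might be what the "unable to create new native thread" errors below are pointing at.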
- Sudheesh

> On Nov 6, 2015, at 6:44 PM, Jacques Nadeau <[email protected]> wrote:
>
> I see that we're bleeding Workmanager Status threads that aren't shutdown
> when the Drillbit is shutdown.
>
> I'll get a patch together.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
>> On Fri, Nov 6, 2015 at 4:31 PM, Hanifi Gunes <[email protected]> wrote:
>>
>> Looks like we are possibly leaking some threads. Investigating.
>>
>>> On Fri, Nov 6, 2015 at 4:25 PM, Jacques Nadeau <[email protected]> wrote:
>>>
>>> Hmm.. that is quite strange. I wonder if we need to look at thread counts
>>> on the daemon.
>>>
>>> We haven't changed how we create but there were changes to shutdown
>>> (although I can't imagine why that would be a problem).
>>>
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>>
>>>> On Fri, Nov 6, 2015 at 4:11 PM, Hanifi Gunes <[email protected]> wrote:
>>>>
>>>> Not the testAggregateWithEmptyRequiredInput but I got the following on
>>>> my branch rebased top of master -- @CentOS.
>>>>
>>>> Tests in error:
>>>> TestImpersonationQueries.sequenceFileChainedImpersonationWithView » UserRemote
>>>> TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » Rpc
>>>> TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>> TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>> TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>> TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>> TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>>
>>>> exception details --->
>>>> testMultiLevelImpersonationExceedsMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries) Time elapsed: 0.008 sec <<< ERROR!
>>>> java.lang.IllegalStateException: failed to create a child event loop
>>>> at sun.nio.ch.IOUtil.makePipe(Native Method)
>>>> at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
>>>> at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120)
>>>> at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
>>>> at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
>>>> at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
>>>> at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61)
>>>> at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52)
>>>> at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:74)
>>>> at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239)
>>>> at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220)
>>>> at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178)
>>>> at org.apache.drill.QueryTestUtil.createClient(QueryTestUtil.java:67)
>>>> at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:213)
>>>> at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:236)
>>>>
>>>>
>>>> My god's telling me that we are creating too many NioEventLoopGroup's.
>>>> Did we make any recent changes around RPC causing this?
>>>>
>>>> -Hanifi
>>>>
>>>>
>>>>> On Fri, Nov 6, 2015 at 3:58 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>
>>>>> Do you have that other output/stack trace I asked about? If we can also see
>>>>> the illegalreference count on something other than the JDBC client close
>>>>> method, that would be helpful.
>>>>>
>>>>> --
>>>>> Jacques Nadeau
>>>>> CTO and Co-Founder, Dremio
>>>>>
>>>>>> On Fri, Nov 6, 2015 at 2:48 PM, Jinfeng Ni <[email protected]> wrote:
>>>>>>
>>>>>> I just re-run, and the previous 4 failures are gone. But it failed
>>>>>> with two new ones:
>>>>>>
>>>>>> Tests in error:
>>>>>> TestSqlStdBasedAuthorization.org.apache.drill.exec.impersonation.hive.TestSqlStdBasedAuthorization » UserRemote
>>>>>> TestStorageBasedHiveAuthorization.org.apache.drill.exec.impersonation.hive.TestStorageBasedHiveAuthorization » UserRemote
>>>>>>
>>>>>> I re-start the machine, and there are not too many applications
>>>>>> running and the memory should be enough. At least some days back, I
>>>>>> got clean run on the same machine.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 6, 2015 at 2:39 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>>> Can you provide the complete output for this failure:
>>>>>>>
>>>>>>> TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 » IllegalReferenceCount
>>>>>>>
>>>>>>> I haven't seen the other issues. The last one looks like the system was
>>>>>>> having an issue since thread creation failure is usually an OS problem. Was
>>>>>>> your system under resourced?
>>>>>>>
>>>>>>> --
>>>>>>> Jacques Nadeau
>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>
>>>>>>> On Fri, Nov 6, 2015 at 12:55 PM, Jinfeng Ni <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm seeing unit test case failure when run "mvn clean install" over
>>>>>>>> drill master branch, on Mac.
>>>>>>>>
>>>>>>>> The first one seems to be the issue #3 in Jacques's list. The last
>>>>>>>> three seems to different from the 4 issues. Has anyone seen this
>>>>>>>> failure before, or it just happened to my mac? Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> =================================================
>>>>>>>> git log
>>>>>>>> commit 1a24233475ca46aaf2a49a5624b4042f088382f4
>>>>>>>>
>>>>>>>>
>>>>>>>> Tests in error:
>>>>>>>> TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 » IllegalReferenceCount
>>>>>>>> TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops » UserRemote
>>>>>>>> TestImpersonationQueries.removeMiniDfsBasedStorage:294->BaseTestImpersonation.stopMiniDfsCluster:151 » OutOfMemory
>>>>>>>> TestImpersonationQueries>BaseTestQuery.closeClient:260 » OutOfMemory unable to...
>>>>>>>>
>>>>>>>> Tests run: 1483, Failures: 0, Errors: 4, Skipped: 118
>>>>>>>>
>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>> [INFO] Reactor Summary:
>>>>>>>> [INFO]
>>>>>>>> [INFO] Apache Drill Root POM .............................. SUCCESS [ 8.440 s]
>>>>>>>> [INFO] tools/Parent Pom ................................... SUCCESS [ 0.631 s]
>>>>>>>> [INFO] tools/freemarker codegen tooling ................... SUCCESS [ 5.236 s]
>>>>>>>> [INFO] Drill Protocol ..................................... SUCCESS [ 5.839 s]
>>>>>>>> [INFO] Common (Logical Plan, Base expressions) ............ SUCCESS [ 10.831 s]
>>>>>>>> [INFO] contrib/Parent Pom ................................. SUCCESS [ 0.815 s]
>>>>>>>> [INFO] contrib/data/Parent Pom ............................ SUCCESS [ 0.331 s]
>>>>>>>> [INFO] contrib/data/tpch-sample-data ...................... SUCCESS [ 2.838 s]
>>>>>>>> [INFO] exec/Parent Pom .................................... SUCCESS [ 0.635 s]
>>>>>>>> [INFO] exec/Java Execution Engine ......................... FAILURE [12:05 min]
>>>>>>>> [INFO] exec/JDBC Driver using dependencies ................ SKIPPED
>>>>>>>> [INFO] JDBC JAR with all dependencies ..................... SKIPPED
>>>>>>>> [INFO] contrib/mongo-storage-plugin ....................... SKIPPED
>>>>>>>>
>>>>>>>> Tests run: 11, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 17.042 sec <<< FAILURE! - in org.apache.drill.exec.impersonation.TestImpersonationQueries
>>>>>>>> testMultiLevelImpersonationEqualToMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries) Time elapsed: 0.099 sec <<< ERROR!
>>>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: OutOfMemoryError: unable to create new native thread
>>>>>>>>
>>>>>>>>
>>>>>>>> [Error Id: a826ac5d-e278-49bc-8f92-fdf241d0e634 on 10.250.50.52:31010]
>>>>>>>> at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>>>>>>>> at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112)
>>>>>>>> at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>>>>>>>> at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
>>>>>>>> at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:68)
>>>>>>>> at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:390)
>>>>>>>> at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
>>>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>>>>>
>>>>>>>> On Fri, Nov 6, 2015 at 9:42 AM, Jacques Nadeau <[email protected]> wrote:
>>>>>>>>> It seems like we have four potentially show stopping issues at the moment:
>>>>>>>>>
>>>>>>>>> DRILL-4042: Windows build doesn't include right version of Hadoop dependencies
>>>>>>>>> DRILL-3480: Random message propagation timeouts
>>>>>>>>> DRILL-4041: Reference count issue
>>>>>>>>> DRILL-4046: Performance regression for some TPCH queries
>>>>>>>>>
>>>>>>>>> Proposed next steps:
>>>>>>>>>
>>>>>>>>> DRILL-4042 has a clear fix and reproduction. Patrick, do you think can have
>>>>>>>>> a fix up for this shortly?
>>>>>>>>>
>>>>>>>>> For the 3480 & 4041, consistent reproductions are missing. It would be
>>>>>>>>> great if everybody could try to help find reproductions to these issues. I
>>>>>>>>> think we should take stock again at the end of the day to decide next steps
>>>>>>>>> and whether we want to hold the release for these.
>>>>>>>>>
>>>>>>>>> For 4046: I've heard that there are some performance regressions around a
>>>>>>>>> couple of queries but the current symptoms don't make a lot of sense. I'd
>>>>>>>>> like to collect some more data here and then decide next steps.
>>>>>>>>>
>>>>>>>>> Let's see if we can get repros for each of the inconsistent issues and
>>>>>>>>> check in again EOD.
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>> Jacques
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jacques Nadeau
>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>
>>>>>>>>> On Thu, Nov 5, 2015 at 3:36 PM, Aditya <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Ran into another one - DRILL-4042
>>>>>>>>>> <https://issues.apache.org/jira/browse/DRILL-4042>.
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 5, 2015 at 1:48 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yeah, I think that sinks it. Weird how Rat complains only on windows...
>>>>>>>>>>>
>>>>>>>>>>> Let's take the rest of the business day to test the current candidate to
>>>>>>>>>>> make sure that we don't spin extra builds unnecessarily.
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>> Jacques
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:24 PM, Aditya <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Oh, I thought only master/trunk branch was protected, but now I see the
>>>>>>>>>>>> mail from David Nalley.
>>>>>>>>>>>>
>>>>>>>>>>>> In such case, I propose that the release manager could push the branch
>>>>>>>>>>>> to his/her private fork and put the URL/hash in the vote starter thread.
>>>>>>>>>>>>
>>>>>>>>>>>> The reason I was looking to the commit history to determine if the
>>>>>>>>>>>> candidate suffer from DRILL-4040, which, evidently it does.
>>>>>>>>>>>>
>>>>>>>>>>>> -1 as the build from source is failing.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/DRILL-4040
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:12 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not sure what to do here. INFRA just changed the Git behavior so it
>>>>>>>>>>>>> is no longer possible to delete branches. I generally don't like to have
>>>>>>>>>>>>> failed branches in a release history (otherwise you get a release branch
>>>>>>>>>>>>> with all these maven forward/backwards commits). As such, I would overwrite
>>>>>>>>>>>>> candidate branches historically (dropping the failed release commits).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The commit is here right now:
>>>>>>>>>>>>> https://github.com/jacques-n/drill/tree/drill-1.3.0-rc0
>>>>>>>>>>>>>
>>>>>>>>>>>>> The parent of 4822068a006aeb251b686d2b51871573c4337e60
>>>>>>>>>>>>> is
>>>>>>>>>>>>> 3dedc158f3af8ec8320a9cd336b2798b09cc9a8d (the tip of master)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:01 PM, Aditya <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am having trouble determining the git commit this release is based
>>>>>>>>>>>>>> on as
>>>>>>>>>>>>>> I could not find the
>>>>>>>>>>>>>> id (4822068a006aeb251b686d2b51871573c4337e60) captured in the
>>>>>>>>>>>>>> git.properties bundled in the
>>>>>>>>>>>>>> tarballs in the Drill Git repository.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Most likely the last commit is only in your local branch and since
>>>>>>>>>>>>>> git.properties captures only the
>>>>>>>>>>>>>> last commit, it is impossible to find the parent commit.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Would it make sense to push the release branch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> aditya...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Nov 4, 2015 at 11:08 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey Everybody,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm happy to propose a new release of Apache Drill, version 1.3.0. This is
>>>>>>>>>>>>>>> the first release candidate (rc0). It covers a total of ~50 closed JIRAs
>>>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The tarball artifacts are hosted at [2] and the maven artifacts are hosted
>>>>>>>>>>>>>>> at [3].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The vote will be open for 72 hours ending at 11PM Pacific, November 7,
>>>>>>>>>>>>>>> 2015.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [ ] +1
>>>>>>>>>>>>>>> [ ] +0
>>>>>>>>>>>>>>> [ ] -1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>> Jacques
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
>>>>>>>>>>>>>>> [2] http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/
>>>>>>>>>>>>>>> [3] https://repository.apache.org/content/repositories/orgapachedrill-1013/
