OK, display issue. I lied. It looks like this is constrained to the status thread. However, we're probably creating several hundred of these status threads over the course of the tests (since we don't restart the JVM).
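Here is a minimal Java sketch of the leak pattern discussed in this thread. It uses a hypothetical `StatusReporter` class that owns one background "status" thread per instance. This is not Drill's actual WorkManager code. Marking a thread as a daemon only means it will not keep the JVM alive at exit. It does nothing when the owning component shuts down. So in a test JVM that starts and stops many Drillbits, those threads pile up unless the owner's `close()` interrupts and joins them. The hypothetical `countThreads` helper counts live threads by name prefix, which is one simple way to watch thread counts in a long-running test JVM.

```java
import java.util.concurrent.TimeUnit;

public class StatusThreadLeakDemo {

  // Hypothetical stand-in for a per-Drillbit component that owns a background
  // status thread. Not Drill's actual WorkManager implementation.
  static final class StatusReporter implements AutoCloseable {
    private final Thread statusThread;

    StatusReporter(String name) {
      statusThread = new Thread(() -> {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            // Pretend to publish status periodically.
            TimeUnit.MILLISECONDS.sleep(50);
          }
        } catch (InterruptedException e) {
          // Interrupted by close(): fall through so the thread exits.
        }
      }, "status-" + name);
      // Daemon only means this thread won't block JVM exit; it does NOT stop
      // the thread when its owner is shut down or discarded.
      statusThread.setDaemon(true);
      statusThread.start();
    }

    @Override
    public void close() throws InterruptedException {
      statusThread.interrupt();                           // ask the loop to stop
      statusThread.join(TimeUnit.SECONDS.toMillis(5));    // wait for it to exit
    }
  }

  // Counts live threads whose name starts with the given prefix.
  static long countThreads(String prefix) {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
        .count();
  }

  public static void main(String[] args) throws InterruptedException {
    // Leaky pattern: many instances in one long-lived JVM, never closed.
    for (int i = 0; i < 100; i++) {
      new StatusReporter("leaky-" + i);
    }
    System.out.println("leaked status threads: " + countThreads("status-leaky-"));

    // Fixed pattern: the owner stops its thread when it shuts down.
    for (int i = 0; i < 100; i++) {
      try (StatusReporter reporter = new StatusReporter("closed-" + i)) {
        // ... use the reporter ...
      }
    }
    System.out.println("remaining closed status threads: " + countThreads("status-closed-"));
  }
}
```

Running `main` should report 100 leaked threads for the unclosed reporters and 0 for the closed ones. The JVM still exits at the end only because the leaked threads are daemons. The same principle would apply to other per-instance thread pools, such as a Netty `EventLoopGroup`, which is released with `shutdownGracefully()`.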
--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 6, 2015 at 6:56 PM, Sudheesh Katkam wrote:

> But the status thread is a daemon. So the Drillbit doesn't have to stop
> it, right?
>
> - Sudheesh
>
> On Nov 6, 2015, at 6:44 PM, Jacques Nadeau wrote:
>
>> I see that we're bleeding Workmanager Status threads that aren't shutdown
>> when the Drillbit is shutdown.
>>
>> I'll get a patch together.
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Fri, Nov 6, 2015 at 4:31 PM, Hanifi Gunes wrote:
>>
>>> Looks like we are possibly leaking some threads. Investigating.
>>>
>>> On Fri, Nov 6, 2015 at 4:25 PM, Jacques Nadeau wrote:
>>>
>>>> Hmm.. that is quite strange. I wonder if we need to look at thread
>>>> counts on the daemon.
>>>>
>>>> We haven't changed how we create but there were changes to shutdown
>>>> (although I can't imagine why that would be a problem).
>>>>
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>>
>>>> On Fri, Nov 6, 2015 at 4:11 PM, Hanifi Gunes wrote:
>>>>
>>>>> Not the testAggregateWithEmptyRequiredInput but I got the following on
>>>>> my branch rebased top of master -- @CentOS.
>>>>>
>>>>> Tests in error:
>>>>>   TestImpersonationQueries.sequenceFileChainedImpersonationWithView » UserRemote
>>>>>   TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » Rpc
>>>>>   TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>>>   TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>>>   TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>>>   TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>>>   TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 » IllegalState
>>>>>
>>>>> exception details --->
>>>>> testMultiLevelImpersonationExceedsMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)  Time elapsed: 0.008 sec  <<< ERROR!
>>>>> java.lang.IllegalStateException: failed to create a child event loop
>>>>>     at sun.nio.ch.IOUtil.makePipe(Native Method)
>>>>>     at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
>>>>>     at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120)
>>>>>     at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
>>>>>     at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
>>>>>     at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
>>>>>     at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61)
>>>>>     at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52)
>>>>>     at org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:74)
>>>>>     at org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239)
>>>>>     at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220)
>>>>>     at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178)
>>>>>     at org.apache.drill.QueryTestUtil.createClient(QueryTestUtil.java:67)
>>>>>     at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:213)
>>>>>     at org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:236)
>>>>>
>>>>>
>>>>> My gut's telling me that we are creating too many NioEventLoopGroups.
>>>>> Did we make any recent changes around RPC causing this?
>>>>>
>>>>> -Hanifi
>>>>>
>>>>> On Fri, Nov 6, 2015 at 3:58 PM, Jacques Nadeau wrote:
>>>>>
>>>>>> Do you have that other output/stack trace I asked about? If we can also
>>>>>> see the IllegalReferenceCount error on something other than the JDBC client
>>>>>> close method, that would be helpful.
>>>>>>
>>>>>> --
>>>>>> Jacques Nadeau
>>>>>> CTO and Co-Founder, Dremio
>>>>>>
>>>>>> On Fri, Nov 6, 2015 at 2:48 PM, Jinfeng Ni wrote:
>>>>>>
>>>>>>> I just re-ran, and the previous 4 failures are gone. But it failed
>>>>>>> with two new ones:
>>>>>>>
>>>>>>> Tests in error:
>>>>>>>   TestSqlStdBasedAuthorization.org.apache.drill.exec.impersonation.hive.TestSqlStdBasedAuthorization » UserRemote
>>>>>>>   TestStorageBasedHiveAuthorization.org.apache.drill.exec.impersonation.hive.TestStorageBasedHiveAuthorization » UserRemote
>>>>>>>
>>>>>>> I restarted the machine, there are not many applications running,
>>>>>>> and the memory should be enough. At least some days back, I got a
>>>>>>> clean run on the same machine.
>>>>>>>
>>>>>>> On Fri, Nov 6, 2015 at 2:39 PM, Jacques Nadeau wrote:
>>>>>>>
>>>>>>>> Can you provide the complete output for this failure:
>>>>>>>>
>>>>>>>> TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 » IllegalReferenceCount
>>>>>>>>
>>>>>>>> I haven't seen the other issues. The last one looks like the system was
>>>>>>>> having an issue since thread creation failure is usually an OS problem.
>>>>>>>> Was your system under-resourced?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jacques Nadeau
>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>
>>>>>>>> On Fri, Nov 6, 2015 at 12:55 PM, Jinfeng Ni wrote:
>>>>>>>>
>>>>>>>>> I'm seeing unit test failures when running "mvn clean install" on
>>>>>>>>> the Drill master branch, on Mac.
>>>>>>>>>
>>>>>>>>> The first one seems to be issue #3 in Jacques's list. The last
>>>>>>>>> three seem to be different from the 4 issues. Has anyone seen these
>>>>>>>>> failures before, or did they just happen on my Mac? Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> =================================================
>>>>>>>>> git log
>>>>>>>>> commit 1a24233475ca46aaf2a49a5624b4042f088382f4
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Tests in error:
>>>>>>>>>   TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 » IllegalReferenceCount
>>>>>>>>>   TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops » UserRemote
>>>>>>>>>   TestImpersonationQueries.removeMiniDfsBasedStorage:294->BaseTestImpersonation.stopMiniDfsCluster:151 » OutOfMemory
>>>>>>>>>   TestImpersonationQueries>BaseTestQuery.closeClient:260 » OutOfMemory unable to...
>>>>>>>>>
>>>>>>>>> Tests run: 1483, Failures: 0, Errors: 4, Skipped: 118
>>>>>>>>>
>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>> [INFO] Reactor Summary:
>>>>>>>>> [INFO]
>>>>>>>>> [INFO] Apache Drill Root POM .............................. SUCCESS [  8.440 s]
>>>>>>>>> [INFO] tools/Parent Pom ................................... SUCCESS [  0.631 s]
>>>>>>>>> [INFO] tools/freemarker codegen tooling ................... SUCCESS [  5.236 s]
>>>>>>>>> [INFO] Drill Protocol ..................................... SUCCESS [  5.839 s]
>>>>>>>>> [INFO] Common (Logical Plan, Base expressions) ............ SUCCESS [ 10.831 s]
>>>>>>>>> [INFO] contrib/Parent Pom ................................. SUCCESS [  0.815 s]
>>>>>>>>> [INFO] contrib/data/Parent Pom ............................ SUCCESS [  0.331 s]
>>>>>>>>> [INFO] contrib/data/tpch-sample-data ...................... SUCCESS [  2.838 s]
>>>>>>>>> [INFO] exec/Parent Pom .................................... SUCCESS [  0.635 s]
>>>>>>>>> [INFO] exec/Java Execution Engine ......................... FAILURE [12:05 min]
>>>>>>>>> [INFO] exec/JDBC Driver using dependencies ................ SKIPPED
>>>>>>>>> [INFO] JDBC JAR with all dependencies ..................... SKIPPED
>>>>>>>>> [INFO] contrib/mongo-storage-plugin ....................... SKIPPED
>>>>>>>>>
>>>>>>>>> Tests run: 11, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 17.042 sec <<< FAILURE! - in org.apache.drill.exec.impersonation.TestImpersonationQueries
>>>>>>>>> testMultiLevelImpersonationEqualToMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)  Time elapsed: 0.099 sec  <<< ERROR!
>>>>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: OutOfMemoryError: unable to create new native thread
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [Error Id: a826ac5d-e278-49bc-8f92-fdf241d0e634 on 10.250.50.52:31010]
>>>>>>>>>     at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>>>>>>>>>     at org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112)
>>>>>>>>>     at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>>>>>>>>>     at org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
>>>>>>>>>     at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:68)
>>>>>>>>>     at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:390)
>>>>>>>>>     at org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>>>>>
>>>>>>>>> On Fri, Nov 6, 2015 at 9:42 AM, Jacques Nadeau wrote:
>>>>>>>>>
>>>>>>>>>> It seems like we have four potentially show-stopping issues at the moment:
>>>>>>>>>>
>>>>>>>>>> DRILL-4042: Windows build doesn't include right version of Hadoop dependencies
>>>>>>>>>> DRILL-3480: Random message propagation timeouts
>>>>>>>>>> DRILL-4041: Reference count issue
>>>>>>>>>> DRILL-4046: Performance regression for some TPCH queries
>>>>>>>>>>
>>>>>>>>>> Proposed next steps:
>>>>>>>>>>
>>>>>>>>>> DRILL-4042 has a clear fix and reproduction. Patrick, do you think you can
>>>>>>>>>> have a fix up for this shortly?
>>>>>>>>>>
>>>>>>>>>> For 3480 & 4041, consistent reproductions are missing. It would be
>>>>>>>>>> great if everybody could try to help find reproductions for these issues.
>>>>>>>>>> I think we should take stock again at the end of the day to decide next
>>>>>>>>>> steps and whether we want to hold the release for these.
>>>>>>>>>>
>>>>>>>>>> For 4046: I've heard that there are some performance regressions around a
>>>>>>>>>> couple of queries but the current symptoms don't make a lot of sense. I'd
>>>>>>>>>> like to collect some more data here and then decide next steps.
>>>>>>>>>>
>>>>>>>>>> Let's see if we can get repros for each of the inconsistent issues and
>>>>>>>>>> check in again EOD.
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>> Jacques
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jacques Nadeau
>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 5, 2015 at 3:36 PM, Aditya wrote:
>>>>>>>>>>
>>>>>>>>>>> Ran into another one - DRILL-4042
>>>>>>>>>>> <https://issues.apache.org/jira/browse/DRILL-4042>.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:48 PM, Jacques Nadeau wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yeah, I think that sinks it. Weird how Rat complains only on Windows...
>>>>>>>>>>>>
>>>>>>>>>>>> Let's take the rest of the business day to test the current candidate to
>>>>>>>>>>>> make sure that we don't spin extra builds unnecessarily.
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>> Jacques
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:24 PM, Aditya wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Oh, I thought only the master/trunk branch was protected, but now I see
>>>>>>>>>>>>> the mail from David Nalley.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In that case, I propose that the release manager push the branch
>>>>>>>>>>>>> to his/her private fork and put the URL/hash in the vote starter thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The reason I was looking at the commit history was to determine if the
>>>>>>>>>>>>> candidate suffers from DRILL-4040 [1], which, evidently, it does.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -1 as the build from source is failing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/DRILL-4040
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:12 PM, Jacques Nadeau wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not sure what to do here. INFRA just changed the Git behavior so it
>>>>>>>>>>>>>> is no longer possible to delete branches.
>>>>>>>>>>>>>> I generally don't like to have
>>>>>>>>>>>>>> failed branches in a release history (otherwise you get a release branch
>>>>>>>>>>>>>> with all these maven forward/backwards commits). As such, I would overwrite
>>>>>>>>>>>>>> candidate branches historically (dropping the failed release commits).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The commit is here right now:
>>>>>>>>>>>>>> https://github.com/jacques-n/drill/tree/drill-1.3.0-rc0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The parent of 4822068a006aeb251b686d2b51871573c4337e60 is
>>>>>>>>>>>>>> 3dedc158f3af8ec8320a9cd336b2798b09cc9a8d (the tip of master).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Jacques Nadeau
>>>>>>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:01 PM, Aditya wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am having trouble determining the git commit this release is based on,
>>>>>>>>>>>>>>> as I could not find the id (4822068a006aeb251b686d2b51871573c4337e60)
>>>>>>>>>>>>>>> captured in the git.properties bundled in the tarballs in the Drill Git
>>>>>>>>>>>>>>> repository.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Most likely the last commit is only in your local branch, and since
>>>>>>>>>>>>>>> git.properties captures only the last commit, it is impossible to find
>>>>>>>>>>>>>>> the parent commit.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Would it make sense to push the release branch?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> aditya...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Nov 4, 2015 at 11:08 PM, Jacques Nadeau wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hey Everybody,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm happy to propose a new release of Apache Drill, version 1.3.0. This is
>>>>>>>>>>>>>>>> the first release candidate (rc0). It covers a total of ~50 closed JIRAs [1].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The tarball artifacts are hosted at [2] and the maven artifacts are hosted at [3].
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The vote will be open for 72 hours ending at 11PM Pacific, November 7, 2015.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [ ] +1
>>>>>>>>>>>>>>>> [ ] +0
>>>>>>>>>>>>>>>> [ ] -1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>>>> Jacques
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
>>>>>>>>>>>>>>>> [2] http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/
>>>>>>>>>>>>>>>> [3] https://repository.apache.org/content/repositories/orgapachedrill-1013/
