Patch branch here: https://github.com/jacques-n/drill/tree/merge_2015_11_06

Here is where we stand:

Completed (nearly)
DRILL-4042: Windows build doesn't include right version of Hadoop dependencies
DRILL-4046: Performance regression for some TPCH queries
DRILL-4048: Parquet corruption issue
DRILL-4049: Avoid excessive threads in tests

No resolution
DRILL-3480: Random message propagation timeouts
DRILL-4041: Reference count issue

I've also added two new assertions to the branch above to see if that better reveals the error behind DRILL-4041.

It would be great if everybody could retry the branch to see any issues before we start another vote. (Or better information on 4041)

thanks,
Jacques

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Fri, Nov 6, 2015 at 6:59 PM, Jacques Nadeau <[email protected]> wrote:

> Ok, display issue. I lied.
>
> It looks like this is constrained to the status thread. However, we're
> probably creating several hundred over the course of the tests (since we
> don't restart the jvm).
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Fri, Nov 6, 2015 at 6:56 PM, Sudheesh Katkam <[email protected]>
> wrote:
>
>> But the status thread is a daemon. So the Drillbit doesn't have to stop
>> it, right?
>>
>> - Sudheesh
>>
>> > On Nov 6, 2015, at 6:44 PM, Jacques Nadeau <[email protected]> wrote:
>> >
>> > I see that we're bleeding Workmanager Status threads that aren't
>> shutdown
>> > when the Drillbit is shutdown.
>> >
>> > I'll get a patch together.
>> >
>> > --
>> > Jacques Nadeau
>> > CTO and Co-Founder, Dremio
>> >
>> >> On Fri, Nov 6, 2015 at 4:31 PM, Hanifi Gunes <[email protected]>
>> wrote:
>> >>
>> >> Looks like we are possibly leaking some threads. Investigating.
>> >>
>> >>> On Fri, Nov 6, 2015 at 4:25 PM, Jacques Nadeau <[email protected]>
>> wrote:
>> >>>
>> >>> Hmm.. that is quite strange. I wonder if we need to look at thread
>> counts
>> >>> on the daemon.
>> >>>
>> >>> We haven't changed how we create but there were changes to shutdown
>> >>> (although I can't imagine why that would be a problem).
>> >>>
>> >>> --
>> >>> Jacques Nadeau
>> >>> CTO and Co-Founder, Dremio
>> >>>
>> >>>> On Fri, Nov 6, 2015 at 4:11 PM, Hanifi Gunes <[email protected]>
>> >>> wrote:
>> >>>
>> >>>> Not the testAggregateWithEmptyRequiredInput but I got the following
>> on
>> >>>> my branch rebased top of master -- @CentOS.
>> >>>>
>> >>>> Tests in error:
>> >>>> TestImpersonationQueries.sequenceFileChainedImpersonationWithView »
>> >>>> UserRemote
>> >>
>> TestImpersonationQueries.testMultiLevelImpersonationJoinEachSideReachesMaxUserHops:233->BaseTestQuery.updateClient:222->BaseTestQuery.
>> >>>> updateClient:236->BaseTestQuery.updateClient:213 » Rpc
>> >>
>> TestImpersonationQueries.testMultiLevelImpersonationExceedsMaxUserHops:219->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
>> >>>> 236->BaseTestQuery.updateClient:213 » IllegalState
>> >>
>> TestImpersonationQueries.avroChainedImpersonationWithView:280->BaseTestImpersonation.createView:186->BaseTestQuery.updateClient:222-
>> >>>>> BaseTestQuery.updateClient:236->BaseTestQuery.updateClient:213 »
>> >>>> IllegalState
>> >>
>> TestImpersonationQueries.testDirectImpersonation_HasGroupReadPermissions:186->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
>> >>>> 236->BaseTestQuery.updateClient:213 » IllegalState
>> >>
>> TestImpersonationQueries.testDirectImpersonation_NoReadPermissions:196->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:236-
>> >>>>> BaseTestQuery.updateClient:213 » IllegalState
>> >>
>> TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops:210->BaseTestQuery.updateClient:222->BaseTestQuery.updateClient:
>> >>>> 236->BaseTestQuery.updateClient:213 » IllegalState
>> >>>>
>> >>>> exception details --->
>> >>
>> testMultiLevelImpersonationExceedsMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)
>> >>>> Time elapsed: 0.008 sec <<< ERROR!
>> >>>> java.lang.IllegalStateException: failed to create a child event loop
>> >>>> at sun.nio.ch.IOUtil.makePipe(Native Method)
>> >>>> at
>> >>> io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:126)
>> >>>> at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:120)
>> >>>> at
>> >>
>> io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
>> >>>> at
>> >>
>> io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:64)
>> >>>> at
>> >>
>> io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:49)
>> >>>> at
>> >>
>> io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:61)
>> >>>> at
>> >>
>> io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:52)
>> >>>> at
>> >>
>> org.apache.drill.exec.rpc.TransportCheck.createEventLoopGroup(TransportCheck.java:74)
>> >>>> at
>> >>
>> org.apache.drill.exec.client.DrillClient.createEventLoop(DrillClient.java:239)
>> >>>> at
>> >>> org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:220)
>> >>>> at
>> >>> org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:178)
>> >>>> at
>> org.apache.drill.QueryTestUtil.createClient(QueryTestUtil.java:67)
>> >>>> at
>> >> org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:213)
>> >>>> at
>> >> org.apache.drill.BaseTestQuery.updateClient(BaseTestQuery.java:236)
>> >>>>
>> >>>>
>> >>>> My god's telling me that we are creating too many
>> NioEventLoopGroup's.
>> >>>> Did we make any recent changes around RPC causing this?
>> >>>>
>> >>>> -Hanifi
>> >>>>
>> >>>>
>> >>>>> On Fri, Nov 6, 2015 at 3:58 PM, Jacques Nadeau <[email protected]>
>> >>>> wrote:
>> >>>>
>> >>>>> Do you have that other output/stack trace I asked about? If we can
>> >> also
>> >>>> see
>> >>>>> the illegalreference count on something other than the JDBC client
>> >>> close
>> >>>>> method, that would be helpful.
>> >>>>>
>> >>>>> --
>> >>>>> Jacques Nadeau
>> >>>>> CTO and Co-Founder, Dremio
>> >>>>>
>> >>>>>> On Fri, Nov 6, 2015 at 2:48 PM, Jinfeng Ni <[email protected]>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> I just re-run, and the previous 4 failures are gone. But it failed
>> >>>>>> with two new ones:
>> >>>>>>
>> >>>>>> Tests in error:
>> >>
>> TestSqlStdBasedAuthorization.org.apache.drill.exec.impersonation.hive.TestSqlStdBasedAuthorization
>> >>>>>> » UserRemote
>> >>
>> TestStorageBasedHiveAuthorization.org.apache.drill.exec.impersonation.hive.TestStorageBasedHiveAuthorization
>> >>>>>> » UserRemote
>> >>>>>>
>> >>>>>> I re-start the machine, and there are not too many applications
>> >>>>>> running and the memory should be enough. At least some days back,
>> >> I
>> >>>>>> got clean run on the same machine.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Nov 6, 2015 at 2:39 PM, Jacques Nadeau <[email protected]
>> >>>
>> >>>>> wrote:
>> >>>>>>> Can you provide the complete output for this failure:
>> >>>>>>>
>> >>>>>>> TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 »
>> >>>>>>> IllegalReferenceCount
>> >>>>>>>
>> >>>>>>> I haven't seen the other issues. The last one looks like the
>> >> system
>> >>>> was
>> >>>>>>> having an issue since thread creation failure is usually an OS
>> >>>> problem.
>> >>>>>> Was
>> >>>>>>> your system under resourced?
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Jacques Nadeau
>> >>>>>>> CTO and Co-Founder, Dremio
>> >>>>>>>
>> >>>>>>> On Fri, Nov 6, 2015 at 12:55 PM, Jinfeng Ni <
>> >> [email protected]
>> >>>>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> I'm seeing unit test case failure when run "mvn clean install"
>> >>> over
>> >>>>>>>> drill master branch, on Mac.
>> >>>>>>>>
>> >>>>>>>> The first one seems to be the issue #3 in Jacques's list. The
>> >> last
>> >>>>>>>> three seems to different from the 4 issues. Has anyone seen this
>> >>>>>>>> failure before, or it just happened to my mac? Thanks.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> =================================================
>> >>>>>>>> git log
>> >>>>>>>> commit 1a24233475ca46aaf2a49a5624b4042f088382f4
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Tests in error:
>> >> TestAggregateFunctions.testAggregateWithEmptyRequiredInput:237 »
>> >>>>>>>> IllegalReferenceCount
>> >> TestImpersonationQueries.testMultiLevelImpersonationEqualToMaxUserHops
>> >>>>>>>> » UserRemote
>> >>
>> TestImpersonationQueries.removeMiniDfsBasedStorage:294->BaseTestImpersonation.stopMiniDfsCluster:151
>> >>>>>>>> » OutOfMemory
>> >>>>>>>> TestImpersonationQueries>BaseTestQuery.closeClient:260 »
>> >>>> OutOfMemory
>> >>>>>>>> unable to...
>> >>>>>>>>
>> >>>>>>>> Tests run: 1483, Failures: 0, Errors: 4, Skipped: 118
>> >>>>>>>>
>> >>>>>>>> [INFO]
>> >>>
>> ------------------------------------------------------------------------
>> >>>>>>>> [INFO] Reactor Summary:
>> >>>>>>>> [INFO]
>> >>>>>>>> [INFO] Apache Drill Root POM ..............................
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 8.440 s]
>> >>>>>>>> [INFO] tools/Parent Pom ...................................
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 0.631 s]
>> >>>>>>>> [INFO] tools/freemarker codegen tooling ...................
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 5.236 s]
>> >>>>>>>> [INFO] Drill Protocol .....................................
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 5.839 s]
>> >>>>>>>> [INFO] Common (Logical Plan, Base expressions) ............
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 10.831 s]
>> >>>>>>>> [INFO] contrib/Parent Pom .................................
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 0.815 s]
>> >>>>>>>> [INFO] contrib/data/Parent Pom ............................
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 0.331 s]
>> >>>>>>>> [INFO] contrib/data/tpch-sample-data ......................
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 2.838 s]
>> >>>>>>>> [INFO] exec/Parent Pom ....................................
>> >>> SUCCESS
>> >>>> [
>> >>>>>>>> 0.635 s]
>> >>>>>>>> [INFO] exec/Java Execution Engine .........................
>> >>> FAILURE
>> >>>>>> [12:05
>> >>>>>>>> min]
>> >>>>>>>> [INFO] exec/JDBC Driver using dependencies ................
>> >>> SKIPPED
>> >>>>>>>> [INFO] JDBC JAR with all dependencies .....................
>> >>> SKIPPED
>> >>>>>>>> [INFO] contrib/mongo-storage-plugin .......................
>> >>> SKIPPED
>> >>>>>>>>
>> >>>>>>>> Tests run: 11, Failures: 0, Errors: 3, Skipped: 0, Time elapsed:
>> >>>>>>>> 17.042 sec <<< FAILURE! - in
>> >>>>>>>> org.apache.drill.exec.impersonation.TestImpersonationQueries
>> >>
>> testMultiLevelImpersonationEqualToMaxUserHops(org.apache.drill.exec.impersonation.TestImpersonationQueries)
>> >>>>>>>> Time elapsed: 0.099 sec <<< ERROR!
>> >>>>>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM
>> >>>> ERROR:
>> >>>>>>>> OutOfMemoryError: unable to create new native thread
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> [Error Id: a826ac5d-e278-49bc-8f92-fdf241d0e634 on
>> >>>> 10.250.50.52:31010
>> >>>>> ]
>> >>>>>>>> at
>> >>
>> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>> >>>>>>>> at
>> >>
>> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:112)
>> >>>>>>>> at
>> >>
>> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>> >>>>>>>> at
>> >>
>> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
>> >>>>>>>> at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:68)
>> >>>>>>>> at
>> >>>>> org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:390)
>> >>>>>>>> at
>> >>
>> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
>> >>>>>>>> at
>> >>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>>>>>>> at
>> >>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>>>>>>> at java.lang.Thread.run(Thread.java:744)
>> >>>>>>>>
>> >>>>>>>> On Fri, Nov 6, 2015 at 9:42 AM, Jacques Nadeau <
>> >>> [email protected]>
>> >>>>>> wrote:
>> >>>>>>>>> It seems like we have four potentially show stopping issues at
>> >>> the
>> >>>>>>>> moment:
>> >>>>>>>>>
>> >>>>>>>>> DRILL-4042: Windows build doesn't include right version of
>> >>> Hadoop
>> >>>>>>>>> dependencies
>> >>>>>>>>> DRILL-3480: Random message propagation timeouts
>> >>>>>>>>> DRILL-4041: Reference count issue
>> >>>>>>>>> DRILL-4046: Performance regression for some TPCH queries
>> >>>>>>>>>
>> >>>>>>>>> Proposed next steps:
>> >>>>>>>>>
>> >>>>>>>>> DRILL-4042 has a clear fix and reproduction. Patrick, do you
>> >>> think
>> >>>>> can
>> >>>>>>>> have
>> >>>>>>>>> a fix up for this shortly?
>> >>>>>>>>>
>> >>>>>>>>> For the 3480 & 4041, consistent reproductions are missing. It
>> >>>> would
>> >>>>> be
>> >>>>>>>>> great if everybody could try to help find reproductions to
>> >> these
>> >>>>>> issues.
>> >>>>>>>> I
>> >>>>>>>>> think we should take stock again at the end of the day to
>> >> decide
>> >>>>> next
>> >>>>>>>> steps
>> >>>>>>>>> and whether we want to hold the release for these.
>> >>>>>>>>>
>> >>>>>>>>> For 4046: I've heard that there are some performance
>> >> regressions
>> >>>>>> around a
>> >>>>>>>>> couple of queries but the current symptoms don't make a lot of
>> >>>>> sense.
>> >>>>>> I'd
>> >>>>>>>>> like to collect some more data here and then decide next
>> >> steps.
>> >>>>>>>>>
>> >>>>>>>>> Let's see if we can get repros for each of the inconsistent
>> >>> issues
>> >>>>> and
>> >>>>>>>>> check in again EOD.
>> >>>>>>>>>
>> >>>>>>>>> thanks,
>> >>>>>>>>> Jacques
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Jacques Nadeau
>> >>>>>>>>> CTO and Co-Founder, Dremio
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Nov 5, 2015 at 3:36 PM, Aditya <
>> >> [email protected]
>> >>>>
>> >>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Ran into another one - DRILL-4042
>> >>>>>>>>>> <https://issues.apache.org/jira/browse/DRILL-4042>.
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Nov 5, 2015 at 1:48 PM, Jacques Nadeau <
>> >>>> [email protected]
>> >>>>>>
>> >>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Yeah, I think that sinks it. Weird how Rat complains only on
>> >>>>>> windows...
>> >>>>>>>>>>>
>> >>>>>>>>>>> Let's take the rest of the business day to test the current
>> >>>>>> candidate
>> >>>>>>>> to
>> >>>>>>>>>>> make sure that we don't spin extra builds unnecessarily.
>> >>>>>>>>>>>
>> >>>>>>>>>>> thanks,
>> >>>>>>>>>>> Jacques
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> Jacques Nadeau
>> >>>>>>>>>>> CTO and Co-Founder, Dremio
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Nov 5, 2015 at 1:24 PM, Aditya <
>> >>> [email protected]
>> >>>>>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Oh, I thought only master/trunk branch was protected, but
>> >>> now I
>> >>>>> see
>> >>>>>>>> the
>> >>>>>>>>>>>> mail from David Nalley.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> In such case, I propose that the release manager could push
>> >>> the
>> >>>>>> branch
>> >>>>>>>>>>>> to his/her private fork and put the URL/hash in the vote
>> >>>> starter
>> >>>>>>>> thread.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The reason I was looking to the commit history to determine
>> >>> if
>> >>>>> the
>> >>>>>>>>>>>> candidate suffer from DRILL-4040, which, evidently it does.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> -1 as the build from source is failing.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/DRILL-4040
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:12 PM, Jacques Nadeau <
>> >>>>> [email protected]
>> >>>>>>>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> I'm not sure what to do here. INFRA just changed the Git
>> >>>>> behavior
>> >>>>>> so
>> >>>>>>>> it
>> >>>>>>>>>>>>> is no longer possible to delete branches. I generally
>> >> don't
>> >>>> like
>> >>>>>> to
>> >>>>>>>> have
>> >>>>>>>>>>>>> failed branches in a release history (otherwise you get a
>> >>>>> release
>> >>>>>>>> branch
>> >>>>>>>>>>>>> with all these maven forward/backwards commits). As such,
>> >> I
>> >>>>> would
>> >>>>>>>> overwrite
>> >>>>>>>>>>>>> candidate branches historically (dropping the failed
>> >> release
>> >>>>>>>> commits).
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The commit is here right now:
>> >>>>>>>>>>>>> https://github.com/jacques-n/drill/tree/drill-1.3.0-rc0
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The parent of 4822068a006aeb251b686d2b51871573c4337e60
>> >>>>>>>>>>>>> is
>> >>>>>>>>>>>>> 3dedc158f3af8ec8320a9cd336b2798b09cc9a8d (the tip of
>> >> master)
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> --
>> >>>>>>>>>>>>> Jacques Nadeau
>> >>>>>>>>>>>>> CTO and Co-Founder, Dremio
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Thu, Nov 5, 2015 at 1:01 PM, Aditya <
>> >>>> [email protected]
>> >>>>>>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I am having trouble determining the git commit this
>> >> release
>> >>>> is
>> >>>>>> based
>> >>>>>>>>>>>>>> on as
>> >>>>>>>>>>>>>> I could not find the
>> >>>>>>>>>>>>>> id (4822068a006aeb251b686d2b51871573c4337e60) captured in
>> >>> the
>> >>>>>>>>>>>>>> git.properties bundled in the
>> >>>>>>>>>>>>>> tarballs in the Drill Git repository.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Most likely the last commit is only in your local branch
>> >>> and
>> >>>>>> since
>> >>>>>>>>>>>>>> git.properties captures only the
>> >>>>>>>>>>>>>> last commit, it is impossible to find the parent commit.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Would it make sense to push the release branch?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> aditya...
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Wed, Nov 4, 2015 at 11:08 PM, Jacques Nadeau <
>> >>>>>> [email protected]
>> >>>>>>>>>
>> >>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Hey Everybody,
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I'm happy to propose a new release of Apache Drill,
>> >>> version
>> >>>>>> 1.3.0.
>> >>>>>>>>>>>>>> This is
>> >>>>>>>>>>>>>>> the first release candidate (rc0). It covers a total
>> >> of
>> >>>> ~50
>> >>>>>>>> closed
>> >>>>>>>>>>>>>> JIRAs
>> >>>>>>>>>>>>>>> [1].
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> The tarball artifacts are hosted at [2] and the maven
>> >>>>> artifacts
>> >>>>>>>> are
>> >>>>>>>>>>>>>> hosted
>> >>>>>>>>>>>>>>> at [3].
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> The vote will be open for 72 hours ending at 11PM
>> >>> Pacific,
>> >>>>>>>> November
>> >>>>>>>>>>>>>> 7,
>> >>>>>>>>>>>>>>> 2015.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> [ ] +1
>> >>>>>>>>>>>>>>> [ ] +0
>> >>>>>>>>>>>>>>> [ ] -1
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> thanks,
>> >>>>>>>>>>>>>>> Jacques
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> [1]
>> >>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12332946
>> >>>>>>>>>>>>>>> [2]
>> >>>>> http://people.apache.org/~jacques/apache-drill-1.3.0.rc0/
>> >>>>>>>>>>>>>>> [3]
>> >>>
>> https://repository.apache.org/content/repositories/orgapachedrill-1013/
>> >>
>> >
>
