On Wed, Apr 29, 2015 at 9:15 AM, Jacques Nadeau <[email protected]> wrote:
> Quick question re 10 runs: are these runs that are in parallel with all the
> unit tests or just this test?
>
> The other question is: how do we construct these tests so they it is
> extremely unlikely to get a failure even if processing is slow or threads
> are suspended?
>

The first problems we hit when processing is slow are JUnit timeouts. Once a unit test times out, its corresponding query isn't cancelled and may continue running in parallel with other unit tests from the same test class. Once the @AfterClass method shuts down the drillbits, they may complain about allocators not being closed because some queries are actually still running.

> On Wed, Apr 29, 2015 at 7:53 AM, Sudheesh Katkam <[email protected]>
> wrote:
>
> > I am responsible for those tests. I ran the tests at least 10 times on my
> > Linux VM with 1 second pauses, all of which passed.
> >
> > On your second run, what different errors did you see?
> >
> > On your third run, are you able to reproduce the test case the hangs?
> >
> > Sorry that the message is not informative. I already have a patch which is
> > a slight improvement to Jacques change that improves the message in those
> > tests.
> >
> > What tool did you use to get the thread count?
> >
> > - Sudheesh
> >
> > > On Apr 29, 2015, at 6:28 AM, Abdel Hakim Deneche <[email protected]>
> > > wrote:
> > >
> > > The message displayed in the first run contains actually two different
> > > issues:
> > >
> > > 1. The error message "Error shutting down Drillbit 'beta'" is most likely
> > > caused by this issue DRILL-2878
> > > <https://issues.apache.org/jira/browse/DRILL-2878>
> > >
> > > 2. The test that failed with an "java.lang.AssertionError: null" is most
> > > likely a bug because that unit test should not fail. I've seen this error
> > > before, but it only happens intermittently.
> > >
> > > The system error reported in the 3rd run is actually an "expected" injected
> > > exception, but 278 threads looks suspicious!!!
> > >
> > > On Wed, Apr 29, 2015 at 12:13 AM, Daniel Barclay <[email protected]>
> > > wrote:
> > >
> > >> Does anyone know what's going on with TestDrillbitResilience (rebased
> > >> from master today)? (Is it working right?)
> > >>
> > >>
> > >> One run, via "mvn install", yielded assertion errors:
> > >>
> > >> ...
> > >> Error shutting down Drillbit "beta".
> > >> Tests run: 11, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 33.811
> > >> sec <<< FAILURE! - in org.apache.drill.exec.server.TestDrillbitResilience
> > >> cancelAfterEverythingIsCompleted(org.apache.drill.exec.server.TestDrillbitResilience)
> > >> Time elapsed: 1.468 sec <<< FAILURE!
> > >> java.lang.AssertionError: null
> > >> at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelled(TestDrillbitResilience.java:459)
> > >> at org.apache.drill.exec.server.TestDrillbitResilience.cancelAfterEverythingIsCompleted(TestDrillbitResilience.java:565)
> > >>
> > >> cancelInMiddleOfFetchingResults(org.apache.drill.exec.server.TestDrillbitResilience)
> > >> Time elapsed: 1.496 sec <<< FAILURE!
> > >> java.lang.AssertionError: null
> > >> at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelled(TestDrillbitResilience.java:459)
> > >> at org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults(TestDrillbitResilience.java:510)
> > >>
> > >> Running <next test>
> > >> ...
> > >>
> > >>
> > >> A second run, run individually (but still via Maven) died with different
> > >> errors.
> > >>
> > >>
> > >> A third run, via "mvn install" again, seems hung after reporting this
> > >> (maybe expected) exception:
> > >>
> > >> Exception (no rows returned):
> > >> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
> > >> run-try-end
> > >>
> > >>
> > >> [fb9cfe61-af6e-4c9c-b6ab-8a1b8725c6e9 on dev-linux2:31010]
> > >>
> > >>
> > >> The process is using only about 5% CPU--but has 278 threads!
> > >> (That includes about 35 threads all with the same name of "BitClient-1".)
> > >>
> > >>
> > >> Daniel
> > >>
> > >> --
> > >> Daniel Barclay
> > >> MapR Technologies
> > >
> > > --
> > > Abdelhakim Deneche
> > > Software Engineer
> > > <http://www.mapr.com/>

--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
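
P.S. To make the timeout problem in my reply concrete, here is a minimal sketch of one way a harness could avoid leaving work running after a timeout. It is not Drill code: TimeoutCancelSketch, runWithTimeout and shutdownAndAwait are hypothetical names, the "query" is a plain Callable, and cancellation is modelled with java.util.concurrent's Future.cancel rather than Drill's own query cancellation. The idea is only that a timed-out task gets cancelled explicitly, and that the teardown step waits for outstanding tasks before releasing resources.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; this is not Drill's test harness. */
class TimeoutCancelSketch {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    /**
     * Runs a stand-in "query" and waits up to timeoutMillis for it.
     * Returns true if it finished in time. On timeout the task is
     * cancelled (its thread is interrupted) instead of being left running.
     */
    boolean runWithTimeout(Callable<?> query, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        Future<?> future = pool.submit(query);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            return false;
        }
    }

    /**
     * Analogue of an @AfterClass step: stop accepting work, then wait for
     * every submitted task to end before resources are released.
     * Returns true if all tasks terminated within waitMillis.
     */
    boolean shutdownAndAwait(long waitMillis) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        TimeoutCancelSketch harness = new TimeoutCancelSketch();
        // A slow "query" that exits when its thread is interrupted.
        boolean finished = harness.runWithTimeout(() -> {
            Thread.sleep(10_000);
            return null;
        }, 100);
        boolean clean = harness.shutdownAndAwait(2_000);
        System.out.println("finished in time: " + finished
                + ", clean shutdown: " + clean);
    }
}
```

This only works if the task actually reacts to interruption. A real query would need its own cancel path to be invoked from the timeout handler, which is exactly the part we are missing today.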
