1) I ran “mvn clean install” back to back (almost), so the runs were serial and included all the tests.

2) The tests rely on pause times so that threads wait 'long enough' for Drill
to receive and propagate a cancel signal. The proposal in
https://issues.apache.org/jira/browse/DRILL-2697 would make the test cases
work without depending on timing.
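
To illustrate the general idea of replacing a fixed pause with an explicit signal, here is a minimal sketch. It is not the DRILL-2697 patch and does not use any Drill API; the class, field, and method names (LatchCancelSketch, PausedWorker, runOnce) are made up for illustration. The worker blocks at a "pause point" until the test signals it, so the test knows the worker is paused before it cancels, however slow the machine is.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LatchCancelSketch {

  /** Hypothetical stand-in for a fragment that stops at an injected pause point. */
  static final class PausedWorker implements Runnable {
    final CountDownLatch reachedPause = new CountDownLatch(1); // worker -> test
    final CountDownLatch resume = new CountDownLatch(1);       // test -> worker
    final AtomicBoolean cancelRequested = new AtomicBoolean(false);
    final AtomicBoolean sawCancel = new AtomicBoolean(false);

    @Override
    public void run() {
      reachedPause.countDown();       // announce that the pause point was hit
      try {
        resume.await();               // block until signalled, not for a fixed time
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      sawCancel.set(cancelRequested.get());
    }
  }

  /** Runs one pause/cancel/resume cycle; returns whether the worker saw the cancel. */
  static boolean runOnce() throws InterruptedException {
    PausedWorker worker = new PausedWorker();
    Thread thread = new Thread(worker, "paused-worker");
    thread.start();

    // The generous timeout only guards against a hang; ordering does not depend on it.
    if (!worker.reachedPause.await(30, TimeUnit.SECONDS)) {
      thread.interrupt();
      throw new IllegalStateException("worker never reached the pause point");
    }

    worker.cancelRequested.set(true); // "cancel" while the worker is known to be paused
    worker.resume.countDown();        // let the worker continue
    thread.join();                    // wait for the worker to finish
    return worker.sawCancel.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("worker observed cancel: " + runOnce());
  }
}
```

With a fixed sleep, the test only hopes the worker got far enough before the cancel is sent; with the two latches, the ordering is enforced and the timeout is just a safety net.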

> On Apr 29, 2015, at 9:15 AM, Jacques Nadeau <[email protected]> wrote:
> 
> Quick question re 10 runs: are these runs that are in parallel with all the
> unit tests or just this test?
> 
> The other question is: how do we construct these tests so that it is
> extremely unlikely to get a failure even if processing is slow or threads
> are suspended?
> 
> On Wed, Apr 29, 2015 at 7:53 AM, Sudheesh Katkam <[email protected]>
> wrote:
> 
>> I am responsible for those tests. I ran the tests at least 10 times on my
>> Linux VM with 1 second pauses, all of which passed.
>> 
>> On your second run, what different errors did you see?
>> 
>> On your third run, are you able to reproduce the test case that hangs?
>> 
>> Sorry that the message is not informative. I already have a patch, a
>> slight improvement to Jacques's change, that improves the message in those
>> tests.
>> 
>> What tool did you use to get the thread count?
>> 
>> - Sudheesh
>> 
>> Sent from my iPhone. Pardon any typos.
>> 
>>> On Apr 29, 2015, at 6:28 AM, Abdel Hakim Deneche <[email protected]> wrote:
>>> 
>>> The message displayed in the first run actually contains two different
>>> issues:
>>> 
>>> 1. The error message "Error shutting down Drillbit 'beta'" is most likely
>>> caused by this issue DRILL-2878
>>> <https://issues.apache.org/jira/browse/DRILL-2878>
>>> 
>>> 2. The test that failed with a "java.lang.AssertionError: null" is most
>>> likely a bug because that unit test should not fail. I've seen this error
>>> before, but it only happens intermittently.
>>> 
>>> The system error reported in the 3rd run is actually an "expected" injected
>>> exception, but 278 threads looks suspicious!!!
>>> 
>>> On Wed, Apr 29, 2015 at 12:13 AM, Daniel Barclay <[email protected]>
>>> wrote:
>>> 
>>>> Does anyone know what's going on with TestDrillbitResilience (rebased
>>>> from master today)?  (Is it working right?)
>>>> 
>>>> 
>>>> One run, via "mvn install", yielded assertion errors:
>>>> 
>>>> ...
>>>> Error shutting down Drillbit "beta".
>>>> Tests run: 11, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 33.811 sec <<< FAILURE! - in org.apache.drill.exec.server.TestDrillbitResilience
>>>> cancelAfterEverythingIsCompleted(org.apache.drill.exec.server.TestDrillbitResilience) Time elapsed: 1.468 sec  <<< FAILURE!
>>>> java.lang.AssertionError: null
>>>>       at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelled(TestDrillbitResilience.java:459)
>>>>       at org.apache.drill.exec.server.TestDrillbitResilience.cancelAfterEverythingIsCompleted(TestDrillbitResilience.java:565)
>>>> 
>>>> cancelInMiddleOfFetchingResults(org.apache.drill.exec.server.TestDrillbitResilience) Time elapsed: 1.496 sec  <<< FAILURE!
>>>> java.lang.AssertionError: null
>>>>       at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelled(TestDrillbitResilience.java:459)
>>>>       at org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults(TestDrillbitResilience.java:510)
>>>> 
>>>> Running <next test>
>>>> ...
>>>> 
>>>> 
>>>> A second run, run individually (but still via Maven) died with different
>>>> errors.
>>>> 
>>>> 
>>>> 
>>>> A third run, via "mvn install" again, seems hung after reporting this
>>>> (maybe expected) exception:
>>>> 
>>>> Exception (no rows returned):
>>>> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
>>>> run-try-end
>>>> 
>>>> 
>>>> [fb9cfe61-af6e-4c9c-b6ab-8a1b8725c6e9 on dev-linux2:31010]
>>>> 
>>>> 
>>>> The process is using only about 5% CPU--but has 278 threads!
>>>> (That includes about 35 threads all with the same name of
>>>> "BitClient-1".)
>>>> 
>>>> 
>>>> Daniel
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Daniel Barclay
>>>> MapR Technologies
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Abdelhakim Deneche
>>> 
>>> Software Engineer
>>> 
>>> <http://www.mapr.com/>
>>> 
>>> 
>>> 
>> 
