vdiravka opened a new pull request #2361:
URL: https://github.com/apache/drill/pull/2361


   # [DRILL-8030](https://issues.apache.org/jira/browse/DRILL-8030): 
Intermittent TestDrillbitResilience cancelInMiddleOfFetchingResults and 
foreman_runTryEnd failures
   
   ## Description
   
   DRILL-7908 fixed distributed deadlocks in TestDrillbitResilience and added 
better timing for simulating the different Drill states, but several tests 
still fail intermittently.
   1. Sometimes a test reports a memory leak:
   ```
   Error:  Failures: 
   Error:  org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults
   Error:    Run 1: TestDrillbitResilience.cancelInMiddleOfFetchingResults:375 We are leaking 3000000 bytes ==> expected: <0> but was: <3000000>
   ```
   But there is actually no memory leak. It looks like Drill just checks the 
allocated memory too early, when not all fragments are closed yet, so adding a 
timeout before the final countAllocatedMemory check fixes the issue.
   The other reason for the test failures is that the queries were not in the 
expected state before cancelling (for instance, in the STARTING state instead 
of RUNNING). Adding a timeout before starting the cancellation thread gives 
the query time to reach the state that the test case expects before 
cancellation.
   I no longer see any test failures with `NUM_RUNS` = `1000` 
(`@RepeatedTest`) for the problematic test cases. 
   
   2. The other test case which failed is:
   ```
   Error:  Failures: 
   Error:    TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954 Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED> but was: <FAILED>
   ```
   It relates to DRILL-3167. The root cause is the following: in some cases 
the query completes before the run-try-end exception is injected and thrown in 
Foreman. The COMPLETED state is acceptable in such cases.
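
   The two ideas above (wait for the expected condition before acting, and 
tolerate more than one terminal state) can be illustrated with a minimal, 
self-contained sketch. It is not the code from this PR: `ResilienceWaitSketch`, 
`QueryState`, `waitUntil` and `isAcceptable` are hypothetical names, the query 
is simulated by a plain thread, and the sketch polls against a deadline, which 
may differ from the wait the PR actually adds.

   ```java
   import java.util.EnumSet;
   import java.util.Set;
   import java.util.concurrent.atomic.AtomicReference;
   import java.util.function.BooleanSupplier;

   class ResilienceWaitSketch {

     /** Hypothetical stand-in for the query states named in this description. */
     enum QueryState { STARTING, RUNNING, COMPLETED, FAILED }

     /** Terminal states tolerated when the query may finish before the injected exception fires. */
     static final Set<QueryState> ACCEPTABLE_AFTER_RUN_TRY_END =
         EnumSet.of(QueryState.FAILED, QueryState.COMPLETED);

     /**
      * Polls the condition until it holds or the timeout elapses.
      * Returns true only if the condition was observed to hold.
      */
     static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs) {
       long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
       while (!condition.getAsBoolean()) {
         if (System.nanoTime() - deadline >= 0) {
           return false; // timed out without seeing the condition
         }
         try {
           Thread.sleep(pollMs);
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt(); // preserve the interrupt flag
           return false;
         }
       }
       return true;
     }

     /** True if the final state is one the test should accept. */
     static boolean isAcceptable(QueryState finalState) {
       return ACCEPTABLE_AFTER_RUN_TRY_END.contains(finalState);
     }

     public static void main(String[] args) throws Exception {
       AtomicReference<QueryState> state = new AtomicReference<>(QueryState.STARTING);

       // Simulated query: moves from STARTING to RUNNING after a short delay.
       Thread query = new Thread(() -> {
         try {
           Thread.sleep(50);
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
         }
         state.set(QueryState.RUNNING);
       });
       query.start();

       // Wait for RUNNING before "cancelling" instead of cancelling right away.
       boolean running = waitUntil(() -> state.get() == QueryState.RUNNING, 2_000, 10);
       System.out.println("reached RUNNING before cancel: " + running);
       query.join();

       System.out.println("COMPLETED acceptable: " + isAcceptable(QueryState.COMPLETED));
     }
   }
   ```

   The same `waitUntil` shape would apply to the memory check: poll until the 
allocated byte count reads zero, and report a leak only if the deadline passes 
first.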
   
   ## Testing
   Tested several times with 1000 repeats for the problematic test cases. 
Information on how to debug these test cases has been added to the 
`TestDrillbitResilience` javadoc description.
   
   Along with 304230a it also resolves 
[DRILL-3052](https://issues.apache.org/jira/browse/DRILL-3052), 
[DRILL-3167](https://issues.apache.org/jira/browse/DRILL-3167), 
[DRILL-3193](https://issues.apache.org/jira/browse/DRILL-3193), 
[DRILL-3194](https://issues.apache.org/jira/browse/DRILL-3194), 
[DRILL-3967](https://issues.apache.org/jira/browse/DRILL-3967), 
[DRILL-6228](https://issues.apache.org/jira/browse/DRILL-6228)
   



