vdiravka opened a new pull request #2361: URL: https://github.com/apache/drill/pull/2361
# [DRILL-8030](https://issues.apache.org/jira/browse/DRILL-8030): Intermittent TestDrillbitResilience cancelInMiddleOfFetchingResults and foreman_runTryEnd failures

## Description

DRILL-7908 fixed distributed deadlocks in TestDrillbitResilience and added better timing for simulating the different Drill states. However, several tests still fail intermittently.

1. Sometimes a test reports a memory leak:
   ```
   Error: Failures:
   Error: org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults
   Error: Run 1: TestDrillbitResilience.cancelInMiddleOfFetchingResults:375 We are leaking 3000000 bytes ==> expected: <0> but was: <3000000>
   ```
   There is actually no memory leak. It looks like the test simply checks the allocated memory too early, when not all fragments are closed yet, so adding a timeout before the final `countAllocatedMemory` fixes the issue.

   The other reason for these failures is that the queries were not in the expected state before cancellation (for instance, in the STARTING state instead of RUNNING). Adding a timeout before starting the cancellation thread gives the query time to reach the state the test case expects before it is cancelled.

   I no longer see any test failures with `NUM_RUNS` = `1000` (`@RepeatedTest`) for the problematic test cases.

2. The other test case that failed is:
   ```
   Error: Failures:
   Error: TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954 Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED> but was: <FAILED>
   ```
   This relates to DRILL-3167. The root cause is that in some cases the query completes before the run-try-end exception is injected and thrown in Foreman. The COMPLETED state is acceptable in such cases.

## Testing

Tested several times with 1000 repeats for the problematic test cases.
Information on how to debug these test cases has been added to the `TestDrillbitResilience` Javadoc.

Along with 304230a, this also resolves [DRILL-3052](https://issues.apache.org/jira/browse/DRILL-3052), [DRILL-3167](https://issues.apache.org/jira/browse/DRILL-3167), [DRILL-3193](https://issues.apache.org/jira/browse/DRILL-3193), [DRILL-3194](https://issues.apache.org/jira/browse/DRILL-3194), [DRILL-3967](https://issues.apache.org/jira/browse/DRILL-3967), [DRILL-6228](https://issues.apache.org/jira/browse/DRILL-6228).
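
For illustration only: the description talks about adding a timeout before the final memory check and before starting the cancellation thread. A related approach, which is not what this PR claims to implement, is a bounded poll that returns as soon as the condition holds. The sketch below assumes a hypothetical `PollUntil` helper; the class, its method, and its parameters are invented for this example and are not part of Drill.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of Drill): waits for a condition with an upper time bound. */
final class PollUntil {
  private PollUntil() {}

  /**
   * Re-checks the condition every pollMillis until it holds or timeoutMillis elapses.
   * Returns true if the condition held, false on timeout or if the thread was interrupted.
   */
  static boolean await(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt status for the caller and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test could, under these assumptions, call something like `PollUntil.await(() -> allocatedBytes() == 0, 5_000, 50)` before asserting on leaked memory, where `allocatedBytes()` stands in for whatever the test uses to count allocated memory. The same shape would work for waiting until a query reaches the RUNNING state before cancelling it.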
