[ https://issues.apache.org/jira/browse/DRILL-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalii Diravka updated DRILL-8030:
-----------------------------------
Summary: Intermittent TestDrillbitResilience
cancelInMiddleOfFetchingResults and foreman_runTryEnd failures (was:
TestDrillbitResilience)
> Intermittent TestDrillbitResilience cancelInMiddleOfFetchingResults and
> foreman_runTryEnd failures
> --------------------------------------------------------------------------------------------------
>
> Key: DRILL-8030
> URL: https://issues.apache.org/jira/browse/DRILL-8030
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Tools, Build & Test
> Affects Versions: 1.19.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Priority: Minor
> Fix For: Future
>
>
> DRILL-7908 fixes distributed deadlocks in _TestDrillbitResilience_ and adds
> better timing for simulating the different Drill states. However, several
> tests still fail intermittently.
> 1. Sometimes the tests report a memory leak:
> {code:java}
> Error: Failures:
> Error: org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults
> Error: Run 1: TestDrillbitResilience.cancelInMiddleOfFetchingResults:375 We are leaking 3000000 bytes ==> expected: <0> but was: <3000000>
> {code}
> But there is actually no memory leak. It looks like Drill just checks the
> allocated memory too early, before all fragments are closed, so adding a
> timeout before the final _countAllocatedMemory_ call fixes the issue.
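The fix described above adds a fixed timeout before the final _countAllocatedMemory_ check. A polling variant is sketched below as an alternative, not as the committed change: it re-reads the allocated byte count until it reaches 0 or a deadline passes, so a slow cleanup no longer fails the test while a genuine leak still does. The class and method names are hypothetical, and the sketch assumes the allocated size can be read through a `LongSupplier` (for example, a lambda wrapping _countAllocatedMemory_).

{code:java}
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Drill's test code.
public final class AllocationWait {
  private AllocationWait() {}

  // Polls allocatedBytes until it reports 0 or timeoutMillis elapses,
  // sleeping pollMillis between reads. Returns the last observed value so the
  // caller can still assert that it is 0 (a real leak stays non-zero).
  public static long awaitZeroAllocation(LongSupplier allocatedBytes,
                                         long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    long current = allocatedBytes.getAsLong();
    // Overflow-safe deadline comparison for System.nanoTime().
    while (current != 0 && deadline - System.nanoTime() > 0) {
      try {
        // Give fragments that are still closing time to release their buffers.
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and stop waiting
        break;
      }
      current = allocatedBytes.getAsLong();
    }
    return current;
  }
}
{code}

The assertion itself stays unchanged: the test would still compare the returned value with 0, only after giving cleanup a bounded amount of time. The timeout and poll interval are illustrative and would need tuning for CI.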
> The other cause of the test failures is that the queries were not in the
> expected state before cancellation (for instance, in the STARTING state instead
> of RUNNING). Adding a timeout before starting the cancellation thread lets the
> query reach the state that the test case expects before cancellation.
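Similarly, instead of a fixed delay before starting the cancellation thread, the test could poll until the query reports the state it expects. This is a minimal sketch, assuming the current query state can be read through a `Supplier`; `QueryStateWait` and its `State` enum are hypothetical stand-ins, not Drill's own classes.

{code:java}
import java.util.function.Supplier;

// Hypothetical helper, not part of Drill's test code.
public final class QueryStateWait {
  // Hypothetical stand-in for a subset of Drill's query states.
  public enum State { STARTING, RUNNING, COMPLETED, FAILED, CANCELED }

  private QueryStateWait() {}

  // Polls currentState until it equals expected or timeoutMillis elapses.
  // Returns true only if the expected state was observed.
  public static boolean awaitState(Supplier<State> currentState, State expected,
                                   long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (deadline - System.nanoTime() > 0) {
      if (currentState.get() == expected) {
        return true;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status
        return false;
      }
    }
    return currentState.get() == expected; // final check at the deadline
  }
}
{code}

The cancellation thread would then be started only when `awaitState(..., State.RUNNING, ...)` returns `true`; a `false` result can fail the test with an explicit message instead of surfacing later as a misleading leak or state assertion.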
> With these changes I no longer see failures of the problematic test cases with
> NUM_RUNS = 1000 (@RepeatedTest).
> 2. The other failing test case is:
> {code:java}
> Error: Failures:
> Error: TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954 Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED> but was: <FAILED>
> {code}
> It is related to DRILL-3167. The root cause is that in some cases the query
> completes before the run-try-end exception is injected and thrown in the
> Foreman, so the COMPLETED state is acceptable in such cases.
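One way to express the relaxed expectation for _foreman_runTryEnd_ is an assertion that accepts either terminal state. This is a hedged sketch: the enum and helper are hypothetical and do not reflect how the actual test or the DRILL-3167 handling is written.

{code:java}
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of Drill's test code.
public final class TerminalStateCheck {
  // Hypothetical stand-in for a subset of Drill's query states.
  public enum State { STARTING, RUNNING, COMPLETED, FAILED, CANCELED }

  // The query may finish before the injected run-try-end exception is thrown,
  // so COMPLETED is accepted alongside the expected FAILED state.
  private static final Set<State> ACCEPTABLE = EnumSet.of(State.FAILED, State.COMPLETED);

  private TerminalStateCheck() {}

  // Throws AssertionError for any state other than FAILED or COMPLETED.
  public static void assertFailedOrCompleted(State actual) {
    if (!ACCEPTABLE.contains(actual)) {
      throw new AssertionError(
          "Query state should be FAILED or COMPLETED, but was: " + actual);
    }
  }
}
{code}

Keeping the accepted states in an `EnumSet` makes the tolerated race explicit in one place, while any other outcome (for example CANCELED) still fails the test.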
--
This message was sent by Atlassian Jira
(v8.3.4#803005)