[ https://issues.apache.org/jira/browse/DRILL-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vitalii Diravka updated DRILL-8030:
-----------------------------------
Description:
DRILL-7908 fixed distributed deadlocks in _TestDrillbitResilience_ and added
better timing for simulating the different Drill states. However, several tests
still failed from time to time.
1. Sometimes the tests report a memory leak. There is no actual leak: it looks
like Drill checks the allocated memory too early, when not all fragments are
closed yet, so adding a timeout before the final _countAllocatedMemory_ call
fixes the issue.
Another cause of test failures is that the queries were not in the expected
state before cancellation (for instance, in the STARTING state instead of
RUNNING). Adding a timeout before starting the cancellation thread lets the
query reach the state the test case expects before it is cancelled.
With these changes I no longer see test failures with NUM_RUNS = 1000
(@RepeatedTest) for the problematic test cases.
2. The other failing test case is:
{code:java}
Error: Failures:
Error:
TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954
Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED>
but was: <FAILED>
{code}
This relates to DRILL-3167. The root cause is that in some cases the query
completes before the run-try-end exception is injected and thrown in the
Foreman. The COMPLETED state is acceptable in such cases.
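The test adjustments described above can be illustrated with a minimal, self-contained sketch. It assumes a simplified setup. The class _ResilienceWaitSketch_, the helpers _waitUntil_, _runLater_ and _isAcceptedRunTryEndState_, and the reduced _QueryState_ enum are hypothetical. The same goes for the AtomicLong/AtomicReference stand-ins for Drill's allocator and query state: none of these are Drill APIs. The description only says a timeout was added. This sketch instead uses a bounded polling wait, which is a swapped-in technique and not necessarily what the patch does.

{code:java}
import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names are Drill APIs.
final class ResilienceWaitSketch {

  // Simplified stand-in for Drill's query states, limited to those named in this issue.
  enum QueryState { STARTING, RUNNING, COMPLETED, FAILED }

  // run-try-end (DRILL-3167): the query may finish before the injected exception
  // is thrown in the Foreman, so COMPLETED is accepted alongside FAILED.
  static final Set<QueryState> ACCEPTED_RUN_TRY_END_STATES =
      EnumSet.of(QueryState.FAILED, QueryState.COMPLETED);

  static boolean isAcceptedRunTryEndState(QueryState state) {
    return ACCEPTED_RUN_TRY_END_STATES.contains(state);
  }

  // Polls the condition until it holds or the timeout elapses; returns whether it held.
  // Assumption: a bounded poll used in place of the fixed timeout mentioned in the issue.
  static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(pollInterval.toMillis());
    }
    return true;
  }

  // Starts a thread that runs the action after a delay, simulating asynchronous work.
  static Thread runLater(long delayMillis, Runnable action) {
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(delayMillis);
        action.run();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    t.start();
    return t;
  }

  public static void main(String[] args) throws InterruptedException {
    // 1a. Fragments release memory asynchronously; checking right away could report a false leak.
    AtomicLong allocatedBytes = new AtomicLong(4096);
    Thread fragment = runLater(50, () -> allocatedBytes.set(0));
    boolean drained = waitUntil(() -> allocatedBytes.get() == 0,
        Duration.ofSeconds(5), Duration.ofMillis(10));
    System.out.println("memory released before final check: " + drained);

    // 1b. Wait until the query is RUNNING before starting the cancellation thread.
    AtomicReference<QueryState> state = new AtomicReference<>(QueryState.STARTING);
    Thread query = runLater(30, () -> state.set(QueryState.RUNNING));
    boolean running = waitUntil(() -> state.get() == QueryState.RUNNING,
        Duration.ofSeconds(5), Duration.ofMillis(10));
    System.out.println("query RUNNING before cancellation: " + running);

    // 2. Either terminal state is acceptable for run-try-end.
    System.out.println("COMPLETED accepted: " + isAcceptedRunTryEndState(QueryState.COMPLETED));

    fragment.join();
    query.join();
  }
}
{code}

A bounded poll returns as soon as the memory is released or the state is reached, and it still fails if that never happens. A fixed sleep either wastes time on fast machines or can still be too short on a slow CI agent.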
was:
DRILL-7908 fixed distributed deadlocks in _TestDrillbitResilience_ and added
better timing for simulating the different Drill states. However, the tests
sometimes report a memory leak.
There is no actual leak: it looks like Drill checks the allocated memory too
early, when not all fragments are closed yet, so adding a timeout before the
final _countAllocatedMemory_ call fixes the issue.
Another cause of test failures is that the queries were not in the expected
state before cancellation (for instance, in the STARTING state instead of
RUNNING). Adding a timeout before starting the cancellation thread lets the
query reach the state the test case expects before it is cancelled.
With these changes I no longer see test failures with NUM_RUNS = 1000
(@RepeatedTest) for the problematic test cases.
The other failing test case is:
{code:java}
Error: Failures:
Error:
TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954
Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED>
but was: <FAILED>
{code}
> Memory leak in TestDrillbitResilience
> -------------------------------------
>
> Key: DRILL-8030
> URL: https://issues.apache.org/jira/browse/DRILL-8030
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Tools, Build & Test
> Affects Versions: 1.19.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Priority: Minor
> Fix For: Future
--
This message was sent by Atlassian Jira
(v8.3.4#803005)