[ https://issues.apache.org/jira/browse/DRILL-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vitalii Diravka updated DRILL-8030:
-----------------------------------
    Description: 
DRILL-7908 fixes distributed deadlocks in _TestDrillbitResilience_ and adds 
better timing for simulating the different Drill states, but several tests 
still fail intermittently.
1. Sometimes tests report a memory leak. There is no real leak: it looks like 
the test checks allocated memory too early, before all fragments are closed, 
so adding a timeout before the final _countAllocatedMemory_ call fixes the 
issue.
Another cause of test failures is that queries were not in the expected state 
before cancellation (for instance, STARTING instead of RUNNING). Adding a 
timeout before starting the cancellation thread lets the query reach the state 
the test case expects before it is cancelled.
With these changes I no longer see failures in the problematic test cases with 
NUM_RUNS = 1000 (@RepeatedTest).

2. The other failing test case is:
{code:java}
Error:  Failures: 
Error:    TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954 Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED> but was: <FAILED>
{code}
It is related to DRILL-3167. The root cause: in some cases the query completes 
before the run-try-end exception is injected and thrown in the Foreman, so the 
COMPLETED state is acceptable in such cases.
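The fix described in item 1 adds a timeout before the final memory check and before the cancellation thread starts. One way to express such a timeout without a fixed sleep is to poll a condition until it holds or a deadline passes. This is a sketch under assumptions: the class and method names (_WaitForCondition_, _waitFor_) and the simulated allocator counter are hypothetical and not part of Drill's test framework. In a real test the condition would wrap whatever the test already uses to count allocated memory or read the query state.

{code:java}
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Drill's test framework.
final class WaitForCondition {

  /**
   * Polls the condition until it returns true or the timeout elapses.
   * Returns true if the condition was met, false on timeout or interruption.
   */
  public static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      // Overflow-safe comparison of System.nanoTime() values.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Simulated allocator: fragments release their memory asynchronously,
    // some time after the query itself has finished.
    AtomicLong allocatedBytes = new AtomicLong(1024);
    Thread fragmentCloser = new Thread(() -> {
      try {
        Thread.sleep(200); // fragments are still closing
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      allocatedBytes.set(0);
    });
    fragmentCloser.start();

    // Checking right away can see memory that is about to be released (a false "leak").
    System.out.println("allocated right after query end: " + allocatedBytes.get());

    // Waiting for the counter to drop avoids the false positive.
    boolean released = waitFor(() -> allocatedBytes.get() == 0,
        Duration.ofSeconds(5), Duration.ofMillis(20));
    System.out.println("all memory released within timeout: " + released);
    fragmentCloser.join();
  }
}
{code}

Polling returns as soon as the condition holds and only waits the full timeout when something is actually wrong, so it tends to be both faster and less timing-sensitive than a fixed delay. The same helper could gate the cancellation thread on the query reaching RUNNING rather than STARTING.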
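For _foreman_runTryEnd_, the reasoning in item 2 implies that the assertion should accept COMPLETED as well as FAILED. A minimal sketch of such a check follows, assuming a hypothetical _QueryState_ enum that stands in for Drill's query states and a hypothetical _assertFailedOrCompleted_ helper; the actual change to _assertFailsWithException_ may look different.

{code:java}
import java.util.EnumSet;
import java.util.Set;

final class RunTryEndStateCheck {

  // Hypothetical stand-in for the Drill query states used in this example.
  enum QueryState { STARTING, RUNNING, COMPLETED, CANCELED, FAILED }

  // The injected run-try-end exception normally fails the query, but the query
  // may legitimately complete before the exception is injected (DRILL-3167).
  static final Set<QueryState> ACCEPTED_STATES =
      EnumSet.of(QueryState.FAILED, QueryState.COMPLETED);

  static void assertFailedOrCompleted(QueryState actual) {
    if (!ACCEPTED_STATES.contains(actual)) {
      throw new AssertionError(
          "Query state should be one of " + ACCEPTED_STATES + " but was: <" + actual + ">");
    }
  }

  public static void main(String[] args) {
    assertFailedOrCompleted(QueryState.FAILED);    // exception injected before completion
    assertFailedOrCompleted(QueryState.COMPLETED); // query finished first, also accepted
    try {
      assertFailedOrCompleted(QueryState.CANCELED);
      System.out.println("unexpected: CANCELED was accepted");
    } catch (AssertionError expected) {
      System.out.println("rejected as expected: " + expected.getMessage());
    }
  }
}
{code}

Using an explicit set of accepted states keeps the test strict about everything else: a query that ends up CANCELED or stuck in RUNNING still fails the assertion.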

  was:
DRILL-7908 fixes distributed deadlocks in _TestDrillbitResilience_ and adds 
better timing for simulating the different Drill states, but sometimes tests 
report a memory leak.
There is no real leak: it looks like the test checks allocated memory too 
early, before all fragments are closed, so adding a timeout before the final 
_countAllocatedMemory_ call fixes the issue.
Another cause of test failures is that queries were not in the expected state 
before cancellation (for instance, STARTING instead of RUNNING). Adding a 
timeout before starting the cancellation thread lets the query reach the state 
the test case expects before it is cancelled.
With these changes I no longer see failures in the problematic test cases with 
NUM_RUNS = 1000 (@RepeatedTest).

The other failing test case is:
{code:java}
Error:  Failures: 
Error:    TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954 Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED> but was: <FAILED>
{code}


> Memory leak in TestDrillbitResilience
> -------------------------------------
>
>                 Key: DRILL-8030
>                 URL: https://issues.apache.org/jira/browse/DRILL-8030
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Tools, Build & Test
>    Affects Versions: 1.19.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Minor
>             Fix For: Future
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
