[ 
https://issues.apache.org/jira/browse/DRILL-8030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-8030:
-----------------------------------
    Summary: Intermittent TestDrillbitResilience 
cancelInMiddleOfFetchingResults and foreman_runTryEnd failures  (was: 
TestDrillbitResilience)

> Intermittent TestDrillbitResilience cancelInMiddleOfFetchingResults and 
> foreman_runTryEnd failures
> --------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-8030
>                 URL: https://issues.apache.org/jira/browse/DRILL-8030
>             Project: Apache Drill
>          Issue Type: Sub-task
>          Components: Tools, Build & Test
>    Affects Versions: 1.19.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Minor
>             Fix For: Future
>
>
> DRILL-7908 fixed distributed deadlocks in _TestDrillbitResilience_ and added 
> better timing for simulating the different Drill states. However, several 
> tests still fail intermittently.
>  1. Sometimes the tests report a memory leak:
> {code:java}
> Error:  Failures: 
> Error:  org.apache.drill.exec.server.TestDrillbitResilience.cancelInMiddleOfFetchingResults
> Error:    Run 1: TestDrillbitResilience.cancelInMiddleOfFetchingResults:375 We are leaking 3000000 bytes ==> expected: <0> but was: <3000000>
> {code}
> But there is actually no memory leak. It looks like Drill checks the 
> allocated memory too early, when not all fragments are closed yet, so adding 
> a timeout before the final _countAllocatedMemory_ call fixes the issue. 
>  The other cause of test failures: the queries were not in the expected state 
> before cancellation (for instance, in the STARTING state instead of RUNNING), 
> so adding a timeout before starting the cancellation thread gives the query 
> time to reach the state that the test case expects before cancellation.
>  I no longer see any test failures with NUM_RUNS = 1000 (@RepeatedTest) for 
> the problematic test cases. 
> 2. The other test case that failed is:
> {code:java}
> Error:  Failures: 
> Error:    TestDrillbitResilience.foreman_runTryEnd:289->testForeman:973->assertFailsWithException:960->assertFailsWithException:954
>  Query state should be FAILED (and not COMPLETED). ==> expected: <COMPLETED> 
> but was: <FAILED>{code}
> It relates to DRILL-3167. The root cause is the following: in some cases 
> the query completes before the run-try-end exception is injected and 
> thrown in the Foreman. The COMPLETED state is acceptable in such cases.
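
For illustration only, the two adjustments described above (waiting for the expected state before acting, and accepting more than one final query state) could be sketched as below. This is a minimal sketch under stated assumptions, not the actual DRILL-8030 change: the class `ResilienceTestSupport`, the methods `awaitCondition` and `isAcceptable`, and the `QueryState` enum are hypothetical names that do not come from the Drill code base. It polls with a deadline rather than sleeping for a fixed time, which is a substitute for the plain timeout mentioned in the description.

```java
import java.time.Duration;
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Hypothetical test helpers; none of these names exist in Apache Drill. */
final class ResilienceTestSupport {

  /** Illustrative stand-in for query states; not Drill's real enum. */
  enum QueryState { STARTING, RUNNING, COMPLETED, FAILED, CANCELED }

  private ResilienceTestSupport() {}

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition held, false if the deadline passed first.
   */
  static boolean awaitCondition(BooleanSupplier condition,
                                Duration timeout,
                                Duration pollInterval) throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // gave up: the condition never became true in time
      }
      Thread.sleep(pollInterval.toMillis());
    }
    return true;
  }

  /** True if the observed final state is one of the states the test tolerates. */
  static boolean isAcceptable(QueryState actual, Set<QueryState> accepted) {
    return accepted.contains(actual);
  }
}
```

Under these assumptions, a test could wait on a condition such as "allocated memory is zero" or "the query is RUNNING" before asserting or cancelling, and could treat both FAILED and COMPLETED as acceptable outcomes for the run-try-end case. How such a condition would be wired to _countAllocatedMemory_ or to the real query state is not shown here, since those signatures are not given in this message.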



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
