[ 
https://issues.apache.org/jira/browse/FLINK-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581727#comment-15581727
 ] 

Alexander Shoshin commented on FLINK-4283:
------------------------------------------

I found that these tests fail when they are run on a machine with fewer than 
3 CPU cores.
ExecutionContext.global, which is used to restart the execution graph in 
ExecutionGraphRestartTest.java, sizes its thread pool to the number of 
available processors by default.
Three tests in the class block a pool thread by calling 
'sleep(Long.MAX_VALUE)' during the asynchronous graph restart.
While all available threads are sleeping, nothing else can use the 
ExecutionContext to restart an execution graph. That's why some tests fail.
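
The starvation described above can be reproduced in isolation. The following 
is only a minimal sketch, not code from the Flink test: it uses a plain 
java.util.concurrent fixed thread pool as a stand-in for 
ExecutionContext.global, and the class name PoolStarvationSketch, the method 
restartRuns and the 500 ms timeout are all hypothetical names and values 
chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in: a fixed pool plays the role of ExecutionContext.global.
public class PoolStarvationSketch {

    /**
     * Returns true if a "restart" task gets to run within 500 ms on a pool of
     * poolSize threads while blockingTasks tasks sleep "forever".
     */
    static boolean restartRuns(int poolSize, int blockingTasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch restarted = new CountDownLatch(1);
        try {
            // Each of these tasks occupies one pool thread indefinitely,
            // like the 'infinite' restart delay in the tests.
            for (int i = 0; i < blockingTasks; i++) {
                pool.execute(() -> {
                    try {
                        Thread.sleep(Long.MAX_VALUE);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The task that should perform the restart; it only runs if a
            // pool thread is still free.
            pool.execute(restarted::countDown);
            return restarted.await(500, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupt the sleepers so the JVM can exit.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("2 threads, 2 sleepers: " + restartRuns(2, 2));
        System.out.println("3 threads, 2 sleepers: " + restartRuns(3, 2));
    }
}
```

With as many sleepers as pool threads, the restart task stays queued and never 
runs. One extra thread is enough for it to run.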

My approach is to set small restart delays that are still long enough for the 
tests to finish successfully:
https://github.com/apache/flink/compare/master...AlexanderShoshin:FLINK-4283%231

But if these 'infinite' delays are in the code for a reason (though I don't 
see one), we can use VM options to increase the default number of pool threads:
https://github.com/apache/flink/compare/master...AlexanderShoshin:FLINK-4283%232

> ExecutionGraphRestartTest fails
> -------------------------------
>
>                 Key: FLINK-4283
>                 URL: https://issues.apache.org/jira/browse/FLINK-4283
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: Ubuntu 14.04
> W10
>            Reporter: Chesnay Schepler
>            Assignee: Alexander Shoshin
>              Labels: test-stability
>
> I encounter reliable failures for the following tests:
> testRestartAutomatically(org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest)
>   Time elapsed: 120.089 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<RUNNING> but was:<RESTARTING>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:144)
>       at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.restartAfterFailure(ExecutionGraphRestartTest.java:680)
>       at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testRestartAutomatically(ExecutionGraphRestartTest.java:155)
> taskShouldNotFailWhenFailureRateLimitWasNotExceeded(org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest)
>   Time elapsed: 2.055 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<RUNNING> but was:<RESTARTING>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:144)
>       at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.restartAfterFailure(ExecutionGraphRestartTest.java:680)
>       at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.taskShouldNotFailWhenFailureRateLimitWasNotExceeded(ExecutionGraphRestartTest.java:180)
> testFailingExecutionAfterRestart(org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest)
>   Time elapsed: 120.079 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<RUNNING> but was:<RESTARTING>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:144)
>       at org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testFailingExecutionAfterRestart(ExecutionGraphRestartTest.java:397)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
