[GitHub] flink pull request #1954: [FLINK-3190] failure rate restart strategy

tillrohrmann Fri, 17 Jun 2016 06:56:38 -0700

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1954#discussion_r67513578
  
    --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionGraphRestartTest.java ---
    @@ -174,71 +140,54 @@ private void validateConstraints(ExecutionGraph eg) {
     
        @Test
        public void testRestartAutomatically() throws Exception {
    -           Instance instance = ExecutionGraphTestUtils.getInstance(
    -                           new SimpleActorGateway(TestingUtils.directExecutionContext()),
    -                           NUM_TASKS);
    +           RestartStrategy restartStrategy = new FixedDelayRestartStrategy(1, 1000);
    +           Tuple2<ExecutionGraph, Instance> executionGraphInstanceTuple = createExecutionGraph(restartStrategy);
    +           ExecutionGraph eg = executionGraphInstanceTuple.f0;
     
    -           Scheduler scheduler = new Scheduler(TestingUtils.defaultExecutionContext());
    -           scheduler.newInstanceAvailable(instance);
    -
    -           JobVertex sender = new JobVertex("Task");
    -           sender.setInvokableClass(Tasks.NoOpInvokable.class);
    -           sender.setParallelism(NUM_TASKS);
    -
    -           JobGraph jobGraph = new JobGraph("Pointwise job", sender);
    -
    -           ExecutionGraph eg = new ExecutionGraph(
    -                           TestingUtils.defaultExecutionContext(),
    -                           new JobID(),
    -                           "Test job",
    -                           new Configuration(),
    -                           ExecutionConfigTest.getSerializedConfig(),
    -                           AkkaUtils.getDefaultTimeout(),
    -                           new FixedDelayRestartStrategy(1, 1000));
    -           eg.attachJobGraph(jobGraph.getVerticesSortedTopologicallyFromSources());
    +           restartAfterFailure(eg, new FiniteDuration(2, TimeUnit.MINUTES), true);
    +   }
     
    -           assertEquals(JobStatus.CREATED, eg.getState());
    +   @Test
    +   public void taskShouldFailWhenFailureRateLimitExceeded() throws Exception {
    +           FailureRateRestartStrategy restartStrategy = new FailureRateRestartStrategy(2, TimeUnit.SECONDS, 0);
    +           FiniteDuration timeout = new FiniteDuration(50, TimeUnit.MILLISECONDS);
    +           Tuple2<ExecutionGraph, Instance> executionGraphInstanceTuple = createExecutionGraph(restartStrategy);
    +           ExecutionGraph eg = executionGraphInstanceTuple.f0;
    +
    +           restartAfterFailure(eg, timeout, false);
    +           restartAfterFailure(eg, timeout, false);
    +           //failure rate limit not exceeded yet, so task is running
    +           assertEquals(JobStatus.RUNNING, eg.getState());
    +           Thread.sleep(1000); //wait for a second to restart limit rate
     
    -           eg.scheduleForExecution(scheduler);
    +           restartAfterFailure(eg, timeout, false);
    +           restartAfterFailure(eg, timeout, false);
    +           makeAFailureAndWait(eg, timeout);
    --- End diff --
    
    Can we try to harden this test a little bit? The problem is that on Travis,
    concurrent executions (e.g. the restart future) can take quite some time.
    Thus, it can easily happen that we run into the 50 millisecond timeout or
    that the three failures don't occur within one second, even though the test
    passes without problems on your local machine.
    
    I think it would be better to split the test so that you treat the first half
    and the second half in separate test cases. In the second test case, we
    should increase the failure interval to make sure that we can produce 3
    failures within that time interval.


