[GitHub] [flink] PatrickRen commented on pull request #16951: [FLINK-23807][connector/testing-framework] Use numRestarts metric to detect TaskManager termination in MiniClusterTestEnvironment

GitBox Tue, 24 Aug 2021 07:49:48 -0700


PatrickRen commented on pull request #16951:
URL: https://github.com/apache/flink/pull/16951#issuecomment-904710561



   Thanks @zentol for the review! 
   
   After a second thought I'm kinda doubting whether we need this check in 
```triggerTaskManagerFailover```. Initially I was thinking we need to make sure 
that the job indeed suffers a failover after terminating the TM, so I used 
```waitForJobStatus``` to wait until job getting into FAILED, FAILING or 
RESTART status after killing the TM. Actually this is not necessary since we 
always terminate the only TM and start a new one to trigger failover, so the 
job will definitely suffer a failover.
   
   I'd like to just remove the ```waitForJobStatus``` in 
```MiniCluster#triggerTaskManagerFailover``` instead of using metric to detect 
restart:
   ```java
   @Override
   public void triggerTaskManagerFailover(JobClient jobClient, Runnable 
afterFailAction) throws Exception {
       terminateTaskManager();
       afterFailAction.run();
       startTaskManager();
   }
   ```
   
   What do you think @zentol @AHeise ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [flink] PatrickRen commented on pull request #16951: [FLINK-23807][connector/testing-framework] Use numRestarts metric to detect TaskManager termination in MiniClusterTestEnvironment

Reply via email to