azagrebin opened a new pull request #13749:
URL: https://github.com/apache/flink/pull/13749


   ## What is the purpose of the change
   
   When a task fails and the failover strategy is 
`RestartPipelinedRegionFailoverStrategy`, all tasks in the region of the failed 
task and in the downstream regions are canceled for later rescheduling. 
However, some of these tasks may still be in the `CREATED` state, in which case 
there is nothing to cancel.
   
   This PR skips canceling such tasks, which can speed up failover and 
removes a lot of unnecessary CANCELING log entries.
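
As an illustration of the idea only, the filtering can be sketched as below. This is a minimal standalone sketch, not the code in this PR: `CreatedTaskFilterSketch`, `State`, `Task`, and `tasksNeedingRestart` are hypothetical names that stand in for Flink's scheduler types, and the set of states is reduced to what the example needs.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in types; these are NOT Flink's actual classes.
public class CreatedTaskFilterSketch {

    // Reduced set of execution states, assumed for this example only.
    enum State { CREATED, SCHEDULED, DEPLOYING, RUNNING, FINISHED }

    // A task identified by a name, together with its current state.
    static final class Task {
        final String id;
        final State state;

        Task(String id, State state) {
            this.id = id;
            this.state = state;
        }
    }

    // From all tasks in the affected regions, keep only those that were
    // actually scheduled or deployed. Tasks still in CREATED have nothing
    // to cancel, so they are left out of the restart set.
    static Set<String> tasksNeedingRestart(List<Task> tasksInAffectedRegions) {
        Set<String> result = new LinkedHashSet<>();
        for (Task task : tasksInAffectedRegions) {
            if (task.state != State.CREATED) {
                result.add(task.id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Task> affected = List.of(
                new Task("map-1", State.RUNNING),   // the failed task's region
                new Task("map-2", State.CREATED),   // downstream, never started
                new Task("sink-1", State.CREATED)); // downstream, never started
        System.out.println(tasksNeedingRestart(affected));
    }
}
```

In this sketch only `map-1` ends up in the restart set; the two downstream tasks stay in `CREATED` and are simply scheduled later as usual.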
   
   ## Brief change log
   
     - refactor builder for `TestingSchedulingExecutionVertex`
     - add state to executions in `RestartPipelinedRegionFailoverStrategyTest`
     - refactor/deduplicate verification logic in 
`RestartPipelinedRegionFailoverStrategyTest`
     - add a test verifying that executions in `CREATED` state do not get restarted
     - fix expected restarts in `BatchFineGrainedRecoveryITCase` because 
subsequent mappers do not get restarted anymore if their parent mapper fails
   
   ## Verifying this change
   
   unit tests

