[GitHub] [flink] gaoyunhaii opened a new pull request #16432: [FLINK-23233][runtime] Failing checkpoints before failover for failed events in OperatorCoordinator

GitBox Thu, 08 Jul 2021 06:33:13 -0700


gaoyunhaii opened a new pull request #16432:
URL: https://github.com/apache/flink/pull/16432



   ## What is the purpose of the change
   
   This PR changes how the checkpointing logic in OperatorCoordinator tracks the 
result of previously sent events: a failed event is now kept until it has been 
processed, that is, until it has triggered failover for the corresponding 
subtask. Otherwise events might be lost: if a checkpoint is taken after an 
event fails to send, the subtask failover caused by the lost event would not 
be included in that checkpoint. 
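
   The tracking rule described above can be sketched roughly as follows. This is 
a minimal, single-threaded illustration and not the code in this PR: the class 
`FailedEventTracker` and all of its method names are hypothetical, and the 
sketch only assumes that a checkpoint must not complete while some failed event 
has not yet triggered failover for its subtask.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical illustration only; this is NOT a Flink class.
 * It remembers subtasks whose event send failed until failover
 * has been triggered for them.
 */
final class FailedEventTracker {

    // Subtasks with a failed event whose failover has not been triggered yet.
    private final Set<Integer> subtasksAwaitingFailover = new HashSet<>();

    /** Called when sending an event to the given subtask failed. */
    void onEventSendFailed(int subtaskIndex) {
        subtasksAwaitingFailover.add(subtaskIndex);
    }

    /** Called once failover has been triggered for the given subtask. */
    void onFailoverTriggered(int subtaskIndex) {
        subtasksAwaitingFailover.remove(subtaskIndex);
    }

    /**
     * A checkpoint may only complete when no failed event is still pending;
     * otherwise it would miss the upcoming subtask failover.
     */
    boolean canCompleteCheckpoint() {
        return subtasksAwaitingFailover.isEmpty();
    }
}
```

   In this sketch a checkpoint requested between `onEventSendFailed` and 
`onFailoverTriggered` would be failed rather than completed, which matches the 
intent stated in the PR title.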
   
   ## Brief change log
   
   - 3f92e0cf1c13240efad8e5227ec2eec55cf6028d changes the event tracking logic.
   
   ## Verifying this change
   
   This change can be verified by the added unit tests and by manual testing 
with the failing cases.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): **no**
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
     - The serializers: **no**
     - The runtime per-record code paths (performance sensitive): **no**
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **no**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **no**
     - If yes, how is the feature documented? **not applicable**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
