[GitHub] [flink] tillrohrmann commented on pull request #12670: [FLINK-18290][checkpointing] Fail job on checkpoint future failure instead of System.exit

GitBox Wed, 17 Jun 2020 00:55:16 -0700


tillrohrmann commented on pull request #12670:
URL: https://github.com/apache/flink/pull/12670#issuecomment-645215698



   > I couldn't find a good way to test the fix (the root problem involves two 
threads without accessible synchronization points).
   
   What about writing a test that starts a checkpoint and then shuts down the 
`CheckpointCoordinator`? Without the fix, this test should eventually fail the 
JVM because of the `RejectedExecutionException`. That should be good enough 
for a test which ensures that shutting down the `CheckpointCoordinator` while 
a checkpoint is in progress does not cause uncaught exceptions.
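   
   A minimal JDK-only sketch of the mechanism such a test would exercise. It 
does not use the real `CheckpointCoordinator`; the class name, 
`failureAfterShutdown`, `timer` and `pendingCheckpoint` are hypothetical 
stand-ins. It only shows that follow-up work scheduled on an executor that was 
shut down while a future was still pending ends in a 
`RejectedExecutionException`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

final class ShutdownSketch {

    /** Returns the failure observed when follow-up work runs after the timer was shut down. */
    static Throwable failureAfterShutdown() {
        // Hypothetical stand-in for the coordinator's timer thread.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Hypothetical stand-in for a checkpoint that is still in progress.
        CompletableFuture<Void> pendingCheckpoint = new CompletableFuture<>();

        // Follow-up step that needs the timer once the checkpoint completes.
        CompletableFuture<Void> followUp =
                pendingCheckpoint.thenRun(() -> timer.execute(() -> { }));

        // Shut down while the checkpoint is still pending, then let it complete.
        timer.shutdownNow();
        pendingCheckpoint.complete(null);

        try {
            followUp.join();
            return null; // no failure observed
        } catch (CompletionException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureAfterShutdown());
    }
}
```

   A real test would replace the stand-ins with the coordinator and assert 
that no uncaught exception escapes after shutdown.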
   
   > I think it won't help because it will only fail the timer thread, 
potentially leaving job unaffected and JM/JVM running.
   
   I think your analysis is not correct. We still have 
`FutureUtils.assertNoException` after `exceptionally` is called. Hence, every 
exception that bubbles up from this call will trigger the 
`FatalExitExceptionHandler` and kill the JVM.
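
   To illustrate the chain described here, a JDK-only sketch. It assumes 
`FutureUtils.assertNoException` behaves roughly like a `whenComplete` callback 
that forwards any failure to a fatal handler; the two-argument 
`assertNoException` helper below is a hypothetical stand-in that records the 
failure instead of exiting the JVM.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

final class AssertNoExceptionSketch {

    // Hypothetical stand-in, not Flink's implementation: forwards any failure
    // of the future to the given handler.
    static void assertNoException(
            CompletableFuture<?> future, Thread.UncaughtExceptionHandler fatalHandler) {
        future.whenComplete((ignored, failure) -> {
            if (failure != null) {
                fatalHandler.uncaughtException(Thread.currentThread(), failure);
            }
        });
    }

    /** Returns the failure that reached the handler, or null if none did. */
    static Throwable failureSeenByHandler() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CompletableFuture<Void> checkpoint = new CompletableFuture<>();

        // The exceptionally callback itself throws, so the resulting future fails too.
        CompletableFuture<Void> recovered = checkpoint.exceptionally(t -> {
            throw new IllegalStateException("failure handling failed", t);
        });

        // The handler here records the failure; a fatal handler would exit the JVM.
        assertNoException(recovered, (thread, failure) -> seen.set(failure));

        checkpoint.completeExceptionally(new RuntimeException("checkpoint failed"));
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(failureSeenByHandler());
    }
}
```

   Under that assumption, an exception thrown inside the `exceptionally` 
callback is not lost on the calling thread; it fails the returned future and 
reaches whatever handler is attached afterwards.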


