gaoyunhaii commented on PR #20091: URL: https://github.com/apache/flink/pull/20091#issuecomment-1184180054
Hi @rkhachatryan, very sorry for the delay. I have checked the logic here, and the cause of the failure is as follows:

1. For the final checkpoint mechanism, we have a special treatment for operators that use union list state: we abort the checkpoint if some of the operator's subtasks have finished. This avoids possible state loss and data inconsistency for union list state.
2. The check happens in the finalize step. In this step, every operator using union list state is checked; if some of its subtasks have finished, an exception is thrown and the checkpoint fails with FINALIZE_CHECKPOINT_FAILURE.
3. Since this PR now counts FINALIZE_CHECKPOINT_FAILURE as an explicit failure, some tests are affected.

Since the failure with union list state is currently by-design behavior, I tend to think that for this case we should fail the checkpoint with a dedicated reason and continue not counting it.
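To make the proposal concrete, here is a minimal standalone Java sketch of the behavior described above. It does not use Flink's real classes: `FailureReason`, `UNION_LIST_STATE_PARTIALLY_FINISHED`, `OperatorInfo`, `CheckpointFinalizer` and `FailureCounter` are hypothetical names chosen for illustration only. It assumes that "some of its subtasks have finished" means some but not all subtasks of the operator are finished.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical failure reasons; NOT Flink's actual CheckpointFailureReason enum.
enum FailureReason {
    FINALIZE_CHECKPOINT_FAILURE,
    // Hypothetical dedicated reason for the by-design union list state abort.
    UNION_LIST_STATE_PARTIALLY_FINISHED
}

// Hypothetical snapshot of one operator's status at finalize time.
final class OperatorInfo {
    final String name;
    final boolean usesUnionListState;
    final int parallelism;
    final int finishedSubtasks;

    OperatorInfo(String name, boolean usesUnionListState, int parallelism, int finishedSubtasks) {
        this.name = name;
        this.usesUnionListState = usesUnionListState;
        this.parallelism = parallelism;
        this.finishedSubtasks = finishedSubtasks;
    }

    // Assumption: "partially finished" = at least one, but not all, subtasks finished.
    boolean partiallyFinished() {
        return finishedSubtasks > 0 && finishedSubtasks < parallelism;
    }
}

final class CheckpointFinalizer {
    // Returns null if finalization succeeds, otherwise the reason the checkpoint fails.
    static FailureReason finalizeCheckpoint(List<OperatorInfo> operators) {
        for (OperatorInfo op : operators) {
            if (op.usesUnionListState && op.partiallyFinished()) {
                // Dedicated reason instead of the generic FINALIZE_CHECKPOINT_FAILURE.
                return FailureReason.UNION_LIST_STATE_PARTIALLY_FINISHED;
            }
        }
        return null;
    }
}

// Hypothetical counter of failures that count toward the tolerable-failure limit.
final class FailureCounter {
    // By-design aborts are excluded from the count.
    private static final Set<FailureReason> NOT_COUNTED =
            EnumSet.of(FailureReason.UNION_LIST_STATE_PARTIALLY_FINISHED);

    private int count;

    void onFailure(FailureReason reason) {
        if (!NOT_COUNTED.contains(reason)) {
            count++;
        }
    }

    int count() {
        return count;
    }
}
```

With this split, other errors in the finalize step still surface as FINALIZE_CHECKPOINT_FAILURE and are counted, while the union list state abort is reported under its own reason and is not.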
