gaoyunhaii commented on PR #20091:
URL: https://github.com/apache/flink/pull/20091#issuecomment-1184180054

   Hi @rkhachatryan, very sorry for the delay. I checked the logic here, and 
the cause of the failure is as follows:
   
   1. The final checkpoint mechanism has special treatment for operators using 
union list state: we abort the checkpoint if only some of an operator's 
subtasks have finished. This avoids possible state loss and data 
inconsistency for union list state.
   2. The check happens in the finalize step. It examines all the operators 
using union list state; if only some of an operator's subtasks have finished, 
an exception is thrown and the checkpoint fails with 
FINALIZE_CHECKPOINT_FAILURE.
   3. Since this PR now counts FINALIZE_CHECKPOINT_FAILURE as an explicit 
failure, some tests are affected.
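
   To make the check in steps 1 and 2 concrete, here is a minimal sketch of 
the idea. It is not the actual Flink code: `UnionStateFinalizeSketch`, 
`OperatorSummary`, and `checkBeforeFinalize` are hypothetical names, and it 
assumes "parts of the subtasks have finished" means some, but not all, of an 
operator's subtasks.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: these types are hypothetical and are not Flink classes.
class UnionStateFinalizeSketch {

    /** Hypothetical summary of one operator at finalize time. */
    static final class OperatorSummary {
        final String name;
        final boolean usesUnionListState;
        final int parallelism;
        final int finishedSubtasks;

        OperatorSummary(String name, boolean usesUnionListState,
                        int parallelism, int finishedSubtasks) {
            this.name = name;
            this.usesUnionListState = usesUnionListState;
            this.parallelism = parallelism;
            this.finishedSubtasks = finishedSubtasks;
        }

        // Assumption: "partially finished" means some, but not all, subtasks finished.
        boolean isPartiallyFinished() {
            return finishedSubtasks > 0 && finishedSubtasks < parallelism;
        }
    }

    /** Throws if any operator using union list state is partially finished. */
    static void checkBeforeFinalize(List<OperatorSummary> operators) {
        for (OperatorSummary op : operators) {
            if (op.usesUnionListState && op.isPartiallyFinished()) {
                throw new IllegalStateException(
                        "Operator " + op.name
                                + " uses union list state and is partially finished");
            }
        }
    }

    public static void main(String[] args) {
        List<OperatorSummary> operators = Arrays.asList(
                new OperatorSummary("map", false, 4, 2),     // no union state: ignored
                new OperatorSummary("source", true, 4, 2));  // union state, 2 of 4 finished
        try {
            checkBeforeFinalize(operators);
            System.out.println("checkpoint finalized");
        } catch (IllegalStateException e) {
            System.out.println("checkpoint aborted: " + e.getMessage());
        }
    }
}
```

   In this sketch an operator whose subtasks have all finished, or none of 
which have finished, passes the check; only the mixed case aborts.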
   
   Since the failure with union list state is currently by-design behavior, 
I tend to think that for this case we should fail the checkpoint with a 
dedicated reason and continue not counting it.
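
   A rough sketch of what the proposal above could look like, under stated 
assumptions: `FailureCountingSketch` and the reason name 
`UNION_STATE_PARTIALLY_FINISHED` are hypothetical, and the counting logic is 
simplified rather than taken from Flink's failure handling. Only 
FINALIZE_CHECKPOINT_FAILURE is a name from this discussion.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: a simplified failure counter, not Flink's implementation.
class FailureCountingSketch {

    enum Reason {
        FINALIZE_CHECKPOINT_FAILURE,
        // Hypothetical name for the proposed dedicated reason.
        UNION_STATE_PARTIALLY_FINISHED
    }

    // Reasons that are by-design and therefore never counted.
    private static final Set<Reason> NOT_COUNTED =
            EnumSet.of(Reason.UNION_STATE_PARTIALLY_FINISHED);

    private final int tolerableFailures;
    private int countedFailures;

    FailureCountingSketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Records a failure; returns true once counted failures exceed the limit. */
    boolean onCheckpointFailure(Reason reason) {
        if (!NOT_COUNTED.contains(reason)) {
            countedFailures++;
        }
        return countedFailures > tolerableFailures;
    }

    int getCountedFailures() {
        return countedFailures;
    }

    public static void main(String[] args) {
        FailureCountingSketch tracker = new FailureCountingSketch(1);
        // By-design aborts do not move the counter.
        tracker.onCheckpointFailure(Reason.UNION_STATE_PARTIALLY_FINISHED);
        // Other finalize failures are counted explicitly.
        tracker.onCheckpointFailure(Reason.FINALIZE_CHECKPOINT_FAILURE);
        System.out.println("counted failures: " + tracker.getCountedFailures());
    }
}
```

   With a separate reason, other finalize failures can be counted as this PR 
intends while the union list state case stays excluded.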


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.