kkdoon commented on PR #28567:
URL: https://github.com/apache/beam/pull/28567#issuecomment-1730436738

   > I don't think we can just skip the check. The fact that the check fails 
implies there are still buffered data that have not been processed yet. We need 
to either: a) correctly flush and process the buffered data (which will clear 
the hold), or b) ensure that checkpoint is called before the final watermark
   > 
   > I have a suspicion that correct is only b), because otherwise, in case of 
failure and restore from checkpoint after we flushed the data, we might break 
the stable input contract.
   
   I agree that option B is the cleanest and makes logical sense. I think in 
implementation terms it works if somehow 
[flushData](https://github.com/apache/beam/blob/79d0a8d11095522a036a6b3007f5fed4f6f46b3b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L608)
 is invoked after last checkpoint finishes (or some other approach). Option A 
violates the requiresStableInput contract and hence seems incorrect (i did 
experiment with this approach as well and it gave the expected output when 
checkpoint successfully completed). 
   
   Having said that, the current approach also works correctly when 
`requiresStableInput` is set since all the buffered data is emitted after 
checkpoint completes and the output watermark finally becomes equal to 
MAX_WATERMARK (i have run multiple jobs to verify the behavior). I can defer 
the `currentOutputWatermark < Long.MAX_VALUE` check for requiresStableInput 
scenario and do this check inside 
[notifyCheckpointComplete](https://github.com/apache/beam/blob/79d0a8d11095522a036a6b3007f5fed4f6f46b3b/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L1045),
 at the time of job completion, to verify that output watermark is correct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to