iht commented on issue #20041:
URL: https://github.com/apache/beam/issues/20041#issuecomment-1312831329

   [In Java, the monotonic watermark 
estimator](https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/WatermarkEstimators.java#L120-L127)
 only advances the watermark when the last observed timestamp is later than 
the current watermark. I think Python should use the same implementation, so 
that behavior is consistent across SDKs.
   
   The estimator does not have to do anything with the data; it should only 
update the value of the watermark. It should not change the timestamps of the 
messages either: incoming messages should not be altered, and the timestamps 
of the output messages in the splittable DoFn should be decided by the user of 
that DoFn.
   
   The current Python implementation of MonotonicWatermarkEstimator raises an 
exception when it sees late data, which makes it unusable unless you can 
guarantee that messages arrive in timestamp order.
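
   A minimal sketch of the behavior proposed above, in plain Python with no 
Beam dependency. The class name `LenientMonotonicWatermarkEstimator` is 
hypothetical, timestamps are plain numbers for illustration, and the method 
names only mirror what I understand the Python `WatermarkEstimator` interface 
to look like. This is not the actual Beam implementation.

```python
class LenientMonotonicWatermarkEstimator:
    """Hypothetical sketch: advance the watermark, never fail on late data."""

    def __init__(self, initial_watermark=None):
        # None means that no timestamp has been observed yet.
        self._watermark = initial_watermark

    def observe_timestamp(self, timestamp):
        # Only move the watermark forward. A timestamp at or before the
        # current watermark (late data) is ignored instead of raising.
        if self._watermark is None or timestamp > self._watermark:
            self._watermark = timestamp

    def current_watermark(self):
        return self._watermark

    def get_estimator_state(self):
        # The watermark is the only state that needs to be checkpointed.
        return self._watermark


# Out-of-order timestamps: 11 and 9 arrive late and are ignored.
estimator = LenientMonotonicWatermarkEstimator()
for ts in [10, 12, 11, 15, 9]:
    estimator.observe_timestamp(ts)
final_watermark = estimator.current_watermark()
```

   The estimator only reads the timestamps it is given. It never touches the 
elements themselves, so assigning output timestamps stays with the user of the 
splittable DoFn.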

