boyuanzz edited a comment on pull request #14013:
URL: https://github.com/apache/beam/pull/14013#issuecomment-783572819


   > > The reason why we want to call 
`watermarkEstimator.setWatermark(currentRestriction.getWatermark());` when 
`tryClaim()` returns `false` is for tracking watermark when returning 
ProcessContinuation.resume(). It could happen when there is no output records 
from reader and we want to read again later.
   > 
   > That seems incorrect. When tryClaim returns `false`, it is part of the 
contract to return `ProcessContinuation.done()`:
   > 
https://github.com/apache/beam/blob/aaad864c9acb22e35050f974a7ac74fb7638f085/runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java#L221
   > 
   > 
   > When the reader returns false() we do not fail clam, instead we go though 
`out[0] == null` in `processElement`.
   > I think there should be no reason to enforce:
   > a) returning ProcessContinuation.done(), and
   > b) manually setting the watermark to BoundedWindow.TIMESTAMP_MAX_VALUE
   > because, that seems redundant.
   
   Thanks, Jan! I think the unbounded wrapper does some hacks there but I'll 
double check. It would be great if we could change the implementation into a 
better one.
   
   > You are right, that the watermark is read _before_ call to trySplit (I 
overlooked that), that probably means, we _must_ set watermark both _before_ 
and _after_ the tryClaim loop.
   
   I'm afraid that setting watermark both _before_ and _after_ the tryClaim 
loop is still not enough. For most cases, every time `tryClaim` is called and 
there is one output record, the watermark should advance. If we only set 
watermark outside of `tryClaim` loop, we will still hold back the watermark 
somehow.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to