[
https://issues.apache.org/jira/browse/BEAM-11833?focusedWorklogId=555930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-555930
]
ASF GitHub Bot logged work on BEAM-11833:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Feb/21 18:20
Start Date: 22/Feb/21 18:20
Worklog Time Spent: 10m
Work Description: boyuanzz commented on pull request #14013:
URL: https://github.com/apache/beam/pull/14013#issuecomment-783572819
> > The reason why we want to call
`watermarkEstimator.setWatermark(currentRestriction.getWatermark());` when
`tryClaim()` returns `false` is for tracking watermark when returning
ProcessContinuation.resume(). It could happen when there is no output records
from reader and we want to read again later.
>
> That seems incorrect. When tryClaim returns `false`, it is part of the
contract to return `ProcessContinuation.done()`:
>
https://github.com/apache/beam/blob/aaad864c9acb22e35050f974a7ac74fb7638f085/runners/core-java/src/main/java/org/apache/beam/runners/core/OutputAndTimeBoundedSplittableProcessElementInvoker.java#L221
>
>
> When the reader returns false() we do not fail clam, instead we go though
`out[0] == null` in `processElement`.
> I think there should be no reason to enforce:
> a) returning ProcessContinuation.done(), and
> b) manually setting the watermark to BoundedWindow.TIMESTAMP_MAX_VALUE
> because, that seems redundant.
Thanks, Jan! I think the unbounded wrapper does some hacks there but I'll
double check. It would be great if we could change the implementation into a
better one.
> You are right, that the watermark is read _before_ call to trySplit (I
overlooked that), that probably means, we _must_ set watermark both _before_
and _after_ the tryClaim loop.
I'm afraid that setting watermark both _before_ and _after_ the tryClaim
loop is still not enough. For most cases, every time `tryClaim` is called and
there is one output record, the watermark should advance. If we only set
watermark outside of `tryClaim` loop, we will still hold back the watermark
somehow.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 555930)
Time Spent: 2h 10m (was: 2h)
> UnboundedSourceAsSDFRestrictionTracker reports incorrect watermark after
> failed claim
> -------------------------------------------------------------------------------------
>
> Key: BEAM-11833
> URL: https://issues.apache.org/jira/browse/BEAM-11833
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.28.0
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: P1
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> After being split by call to {{trySplit}}, the watermark reported by
> {{UnboundedSourceAsSDFRestrictionTracker.currentRestriction().getWatermark()}}
> is BoundedWindow.TIMESTAMP_MAX_VALUE, which is incorrect.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)