[
https://issues.apache.org/jira/browse/BEAM-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064054#comment-16064054
]
Eugene Kirpichov commented on BEAM-2447:
----------------------------------------
If the runner did not already take a checkpoint, and the DoFn returns resume(),
and the runner treats it as stop(), then the DoFn won't be resumed even though
it should.
Perhaps you meant treat everything as resume()? In that case, the DoFn will
never terminate, even if it's done with the restriction, because it'll
constantly be resumed and will keep producing empty checkpoints.
> Reintroduce DoFn.ProcessContinuation
> ------------------------------------
>
> Key: BEAM-2447
> URL: https://issues.apache.org/jira/browse/BEAM-2447
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Eugene Kirpichov
> Assignee: Eugene Kirpichov
>
> ProcessContinuation.resume() is useful for tailing files - when we reach
> current EOF, we want to voluntarily suspend the process() call rather than
> wait for runner to checkpoint us.
> In BEAM-1903, DoFn.ProcessContinuation was removed because there was
> ambiguity about the semantics of resume() especially w.r.t. the following
> situation described in
> https://docs.google.com/document/d/1BGc8pM1GOvZhwR9SARSVte-20XEoBUxrGJ5gTWXdv3c/edit
> : the runner has taken a checkpoint on the tracker, and then the
> ProcessElement call returns resume() signaling that the work is still not
> done - then there's 2 checkpoints to deal with.
> Instead, the proper way to refine this semantics is:
> - After checkpoint() on a RestrictionTracker, the tracker MUST fail all
> subsequent tryClaim() calls, and MUST succeed in checkDone().
> - After a failed tryClaim() call, the ProcessElement method MUST return stop()
> - So ProcessElement can return resume() only *instead* of doing tryClaim()
> - Then, if the runner has already taken a checkpoint but tracker has returned
> resume(), we do not need to take a new checkpoint - the one already taken
> already accurately describes the remainder of the work.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)