[
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065478#comment-16065478
]
Kenneth Knowles commented on BEAM-2140:
---------------------------------------
Ah, it is true that the input element is still pending. That is a very helpful
perspective. So view it as a sort of NACK of the element as a whole, while
saving the restriction to state. However, we can't advance the watermark as
though it was non-splittable in the unbounded case; that would freeze the input
watermark forever. (in the bounded case, I presume it is still the -inf to +inf
story for all the same reasons as bounded sources)
Instead, perhaps the "amount consumed" of this element is measured by the
watermark reporting done during processing of the element by the SDF. So the
input watermark can move forward to that point. This would subsume output
watermark holds, which seems nice.
I doubt think this kind of hold makes sense outside of SDF. Maybe, with revised
semantics as per my final section above, but I don't think we should take on
the additional challenge of designing this in a safe user-facing way in order
to push SDF forwards.
On the internal side you do need a way to manage the watermarks in the
underlying engine. We should note that {{ProcessFn}} already is treated
specially via a {{ProcessFnRunner}} within a Flink {{SplittableDoFnOperator}}.
So we are assuming explicit runner support and we are talking about how the
{{SplittableDoFnOperator}} communicates with Flink. I would actually suggest a
"partial NACK with new watermark" style of API (just like ProcessContinuation)
so that it is tightly coupled with the fact that the element should be
re-delivered. I would focus a lot on making stuck pipelines impossible, since
they are hard to debug.
> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> -------------------------------------------------------
>
> Key: BEAM-2140
> URL: https://issues.apache.org/jira/browse/BEAM-2140
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Reporter: Aljoscha Krettek
> Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled
> the tests to unblock the open PR for BEAM-1763.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)