There would be ways around the lack of checkpointing in cycles, e.g.
buffer and backloop only after checkpointing is complete, similarly how
we implement @RequiresStableInput in the Flink Runner.
-Max
On 07.10.20 04:05, Reuven Lax wrote:
It appears that there's a proposal
(https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance>)
and an abandoned PR to fix this, but AFAICT this remains a limitation of
Flink. If Flink can't guarantee processing of records on back edges, I
don't think we can use cycles, as we might otherwise lose the residuals.
On Tue, Oct 6, 2020 at 6:16 PM Reuven Lax <[email protected]
<mailto:[email protected]>> wrote:
This is what I was thinking of
"Flink currently only provides processing guarantees for jobs
without iterations. Enabling checkpointing on an iterative job
causes an exception. In order to force checkpointing on an iterative
program the user needs to set a special flag when enabling
checkpointing:|env.enableCheckpointing(interval,
CheckpointingMode.EXACTLY_ONCE, force = true)|.
Please note that records in flight in the loop edges (and the state
changes associated with them) will be lost during failure."
On Tue, Oct 6, 2020 at 5:44 PM Boyuan Zhang <[email protected]
<mailto:[email protected]>> wrote:
Hi Reuven,
As Luke mentioned, at least there are some limitations around
tracking watermark with flink cycles. I'm going to use State +
Timer without flink cycle to support self-checkpoint. For
dynamic split, we can either explore flink cycle approach or
limit depth approach.
On Tue, Oct 6, 2020 at 5:33 PM Reuven Lax <[email protected]
<mailto:[email protected]>> wrote:
Aren't there some limitations associated with flink cycles?
I seem to remember various features that could not be used.
I'm assuming that watermarks are not supported across
cycles, but is there anything else?
On Tue, Oct 6, 2020 at 7:12 AM Maximilian Michels
<[email protected] <mailto:[email protected]>> wrote:
Thanks for starting the conversation. The two approaches
both look good
to me. Probably we want to start with approach #1 for
all Runners to be
able to support delaying bundles. Flink supports cycles
and thus
approach #2 would also be applicable and could be used
to implement
dynamic splitting.
-Max
On 05.10.20 23:13, Luke Cwik wrote:
> Thanks Boyuan, I left a few comments.
>
> On Mon, Oct 5, 2020 at 11:12 AM Boyuan Zhang
<[email protected] <mailto:[email protected]>
> <mailto:[email protected]
<mailto:[email protected]>>> wrote:
>
> Hi team,
>
> I'm looking at adding self-checkpoint support to
portable Flink
> runner(BEAM-10940
> <https://issues.apache.org/jira/browse/BEAM-10940
<https://issues.apache.org/jira/browse/BEAM-10940>>) for
both batch
> and streaming. I summarized the problem that we
want to solve and
> proposed 2 potential approaches in this doc
>
<https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing <https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing>>.
>
> I want to collect feedback on which approach is
preferred and
> anything that I have not taken into consideration
yet but I should.
> Many thanks to all your help!
>
> Boyuan
>