Re: Self-checkpoint Support on Portable Flink

Reuven Lax Tue, 13 Oct 2020 21:32:04 -0700

Does this mean that we have to deal with duplicate messages over the back
edge? Or will that not happen, since duplicates mean that we rolled back a
checkpoint.


On Tue, Oct 13, 2020 at 2:59 AM Maximilian Michels <[email protected]> wrote:

> There would be ways around the lack of checkpointing in cycles, e.g.
> buffer and backloop only after checkpointing is complete, similarly how
> we implement @RequiresStableInput in the Flink Runner.
>
> -Max
>
> On 07.10.20 04:05, Reuven Lax wrote:
> > It appears that there's a proposal
> > (
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance
> > <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance>)
>
> > and an abandoned PR to fix this, but AFAICT this remains a limitation of
> > Flink. If Flink can't guarantee processing of records on back edges, I
> > don't think we can use cycles, as we might otherwise lose the residuals.
> >
> > On Tue, Oct 6, 2020 at 6:16 PM Reuven Lax <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     This is what I was thinking of
> >
> >     "Flink currently only provides processing guarantees for jobs
> >     without iterations. Enabling checkpointing on an iterative job
> >     causes an exception. In order to force checkpointing on an iterative
> >     program the user needs to set a special flag when enabling
> >     checkpointing:|env.enableCheckpointing(interval,
> >     CheckpointingMode.EXACTLY_ONCE, force = true)|.
> >
> >     Please note that records in flight in the loop edges (and the state
> >     changes associated with them) will be lost during failure."
> >
> >
> >
> >
> >
> >
> >     On Tue, Oct 6, 2020 at 5:44 PM Boyuan Zhang <[email protected]
> >     <mailto:[email protected]>> wrote:
> >
> >         Hi Reuven,
> >
> >         As Luke mentioned, at least there are some limitations around
> >         tracking watermark with flink cycles. I'm going to use State +
> >         Timer without flink cycle to support self-checkpoint. For
> >         dynamic split, we can either explore flink cycle approach or
> >         limit depth approach.
> >
> >         On Tue, Oct 6, 2020 at 5:33 PM Reuven Lax <[email protected]
> >         <mailto:[email protected]>> wrote:
> >
> >             Aren't there some limitations associated with flink cycles?
> >             I seem to remember various features that could not be used.
> >             I'm assuming that watermarks are not supported across
> >             cycles, but is there anything else?
> >
> >             On Tue, Oct 6, 2020 at 7:12 AM Maximilian Michels
> >             <[email protected] <mailto:[email protected]>> wrote:
> >
> >                 Thanks for starting the conversation. The two approaches
> >                 both look good
> >                 to me. Probably we want to start with approach #1 for
> >                 all Runners to be
> >                 able to support delaying bundles. Flink supports cycles
> >                 and thus
> >                 approach #2 would also be applicable and could be used
> >                 to implement
> >                 dynamic splitting.
> >
> >                 -Max
> >
> >                 On 05.10.20 23:13, Luke Cwik wrote:
> >                  > Thanks Boyuan, I left a few comments.
> >                  >
> >                  > On Mon, Oct 5, 2020 at 11:12 AM Boyuan Zhang
> >                 <[email protected] <mailto:[email protected]>
> >                  > <mailto:[email protected]
> >                 <mailto:[email protected]>>> wrote:
> >                  >
> >                  >     Hi team,
> >                  >
> >                  >     I'm looking at adding self-checkpoint support to
> >                 portable Flink
> >                  >     runner(BEAM-10940
> >                  >     <https://issues.apache.org/jira/browse/BEAM-10940
> >                 <https://issues.apache.org/jira/browse/BEAM-10940>>) for
> >                 both batch
> >                  >     and streaming. I summarized the problem that we
> >                 want to solve and
> >                  >     proposed 2 potential approaches in this doc
> >                  >
> >                   <
> https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing
> <
> https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing
> >>.
> >                  >
> >                  >     I want to collect feedback on which approach is
> >                 preferred and
> >                  >     anything that I have not taken into consideration
> >                 yet but I should.
> >                  >     Many thanks to all your help!
> >                  >
> >                  >     Boyuan
> >                  >
> >
>

Re: Self-checkpoint Support on Portable Flink

Reply via email to