Re: Self-checkpoint Support on Portable Flink

Maximilian Michels Wed, 14 Oct 2020 03:41:02 -0700

Duplicates cannot happen because the state of all operators will berolled back to the latest checkpoint, in case of failures.


On 14.10.20 06:31, Reuven Lax wrote:

Does this mean that we have to deal with duplicate messages over theback edge? Or will that not happen, since duplicates mean that we rolledback a checkpoint.

On Tue, Oct 13, 2020 at 2:59 AM Maximilian Michels <[email protected]<mailto:[email protected]>> wrote:


    There would be ways around the lack of checkpointing in cycles, e.g.
    buffer and backloop only after checkpointing is complete, similarly how
    we implement @RequiresStableInput in the Flink Runner.

    -Max

    On 07.10.20 04:05, Reuven Lax wrote:
     > It appears that there's a proposal
     >
    
(https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance
    
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance>

     >
    
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance
    
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-16%3A+Loop+Fault+Tolerance>>)

     > and an abandoned PR to fix this, but AFAICT this remains a
    limitation of
     > Flink. If Flink can't guarantee processing of records on back
    edges, I
     > don't think we can use cycles, as we might otherwise lose the
    residuals.
     >
     > On Tue, Oct 6, 2020 at 6:16 PM Reuven Lax <[email protected]
    <mailto:[email protected]>
     > <mailto:[email protected] <mailto:[email protected]>>> wrote:
     >
     >     This is what I was thinking of
     >
     >     "Flink currently only provides processing guarantees for jobs
     >     without iterations. Enabling checkpointing on an iterative job
     >     causes an exception. In order to force checkpointing on an
    iterative
     >     program the user needs to set a special flag when enabling
     >     checkpointing:|env.enableCheckpointing(interval,
     >     CheckpointingMode.EXACTLY_ONCE, force = true)|.
     >
     >     Please note that records in flight in the loop edges (and the
    state
     >     changes associated with them) will be lost during failure."
     >
     >
     >
     >
     >
     >
     >     On Tue, Oct 6, 2020 at 5:44 PM Boyuan Zhang
    <[email protected] <mailto:[email protected]>
     >     <mailto:[email protected] <mailto:[email protected]>>> wrote:
     >
     >         Hi Reuven,
     >
     >         As Luke mentioned, at least there are some limitations around
     >         tracking watermark with flink cycles. I'm going to use
    State +
     >         Timer without flink cycle to support self-checkpoint. For
     >         dynamic split, we can either explore flink cycle approach or
     >         limit depth approach.
     >
     >         On Tue, Oct 6, 2020 at 5:33 PM Reuven Lax
    <[email protected] <mailto:[email protected]>
     >         <mailto:[email protected] <mailto:[email protected]>>> wrote:
     >
     >             Aren't there some limitations associated with flink
    cycles?
     >             I seem to remember various features that could not be
    used.
     >             I'm assuming that watermarks are not supported across
     >             cycles, but is there anything else?
     >
     >             On Tue, Oct 6, 2020 at 7:12 AM Maximilian Michels
     >             <[email protected] <mailto:[email protected]>
    <mailto:[email protected] <mailto:[email protected]>>> wrote:
     >
     >                 Thanks for starting the conversation. The two
    approaches
     >                 both look good
     >                 to me. Probably we want to start with approach #1 for
     >                 all Runners to be
     >                 able to support delaying bundles. Flink supports
    cycles
     >                 and thus
     >                 approach #2 would also be applicable and could be
    used
     >                 to implement
     >                 dynamic splitting.
     >
     >                 -Max
     >
     >                 On 05.10.20 23:13, Luke Cwik wrote:
     >                  > Thanks Boyuan, I left a few comments.
     >                  >
     >                  > On Mon, Oct 5, 2020 at 11:12 AM Boyuan Zhang
     >                 <[email protected] <mailto:[email protected]>
    <mailto:[email protected] <mailto:[email protected]>>
     >                  > <mailto:[email protected]
    <mailto:[email protected]>
     >                 <mailto:[email protected]
    <mailto:[email protected]>>>> wrote:
     >                  >
     >                  >     Hi team,
     >                  >
     >                  >     I'm looking at adding self-checkpoint
    support to
     >                 portable Flink
     >                  >     runner(BEAM-10940

> > <https://issues.apache.org/jira/browse/BEAM-10940

    <https://issues.apache.org/jira/browse/BEAM-10940>
     >                 <https://issues.apache.org/jira/browse/BEAM-10940
    <https://issues.apache.org/jira/browse/BEAM-10940>>>) for
     >                 both batch
     >                  >     and streaming. I summarized the problem
    that we
     >                 want to solve and
     >                  >     proposed 2 potential approaches in this doc
     >                  >

> <https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing <https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing> <https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing <https://docs.google.com/document/d/1372B7HYxtcUYjZOnOM7OBTfSJ4CyFg_gaPD_NUxWClo/edit?usp=sharing>>>.

     >                  >
     >                  >     I want to collect feedback on which
    approach is
     >                 preferred and
     >                  >     anything that I have not taken into
    consideration
     >                 yet but I should.
     >                  >     Many thanks to all your help!
     >                  >
     >                  >     Boyuan
     >                  >
     >

Re: Self-checkpoint Support on Portable Flink

Reply via email to