Re: [DISCUSS] FLIP-203: Incremental savepoints

Piotr Nowojski Tue, 18 Jan 2022 06:52:32 -0800

Hi All,

Yu, sorry, I was somewhat confused about the configuration. I've changed
the FLIP as you suggested.


Yu/Yun Tang:

Regarding the RocksDB incompatibility, I think we can claim that Flink
version upgrades are/will be supported. If we ever need to break backward
compatibility by bumping the RocksDB version in such a way that RocksDB
can't provide that compatibility, we will need to call this out
prominently in the release notes.

> I have used the State Processor API with aligned, full checkpoints. There
> it has worked just fine.

Thanks for this information.

> 0) What exactly does the "State Processor API" row mean? Is it: Can it be
> read by the State Processor API? Can it be written by the State Processor
> API? Both? Something else?

Good question. I'm not sure how the State Processor API works. Can someone
help answer what we guarantee/support right now and what we can
reasonably support?

> 1) and 2)

I guess you are simply in favour of the 2nd proposal? So:

* rescaling
* Job upgrade w/o changing graph shape and record types
* Flink bug/patch (1.14.x → 1.14.y) version upgrade

plus

* Flink minor (1.x → 1.y) version upgrade

which I think is important for native savepoints to truly be savepoints.
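
To make the two version-upgrade rows concrete, here is a minimal sketch that
classifies a change between two Flink version strings as a bug/patch upgrade
(1.14.x → 1.14.y) or a minor upgrade (1.x → 1.y). The helper name
`classify_upgrade` is hypothetical and not part of Flink; it assumes plain
`major.minor.patch` version strings without suffixes.

```python
def classify_upgrade(old: str, new: str) -> str:
    """Classify a version change as 'same', 'patch', 'minor' or 'major'.

    Hypothetical helper for illustration only; assumes plain
    'major.minor.patch' version strings without suffixes.
    """
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected versions of the form major.minor.patch")
    if old_parts == new_parts:
        return "same"
    if old_parts[0] != new_parts[0]:
        return "major"
    if old_parts[1] != new_parts[1]:
        return "minor"
    # Only the last component differs: a bug/patch release.
    return "patch"


print(classify_upgrade("1.14.2", "1.14.3"))  # patch
print(classify_upgrade("1.14.3", "1.15.0"))  # minor
```

Under the 2nd proposal as summarised above, both the "patch" and the "minor"
case would be expected to work when restoring from a native savepoint.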

> 3) Should "Job upgrade w/o changing graph shape and record types" be
> split? I guess "record types" is only relevant for unaligned checkpoints.

The shape of the job graph is also an issue with unaligned checkpoints.
Changing record types/serialisation causes obvious problems with the
in-flight records, but if you change the job graph, for example by changing
the type of a network connection (like random -> broadcast, keyed ->
non-keyed) or by removing some operators, we also have problems with the
in-flight records in the affected connections.
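
As a rough illustration of "affected connections", the toy model below
compares two job graphs and lists the old edges that were removed or whose
connection type changed. This is only a sketch under stated assumptions: the
dictionary representation, the operator names and the function
`affected_connections` are hypothetical and are not how Flink represents a
job graph.

```python
from typing import Dict, Set, Tuple

# (upstream operator id, downstream operator id); hypothetical representation
Edge = Tuple[str, str]


def affected_connections(old: Dict[Edge, str], new: Dict[Edge, str]) -> Set[Edge]:
    """Return edges of the old graph whose in-flight records have no
    matching connection (same endpoints, same connection type) in the new graph."""
    affected = set()
    for edge, connection_type in old.items():
        if edge not in new:
            # The edge disappeared, e.g. because an operator was removed.
            affected.add(edge)
        elif new[edge] != connection_type:
            # Same endpoints, but e.g. random -> broadcast or keyed -> non-keyed.
            affected.add(edge)
    return affected


old_graph = {
    ("source", "map"): "random",
    ("map", "window"): "keyed",
    ("window", "sink"): "forward",
}
new_graph = {
    ("source", "map"): "broadcast",  # connection type changed
    ("map", "window"): "keyed",      # unchanged
    # ("window", "sink") removed together with the sink operator
}

print(sorted(affected_connections(old_graph, new_graph)))
# [('source', 'map'), ('window', 'sink')]
```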

> 4)

I think configuration changes should always be supported. If you think
that's important, I can add this to the documentation/FLIP proposal as a
separate row.

> 5) Do the guarantees that a Savepoint/Checkpoint provide change when
> generalized incremental checkpoints [1] are enabled? My understanding is:
> No, the same guarantees apply.

This will be trickier, and it will depend heavily on the FLIP-158
implementation.

Yun:

>   1.  From my understanding, a native savepoint appears much closer to the
> current aligned checkpoint. What's the difference between them?

Technically there would be no difference, but we might decide to limit what
we officially support, to make future changes easier for us. Likewise, so
far there has for the most part been very little difference between
savepoints and checkpoints. For us, as developers, the fewer things we
officially support and claim are stable, the better.

> 2.  If self-contained and relocatable are the most important differences,
> why not include them in the proposal table?

Good point. I will add this.

>  What does "Job full upgrade" mean?

I have clarified it to:
> Arbitrary job upgrade (changed graph shape/record types)

It's an arbitrary job change: anything that doesn't fall into the second
category, "Job upgrade w/o changing graph shape and record types".

Best,
Piotrek

On Mon, 17 Jan 2022 at 11:25, Yun Tang <[email protected]> wrote:

> Hi everyone,
>
> Thanks to Piotr for driving this topic.
>
> I have several questions on this FLIP.
>
>   1.  From my understanding, a native savepoint appears much closer to the
> current aligned checkpoint. What's the difference between them?
>   2.  If self-contained and relocatable are the most important differences,
> why not include them in the proposal table?
>   3.  What does "Job full upgrade" mean?
>
> As for RocksDB upgrades, this depends on RocksDB's backward compatibility
> [1], which has proven to work very well, as the documentation says.
>
>
> [1]
> https://github.com/facebook/rocksdb/wiki/RocksDB-Compatibility-Between-Different-Releases
>
> Best,
> Yun Tang
>
>
>
> ________________________________
> From: Konstantin Knauf <[email protected]>
> Sent: Friday, January 14, 2022 20:39
> To: dev <[email protected]>; Seth Wiesman <[email protected]>; Nico
> Kruber <[email protected]>; [email protected] <[email protected]>
> Subject: Re: [DISCUSS] FLIP-203: Incremental savepoints
>
> Hi everyone,
>
> Thank you, Piotr. Please find my thoughts on the topic below:
>
> 0) What exactly does the "State Processor API" row mean? Is it: Can it be
> read by the State Processor API? Can it be written by the State Processor
> API? Both? Something else?
>
> 1) If we take the assumption from FLIP-193 "that ownership should be the
> only difference between Checkpoints and Savepoints.", we would need to work
> in the direction of "Proposal 2". The distinction would then be the
> following:
>
> * Canonical Savepoint = Guarantees A
> * Canonical Checkpoint = Guarantees A (in theory; does not exist)
> * Aligned, Native Checkpoint = Guarantees B
> * Aligned, Native Savepoint = Guarantees B
> * Unaligned, Native Checkpoint = Guarantees C
> * Unaligned, Native Savepoint = Guarantees C (if this would exist in the
> future)
>
> I think it is important not to make this matrix too complicated, e.g. with
> 8 different sets of guarantees depending on all kinds of more or less
> well-known configuration options.
>
> 2) With respect to the concrete guarantees, I believe, it's important that
> we can cover all important use cases in "green", so that users can rely on
> official, tested behavior in regular operations. In my experience this
> includes manual recovery of a Job from a retained checkpoint. I would argue
> that most users operating a long-running, stateful Apache Flink application
> have been in the situation, where a graceful "stop" was not possible
> anymore, because the Job was unable to take a Savepoint. This could be,
> because the Job is frequently restarting (e.g. poison pill) or because it
> fails on taking the Savepoint itself for some reason (e.g. unable to commit
> a transaction to an external system). The solution strategy in this
> scenario is to cancel the job, make some changes to the Job or
> configuration that fix the problem and restore from the last successful
> (retained) checkpoint. I think the following changes would need to be
> officially supported for Native Checkpoints/Savepoint (Guarantees B,
> ideally also Guarantees C), in order to fix a Job in most of these cases.
>
> * rescaling
> * Job upgrade w/o changing graph shape and record types
> * Flink bug/patch (1.14.x → 1.14.y) version upgrade
>
> I would be very interested to hear from users as well as people like Seth,
> Nico or David (cc), who work with many users, what  in their experience
> would be needed here.
>
> 3) Should "Job upgrade w/o changing graph shape and record types" be split?
> I guess "record types" is only relevant for unaligned checkpoints.
>
> 4) Does it make sense to consider Flink configuration changes besides the
> statebackend type as another row? Maybe split by "pipeline.*" options,
> "execution.*" options, and whichever other categories would make sense.
> Just to give a few examples: it should be *officially* supported to take a
> native retained checkpoint and restart the Job with a different
> pipeline.auto-watermark-interval and different high-availability
> configurations.
>
> 5) Do the guarantees that a Savepoint/Checkpoint provide change when
> generalized incremental checkpoints [1] are enabled? My understanding is:
> No, the same guarantees apply.
>
> Cheers and thank you,
>
> Konstantin
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints?src=contextnavpagetreemode
>
> On Fri, Jan 14, 2022 at 11:24 AM David Anderson <[email protected]>
> wrote:
>
> > > I have a very similar question to State Processor API. Is it the same
> > > scenario in this case?
> > > Should it also be working with checkpoints but might be just untested?
> >
> > I have used the State Processor API with aligned, full checkpoints. There
> > it has worked just fine.
> >
> > David
> >
> > On Thu, Jan 13, 2022 at 12:40 PM Piotr Nowojski <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > Thanks for the comments and questions. Starting from the top:
> > >
> > > Seth: good point about schema evolution. Actually, I have a very similar
> > > question to State Processor API. Is it the same scenario in this case?
> > > Should it also be working with checkpoints but might be just untested?
> > >
> > > And next question, should we commit to supporting those two things (State
> > > Processor API and schema evolution) for native savepoints? What about
> > > aligned checkpoints? (please check [1] for that).
> > >
> > > Yu Li: 1, 2 and 4 done.
> > >
> > > > 3. How about changing the description of "the default configuration of
> > > > the checkpoints will be used to determine whether the savepoint should be
> > > > incremental or not" to something like "the `state.backend.incremental`
> > > > setting now denotes the type of native format snapshot and will take
> > > > effect for both checkpoint and savepoint (with native type)", to prevent
> > > > concept confusion between checkpoint and savepoint?
> > >
> > > Is `state.backend.incremental` the only configuration parameter that can be
> > > used in this context? I would guess not? What about for example
> > > "state.storage.fs.memory-threshold" or all of the Advanced RocksDB State
> > > Backends Options [2]?
> > >
> > > David:
> > >
> > > > does this mean that we need to keep the checkpoints compatible across
> > > > minor versions? Or can we say, that the minor version upgrades are only
> > > > guaranteed with canonical savepoints?
> > >
> > > Good question. Frankly I was always assuming that this is implicitly given.
> > > Otherwise users would not be able to recover jobs that are failing because
> > > of bugs in Flink. But I'm pretty sure that was never explicitly stated.
> > >
> > > As Konstantin suggested, I've written down the pre-existing guarantees of
> > > checkpoints and savepoints followed by two proposals on how they should be
> > > changed [1]. Could you take a look?
> > >
> > > I'm especially unsure about the following things:
> > > a) What about RocksDB upgrades? If we bump RocksDB version between Flink
> > > versions, do we support recovering from a native format snapshot
> > > (incremental checkpoint)?
> > > b) State Processor API - both pre-existing and what do we want to provide
> > > in the future
> > > c) Schema Evolution - both pre-existing and what do we want to provide in
> > > the future
> > >
> > > Best,
> > > Piotrek
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Checkpointvssavepointguarantees
> > > [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-rocksdb-state-backends-options
> > >
> > > On Tue, 11 Jan 2022 at 09:45, Konstantin Knauf <[email protected]> wrote:
> > >
> > > > Hi Piotr,
> > > >
> > > > would it be possible to provide a table that shows the
> > > > compatibility guarantees provided by the different snapshots going
> > > > forward? Like type of change (Topology, State Schema, Parallelism, ..) in
> > > > one dimension, and type of snapshot as the other dimension. Based on that,
> > > > it would be easier to discuss those guarantees, I believe.
> > > >
> > > > Cheers,
> > > >
> > > > Konstantin
> > > >
> > > > On Mon, Jan 3, 2022 at 9:11 AM David Morávek <[email protected]> wrote:
> > > >
> > > > > Hi Piotr,
> > > > >
> > > > > does this mean that we need to keep the checkpoints compatible across
> > > > > minor versions? Or can we say, that the minor version upgrades are only
> > > > > guaranteed with canonical savepoints?
> > > > >
> > > > > My concern is especially if we'd want to change layout of the
> > > > > checkpoint.
> > > > >
> > > > > D.
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Dec 29, 2021 at 5:19 AM Yu Li <[email protected]> wrote:
> > > > >
> > > > > > Thanks for the proposal Piotr! Overall I'm +1 for the idea, and below
> > > > > > are my two cents:
> > > > > >
> > > > > > 1. How about adding a "Term Definition" section and clarify what
> > > > > > "native format" (the "native" data persistence format of the current
> > > > > > state backend) and "canonical format" (the "uniform" format that
> > > > > > supports switching state backends) means?
> > > > > >
> > > > > > 2. IIUC, currently the FLIP proposes to only support incremental
> > > > > > savepoint with native format, and there's no plan to add such support
> > > > > > for canonical format, right? If so, how about writing this down
> > > > > > explicitly in the FLIP doc, maybe in a "Limitations" section, plus the
> > > > > > fact that `HashMapStateBackend` cannot support incremental savepoint
> > > > > > before FLIP-151 is done? (side note: @Roman just a kindly reminder, that
> > > > > > please take FLIP-203 into account when implementing FLIP-151)
> > > > > >
> > > > > > 3. How about changing the description of "the default configuration of
> > > > > > the checkpoints will be used to determine whether the savepoint should
> > > > > > be incremental or not" to something like "the
> > > > > > `state.backend.incremental` setting now denotes the type of native
> > > > > > format snapshot and will take effect for both checkpoint and savepoint
> > > > > > (with native type)", to prevent concept confusion between checkpoint
> > > > > > and savepoint?
> > > > > >
> > > > > > 4. How about putting the notes of behavior change (the default type of
> > > > > > savepoint will be changed to `native` in the future, and by then the
> > > > > > taken savepoint cannot be used to switch state backends by default) to
> > > > > > a more obvious place, for example moving from the "CLI" section to the
> > > > > > "Compatibility" section? (although it will only happen in 1.16 release
> > > > > > based on the proposed plan)
> > > > > >
> > > > > > And all above suggestions apply for our user-facing document after the
> > > > > > FLIP is (partially or completely, accordingly) done, if taken (smile).
> > > > > >
> > > > > > Best Regards,
> > > > > > Yu
> > > > > >
> > > > > >
> > > > > > On Tue, 21 Dec 2021 at 22:23, Seth Wiesman <[email protected]> wrote:
> > > > > >
> > > > > > > >> AFAIK state schema evolution should work both for native and
> > > > > > > >> canonical savepoints.
> > > > > > >
> > > > > > > Schema evolution does technically work for both formats, it happens
> > > > > > > after the code paths have been unified, but the community has up until
> > > > > > > this point considered that an unsupported feature. From my perspective
> > > > > > > making this supported could be as simple as adding test coverage but
> > > > > > > that's an active decision we'd need to make.
> > > > > > >
> > > > > > > On Tue, Dec 21, 2021 at 7:43 AM Piotr Nowojski <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi Konstantin,
> > > > > > > >
> > > > > > > > > In this context: will the native format support state schema
> > > > > > > > > evolution? If not, I am not sure, we can let the format default to
> > > > > > > > > native.
> > > > > > > >
> > > > > > > > AFAIK state schema evolution should work both for native and canonical
> > > > > > > > savepoints.
> > > > > > > >
> > > > > > > > Regarding what is/will be supported we will document as part of this
> > > > > > > > FLIP-203. But it's not as simple as just the difference between native
> > > > > > > > and canonical formats.
> > > > > > > >
> > > > > > > > Best, Piotrek
> > > > > > > >
> > > > > > > > On Mon, 20 Dec 2021 at 14:28, Konstantin Knauf <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Piotr,
> > > > > > > > >
> > > > > > > > > Thanks a lot for starting the discussion. Big +1.
> > > > > > > > >
> > > > > > > > > In my understanding, this FLIP introduces the snapshot format as a
> > > > > > > > > *really* user facing concept. IMO it is important that we document
> > > > > > > > >
> > > > > > > > > a) that it is no longer the checkpoint/savepoint characteristics that
> > > > > > > > > determine the kind of changes that a snapshot allows (user code,
> > > > > > > > > state schema evolution, topology changes), but now this becomes a
> > > > > > > > > property of the format regardless of whether this is a snapshot or a
> > > > > > > > > checkpoint
> > > > > > > > > b) the exact changes that each format allows (code, state schema,
> > > > > > > > > topology, state backend, max parallelism)
> > > > > > > > >
> > > > > > > > > In this context: will the native format support state schema
> > > > > > > > > evolution? If not, I am not sure, we can let the format default to
> > > > > > > > > native.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Konstantin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 20, 2021 at 2:09 PM Piotr Nowojski <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi devs,
> > > > > > > > > >
> > > > > > > > > > I would like to start a discussion about a previously announced
> > > > > > > > > > follow up of the FLIP-193 [1], namely allowing savepoints to be in
> > > > > > > > > > native format and incremental. The changes do not seem invasive. The
> > > > > > > > > > full proposal is written down as FLIP-203: Incremental savepoints
> > > > > > > > > > [2]. Please take a look, and let me know what you think.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Piotrek
> > > > > > > > > >
> > > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
> > > > > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Konstantin Knauf
> > > > > > > > >
> > > > > > > > > https://twitter.com/snntrable
> > > > > > > > >
> > > > > > > > > https://github.com/knaufk
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Konstantin Knauf
> > > >
> > > > https://twitter.com/snntrable
> > > >
> > > > https://github.com/knaufk
> > > >
> > >
> >
>
>
> --
>
> Konstantin Knauf | Head of Product
>
