Hi Kostas,

Thanks for bringing this up. Currently, there is indeed some overlap
between checkpoints and savepoints that can confuse users. I think the
FLIP's proposal can give users a clearer description.

About the FLIP, I have a question about "Deleting or moving a snapshot
must be done by Flink". It seems like we will support MOVE/DELETE of a
stopped job's snapshot. What should the user do when he/she wants to
DELETE/MOVE a stopped job's snapshot?

Best,
Congxian


On Wed, Jul 10, 2019 at 9:33 AM Becket Qin <becket....@gmail.com> wrote:

> Hi Kostas,
>
> It makes a lot of sense to just have one underlying mechanism (snapshot) to
> save the state of a Flink job. And we can use that mechanism in different
> scenarios, including checkpoints and user-triggered savepoints.
>
> To facilitate the discussion, maybe it is useful to clarify a few design
> goals, for example:
>
> 1. One unified snapshot format that supports
>      - both incremental and global state saving
>      - rescaling on recovery
>      - compatibility check / migration across different Flink versions?
> 2. The snapshot can easily be managed by users.
>
>
> And I have two questions regarding the FLIP.
>
> 1. What are the side-effects when taking a snapshot? Do you mean that taking
> a snapshot may trigger some action other than saving the state of the job?
> Technically speaking, taking a snapshot should be a "read-only" action on the
> Flink job. So I assume that by side-effects, you mean it is no longer
> read-only. If so, can you be more specific about which side-effects you
> are referring to?
>
> 2. In the rejected alternative, you mentioned a scenario of A/B testing. It
> seems that if execution A and execution B run different configurations
> after the savepoints, the histories of the two jobs will always be different
> after that, right?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Jul 8, 2019 at 9:53 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>
> > Hi Devs,
> >
> > Currently there are a number of efforts around checkpoints/savepoints, as
> > reflected by the number of FLIPs. From a quick look, FLIP-34, FLIP-41,
> > FLIP-43, and FLIP-45 are all directly related to these topics. This
> > reflects the importance of these two notions/features to the users of the
> > framework.
> >
> > Although many efforts are centred around these notions, their semantics and
> > the interplay between them are not always clearly defined. This makes them
> > difficult to explain to users (all the different combinations of
> > state-backends, formats and tradeoffs) and in some cases it may have
> > negative effects on users (e.g. the already-fixed-some-time-ago issue
> > of savepoints not being considered for recovery although they committed
> > side-effects).
> >
> > FLIP-47 [1] and the related document [2] aim to start a discussion
> > around the semantics of savepoints/checkpoints and their interplay, and to
> > some extent to help us decide on the future steps concerning these notions.
> > As an example, should we work towards bringing them closer, or moving them
> > further apart?
> >
> > This is by no means a complete proposal, as many of the practical
> > implications can only be fleshed out after we agree on the basic semantics
> > and the general frame around these notions. To that end, there are no
> > concrete implementation steps and the FLIP is going to be updated as the
> > discussion continues.
> >
> > I am really looking forward to your opinions on the topic.
> >
> > Cheers,
> > Kostas
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-47%3A+Checkpoints+vs.+Savepoints
> > [2] https://docs.google.com/document/d/1_1FF8D3u0tT_zHWtB-hUKCP_arVsxlmjwmJ-TvZd4fs/edit?usp=sharing
> >
>
