Hi Kostas,

Thanks for bringing this up. Currently, there are indeed some overlaps between checkpoints and savepoints that can confuse users. I think the FLIP's proposal can give users a clearer description.

About the FLIP, I have a question about "Deleting or moving a snapshot must be done by Flink": it seems we will support MOVE/DELETE for a stopped job's snapshot. What should the user do when he/she wants to DELETE/MOVE a stopped job's snapshot?

Best,
Congxian

Becket Qin <becket....@gmail.com> wrote on Wed, Jul 10, 2019 at 9:33 AM:

> Hi Kostas,
>
> It makes a lot of sense to just have one underlying mechanism (snapshot) to save the state of a Flink job. And we can use that mechanism in different scenarios, including checkpoint and user-triggered savepoint.
>
> To facilitate the discussion, maybe it is useful to clarify a few design goals, for example:
>
> 1. one unified snapshot format that supports
>    - both incremental and global state saving
>    - rescaling on recovery
>    - compatibility check / migration across different Flink versions?
> 2. The snapshot can easily be managed by users.
>
> And I have two questions regarding the FLIP.
>
> 1. What are the side-effects when taking a snapshot? Do you mean taking a snapshot may trigger some action other than saving the state of the job? Technically speaking, taking a snapshot should be a "read-only" action to the Flink jobs. So I assume by side-effects, you meant it's no longer read-only. If so, can you be more specific about which side-effects you are referring to?
>
> 2. In the rejected alternative, you mentioned a scenario of A/B testing. It seems that if execution A and execution B run different configurations after the savepoints, the history of the two jobs will always be different after that, right?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Jul 8, 2019 at 9:53 PM Kostas Kloudas <kklou...@gmail.com> wrote:
>
> > Hi Devs,
> >
> > Currently there are a number of efforts around checkpoints/savepoints, as reflected by the number of FLIPs. From a quick look, FLIP-34, FLIP-41, FLIP-43, and FLIP-45 are all directly related to these topics. This reflects the importance of these two notions/features to the users of the framework.
> >
> > Although many efforts are centred around these notions, their semantics and the interplay between them are not always clearly defined. This makes them difficult to explain to the users (all the different combinations of state-backends, formats and tradeoffs) and in some cases it may have negative effects on the users (e.g. the already-fixed-some-time-ago issue of savepoints not being considered for recovery although they committed side-effects).
> >
> > FLIP-47 [1] and the related Document [2] aim at starting a discussion around the semantics of savepoints/checkpoints and their interplay, and to some extent help us fix the future steps concerning these notions. As an example, should we work towards bringing them closer, or moving them further apart?
> >
> > This is not a complete proposal (by no means), as many of the practical implications can only be fleshed out after we agree on the basic semantics and the general frame around these notions. To that end, there are no concrete implementation steps and the FLIP is going to be updated as the discussion continues.
> >
> > I am really looking forward to your opinions on the topic.
> >
> > Cheers,
> > Kostas
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-47%3A+Checkpoints+vs.+Savepoints
> > [2] https://docs.google.com/document/d/1_1FF8D3u0tT_zHWtB-hUKCP_arVsxlmjwmJ-TvZd4fs/edit?usp=sharing