Thanks Zelin for bringing up the discussion. I'm thinking about: 1. How to manage the savepoints if there is no expiration mechanism: via the storage's TTL management or an external script? 2. I think the id of the compacted snapshot picked by the savepoint and the manifest file list are also important information for users. Could this information be stored in the system table?
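For question 1, the external-script option could be as simple as a time-based sweep over savepoint metadata. The sketch below is purely illustrative: it assumes savepoints are available as hypothetical `(id, creation_time)` pairs, which is not Paimon's actual metadata layout.

```python
from datetime import datetime, timedelta

# Hypothetical TTL sweep for savepoints. Assumes savepoints are exposed as
# (id, creation_time) pairs; the real Paimon metadata layout may differ.
def expired_savepoints(savepoints, ttl, now):
    """Return ids of savepoints older than the TTL
    (candidates for a delete-savepoint action)."""
    return [sp_id for sp_id, created in savepoints if now - created > ttl]

savepoints = [
    (1, datetime(2023, 4, 1)),
    (2, datetime(2023, 5, 1)),
    (3, datetime(2023, 5, 20)),
]
now = datetime(2023, 5, 22)
# With a 30-day TTL, only savepoint 1 has expired.
print(expired_savepoints(savepoints, timedelta(days=30), now))  # [1]
```

Such a script would then invoke the proposed delete-savepoint action for each expired id; a built-in option like `savepoint.time-retained` (discussed below in the thread) would make this external step unnecessary.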
Best, Guojun On Mon, May 22, 2023 at 9:13 PM Jingsong Li <[email protected]> wrote: > FYI > > The PIP lacks a table to show Discussion thread & Vote thread & ISSUE... > > Best > Jingsong > > On Mon, May 22, 2023 at 4:48 PM yu zelin <[email protected]> wrote: > > > > Hi, all, > > > > Thank you all for your suggestions and questions. After reading your > suggestions, I have adopted some of them and I want to share my opinions here. > > > > To make my statements clearer, I will still use the word `savepoint`. > When we reach a consensus, the name may be changed. > > > > 1. The purposes of savepoint > > > > As Shammon mentioned, Flink and databases also have the concept of > `savepoint`. So it's better to clarify the purposes of our savepoint. > Thanks to Nicholas and Jingsong, I think your explanations are very clear. > I'd like to give my summary: > > > > (1) Fault recovery (or we can say disaster recovery). Users can ROLL > BACK to a savepoint if needed. If a user rolls back to a savepoint, the table > will hold the data in the savepoint and the data committed after the > savepoint will be deleted. In this scenario we need savepoints because > snapshots may have expired; the savepoint lives longer and saves the user's > old data. > > > > (2) Record versions of data at a longer interval (typically daily level > or weekly level). With a savepoint, users can query the old data in batch > mode. Compared to copying records to a new table or merging incremental records > with old records (like using merge into in Hive), the savepoint is more > lightweight because we don't copy data files, we just record the metadata > of them. > > > > As you can see, savepoint is very similar to snapshot. The differences > are: > > > > (1) Savepoint lives longer. In most cases, a snapshot's lifetime is > about several minutes to hours. We expect a savepoint to live several > days, weeks, or even months. > > > > (2) Savepoint is mainly used for batch reading of historical data. 
In > this PIP, we don't introduce streaming reading of savepoints. > > > > 2. Candidates of name > > > > I agree with Jingsong that we can use a new name. Since the purpose and > mechanism (savepoint is very similar to snapshot) of savepoint is similar > to `tag` in Iceberg, maybe we can use `tag`. > > > > In my opinion, an alternative is `anchor`. All the snapshots are like > the navigation path of the streaming data, and an `anchor` can stop it in a > place. > > > > 3. Public table operations and options > > > > We propose to expose some operations and table options for users to > manage savepoints. > > > > (1) Operations (Currently for Flink) > > We provide Flink actions to manage savepoints: > > create-savepoint: To generate a savepoint from the latest snapshot. > Supports creating from a specified snapshot. > > delete-savepoint: To delete a specified savepoint. > > rollback-to: To roll back to a specified savepoint. > > > > (2) Table options > > We propose to provide options for creating savepoints periodically: > > savepoint.create-time: When to create the savepoint. Example: 00:00 > > savepoint.create-interval: Interval between the creation of two > savepoints. Example: 2 d. > > savepoint.time-retained: The maximum time of savepoints to retain. > > > > (3) Procedures (future work) > > Spark supports SQL extensions. After we support the Spark CALL statement, we > can provide procedures to create, delete or roll back to a savepoint for Spark > users. > > > > Support for CALL is on the roadmap of Flink. In a future version, we can > also support savepoint-related procedures for Flink users. > > > > 4. Expiration of data files > > > > Currently, when a snapshot is expired, data files that are not used by > other snapshots will be deleted. After we introduce the savepoint, we must make sure the > data files saved by savepoints will not be deleted. > > > > Conversely, when a savepoint is deleted, the data files that are not > used by existing snapshots and other savepoints will be deleted. 
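The deletion rule in point 4 above can be sketched as a reference check: a data file is removable only when no surviving snapshot and no surviving savepoint still references it. This is a hedged illustration of the rule as stated in the thread, not Paimon's actual implementation; all names are hypothetical.

```python
# Hedged sketch of the expiration rule: when a snapshot expires or a
# savepoint is deleted, a data file may be removed only if no surviving
# snapshot or savepoint still references it. All names are hypothetical.
def files_to_delete(removed_files, live_snapshots, live_savepoints):
    """removed_files: files referenced by the expired snapshot or deleted savepoint.
    live_snapshots / live_savepoints: iterables of file sets still alive."""
    still_referenced = set()
    for referenced in list(live_snapshots) + list(live_savepoints):
        still_referenced |= set(referenced)
    return sorted(set(removed_files) - still_referenced)

# Snapshot s1 expires; f2 is kept alive by another snapshot, f3 by a savepoint.
print(files_to_delete(
    removed_files={"f1", "f2", "f3"},
    live_snapshots=[{"f2", "f4"}],
    live_savepoints=[{"f3"}],
))  # ['f1']
```

The same check applies symmetrically in both directions zelin describes: snapshot expiration must respect savepoint references, and savepoint deletion must respect snapshot references.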
> > > > I have written some POC code to implement it. I will update the mechanism > in the PIP soon. > > > > Best, > > Yu Zelin > > > > > On 21 May 2023, at 20:54, Jingsong Li <[email protected]> wrote: > > > > > > Thanks Yun for your information. > > > > > > We need to be careful to avoid confusion between Paimon and Flink > > > concepts about "savepoint". > > > > > > Maybe we don't have to insist on using this "savepoint"; for example, > > > TAG is also a candidate, just like Iceberg [1]. > > > > > > [1] https://iceberg.apache.org/docs/latest/branching/ > > > > > > Best, > > > Jingsong > > > > > > On Sun, May 21, 2023 at 8:51 PM Jingsong Li <[email protected]> > wrote: > > >> > > >> Thanks Nicholas for your detailed requirements. > > >> > > >> We need to supplement user requirements in the PIP, which is mainly aimed > > >> at two purposes: > > >> 1. Fault recovery for data errors (named: restore or rollback-to) > > >> 2. Used to record versions at the day level (for example), targeting > batch queries > > >> > > >> Best, > > >> Jingsong > > >> > > >> On Sat, May 20, 2023 at 2:55 PM Yun Tang <[email protected]> wrote: > > >>> > > >>> Hi Guys, > > >>> > > >>> Since we use Paimon with Flink in most cases, I think we need to > clarify the same word "savepoint" in different systems. > > >>> > > >>> For Flink, savepoint means: > > >>> > > >>> 1. Triggered by users, not periodically triggered by the system > itself. However, this PIP wants to support creating it periodically. > > >>> 2. Even the so-called incremental native savepoint [1] will > not depend on the previous checkpoints or savepoints; it will still copy > files on DFS to the self-contained savepoint folder. However, from the > description of this PIP about the deletion of expired snapshot files, the > Paimon savepoint will refer to the previously existing files directly. > > >>> > > >>> I don't think we need to make the semantics of Paimon totally the > same as Flink's. 
However, we need to introduce a table to tell the > difference compared with Flink and discuss the difference. > > >>> > > >>> [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic > > >>> > > >>> Best > > >>> Yun Tang > > >>> ________________________________ > > >>> From: Nicholas Jiang <[email protected]> > > >>> Sent: Friday, May 19, 2023 17:40 > > >>> To: [email protected] <[email protected]> > > >>> Subject: Re: [DISCUSS] PIP-4 Support savepoint > > >>> > > >>> Hi Guys, > > >>> > > >>> Thanks Zelin for driving the savepoint proposal. I'd like to share some > opinions on savepoint: > > >>> > > >>> -- About "introduce savepoint for Paimon to persist full data in a > time point" > > >>> > > >>> The motivation of the savepoint proposal is more like snapshot TTL > management. Actually, disaster recovery is very much mission critical for > any software. Especially when it comes to data systems, the impact could be > very serious, leading to delays in business decisions or even wrong business > decisions at times. Savepoint is proposed to assist users in recovering > data from a previous state, via two operations: "savepoint" and "restore". > > >>> > > >>> "savepoint" saves the Paimon table as of the commit time; therefore, > if there is a savepoint, the data generated in the corresponding commit > cannot be cleaned. Meanwhile, a savepoint lets users restore the table > to this savepoint at a later point in time if need be. On similar lines, a > savepoint cannot be triggered on a commit that is already cleaned up. > Savepoint is synonymous with taking a backup, just that we don't make a new > copy of the table, but save the state of the table elegantly so that > we can restore it later when needed. > > >>> > > >>> "restore" lets you restore your table to one of the savepoint > commits. Meanwhile, it cannot be undone (or reversed), so care should be > taken before doing a restore. 
At this time, Paimon would delete all data > files and commit files (timeline files) greater than the savepoint commit > to which the table is being restored. > > >>> > > >>> BTW, it's better to introduce a snapshot view based on savepoint, > which could improve query performance of historical data for the Paimon table. > > >>> > > >>> -- About Public API of savepoint > > >>> > > >>> The currently introduced savepoint interfaces in the Public API are not enough > for users; for example, deleteSavepoint, restoreSavepoint, etc. > > >>> > > >>> -- About "Paimon's savepoint need to be combined with Flink's > savepoint": > > >>> > > >>> If Paimon supports the savepoint mechanism and provides savepoint > interfaces, the integration with Flink's savepoint is not blocked by this > proposal. > > >>> > > >>> In summary, savepoint is not only used to improve the query > performance of historical data, but also for disaster recovery > processing. > > >>> > > >>> On 2023/05/17 09:53:11 Jingsong Li wrote: > > >>>> What Shammon mentioned is interesting. I agree with what he said > about > > >>>> the differences in savepoints between databases and stream > computing. > > >>>> > > >>>> About "Paimon's savepoint need to be combined with Flink's > savepoint": > > >>>> > > >>>> I think it is possible, but we may need to deal with this via another > > >>>> mechanism, because the snapshots after a savepoint may expire. We need > > >>>> to compare data between two savepoints to generate incremental data > > >>>> for streaming read. > > >>>> > > >>>> But this may not need to block the PIP; it looks like the current > design > > >>>> does not break the future combination? > > >>>> > > >>>> Best, > > >>>> Jingsong > > >>>> > > >>>> On Wed, May 17, 2023 at 5:33 PM Shammon FY <[email protected]> > wrote: > > >>>>> > > >>>>> Hi Caizhi, > > >>>>> > > >>>>> Thanks for your comments. As you mentioned, I think we may need to > discuss > > >>>>> the role of savepoint in Paimon. 
> > >>>>> > > >>>>> If I understand correctly, the main feature of savepoint in the > current PIP > > >>>>> is that the savepoint will not expire, and users can perform a > query on > > >>>>> the savepoint via time-travel. Besides that, there is > savepoint in > > >>>>> the database and Flink. > > >>>>> > > >>>>> 1. Savepoint in the database. The database can roll back table data to > the > > >>>>> specified 'version' based on a savepoint. So the key point of > savepoint in > > >>>>> the database is to roll back data. > > >>>>> > > >>>>> 2. Savepoint in Flink. Users can trigger a savepoint with a > specific > > >>>>> 'path', and save all state data to the savepoint for the job. Then > users can > > >>>>> create a new job based on the savepoint to continue consuming > incremental > > >>>>> data. I think the core capabilities are: backup for a job, and > resume a job > > >>>>> based on the savepoint. > > >>>>> > > >>>>> In addition to the above, Paimon may also face data write > corruption and > > >>>>> need to recover data based on a specified savepoint. So we may > need to > > >>>>> consider what abilities Paimon's savepoint needs besides the > ones > > >>>>> mentioned in the current PIP. > > >>>>> > > >>>>> Additionally, as mentioned above, Flink also has a > > >>>>> savepoint mechanism. During the process of streaming data from > Flink to > > >>>>> Paimon, does Paimon's savepoint need to be combined with Flink's > savepoint? > > >>>>> > > >>>>> > > >>>>> Best, > > >>>>> Shammon FY > > >>>>> > > >>>>> > > >>>>> On Wed, May 17, 2023 at 4:02 PM Caizhi Weng <[email protected]> > wrote: > > >>>>> > > >>>>>> Hi developers! > > >>>>>> > > >>>>>> Thanks Zelin for bringing up the discussion. The proposal seems > good to me > > >>>>>> overall. However, I'd also like to bring up a few points. > > >>>>>> > > >>>>>> 1. As Jingsong mentioned, the Savepoint class should not become a > public API, > > >>>>>> at least for now. 
What we need to discuss for the public API is > how the > > >>>>>> users can create or delete savepoints. For example, what the > table option > > >>>>>> looks like, what commands and options are provided for the Flink > action, > > >>>>>> etc. > > >>>>>> > > >>>>>> 2. Currently most Flink actions are related to streaming > processing, so > > >>>>>> only Flink can support them. However, savepoint creation and > deletion seem > > >>>>>> like a feature for batch processing. So aside from Flink actions, > shall we > > >>>>>> also provide something like Spark actions for savepoints? > > >>>>>> > > >>>>>> I would also like to comment on Shammon's views. > > >>>>>> > > >>>>>> Should we introduce an option for savepoint path which may be > different > > >>>>>>> from 'warehouse'? Then users can backup the data of savepoint. > > >>>>>>> > > >>>>>> > > >>>>>> I don't see this as necessary. To back up a table, the user just > needs to copy > > >>>>>> all files from the table directory. Savepoint in Paimon, as far > as I > > >>>>>> understand, is mainly for users to review historical data, not > for backing > > >>>>>> up tables. > > >>>>>> > > >>>>>> Will the savepoint copy data files from snapshot or only save > meta files? > > >>>>>>> > > >>>>>> > > >>>>>> It would be a heavy burden if a savepoint copied all its files. > As I > > >>>>>> mentioned above, savepoint is not for backing up tables. > > >>>>>> > > >>>>>> How can users create a new table and restore data from the > specified > > >>>>>>> savepoint? > > >>>>>> > > >>>>>> > > >>>>>> This reminds me of savepoints in Flink. Still, savepoint is not > for backing > > >>>>>> up tables, so I guess we don't need to support "restoring data" > from a > > >>>>>> savepoint. > > >>>>>> > > >>>>>> Shammon FY <[email protected]> wrote on Wed, May 17, 2023 at 10:32: > > >>>>>> > > >>>>>>> Thanks Zelin for initiating this discussion. I have some > comments: > > >>>>>>> > > >>>>>>> 1. 
Should we introduce an option for savepoint path which may be > > >>>>>> different > > >>>>>>> from 'warehouse'? Then users can backup the data of savepoint. > > >>>>>>> > > >>>>>>> 2. Will the savepoint copy data files from snapshot or only save > meta > > >>>>>>> files? The description in the PIP "After we introduce savepoint, > we > > >>>>>> should > > >>>>>>> also check if the data files are used by savepoints." looks like > we only > > >>>>>>> save meta files for savepoint. > > >>>>>>> > > >>>>>>> 3. How can users create a new table and restore data from the > specified > > >>>>>>> savepoint? > > >>>>>>> > > >>>>>>> Best, > > >>>>>>> Shammon FY > > >>>>>>> > > >>>>>>> > > >>>>>>> On Wed, May 17, 2023 at 10:19 AM Jingsong Li < [email protected]> > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> Thanks Zelin for driving. > > >>>>>>>> > > >>>>>>>> Some comments: > > >>>>>>>> > > >>>>>>>> 1. I think it's possible to advance `Proposed Changes` to the > top; > > >>>>>>>> the Public API has no meaning if I don't know how to do it. > > >>>>>>>> > > >>>>>>>> 2. Public API: Savepoint and SavepointManager are not Public > API; only > > >>>>>>>> the Flink action or configuration option should be public API. > > >>>>>>>> > > >>>>>>>> 3. Maybe we can have a separate chapter to describe > > >>>>>>>> `savepoint.create-interval`, maybe 'Periodically savepoint'? It > is not > > >>>>>>>> just an interval, because the true use case is a savepoint after > 0:00. > > >>>>>>>> > > >>>>>>>> 4. About 'Interaction with Snapshot', to be continued ... > > >>>>>>>> > > >>>>>>>> Best, > > >>>>>>>> Jingsong > > >>>>>>>> > > >>>>>>>> On Tue, May 16, 2023 at 7:07 PM yu zelin <[email protected] > > > > >>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>> Hi, Paimon Devs, > > >>>>>>>>> I'd like to start a discussion about PIP-4 [1]. In this > PIP, I > > >>>>>> want > > >>>>>>>> to talk about why we need savepoint, and some thoughts about > managing > > >>>>>> and > > >>>>>>>> using savepoint. 
Looking forward to your questions and suggestions. > > >>>>>>>>> > > >>>>>>>>> Best, > > >>>>>>>>> Yu Zelin > > >>>>>>>>> > > >>>>>>>>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>> > > >
