Hi Jingsong,

Thanks for your feedback.
## TAG ID

It seems the id is useless currently. I'll remove it.

## Time Travel Syntax

Since the tag id is removed, we can just use:

SELECT * FROM t VERSION AS OF 'tag-name'

to travel to a tag.

## Tag class

I agree with you that we can reuse the Snapshot class. We can introduce `TagManager` only to manage tags.

## Expiring Snapshot

> why not record it in ManifestEntry?

This is because every time Paimon generates a snapshot, it creates new ManifestEntries for data files. Consider this scenario: if we record it in ManifestEntry and we commit data file A to snapshot #1, we get manifest entry Entry#1 as [ADD, A, commit at #1]. Then we commit -A to snapshot #2 and get manifest entry Entry#2 as [DELETE, A, ?]. As you can see, we cannot know at which snapshot we committed file A. So we have to record this information in the data file meta directly.

> We should note that "record it in `DataFileMeta`" should be done before "tag" and document version compatibility.

I will add a note about this.

Best,
Yu Zelin

> On May 29, 2023, at 10:29, Jingsong Li <[email protected]> wrote:
>
> Thanks Zelin for the update.
>
> ## TAG ID
>
> Is this useful? We have tag-name, snapshot-id, and now introducing a
> tag id? What is it used for?
>
> ## Time Travel
>
> SELECT * FROM t VERSION AS OF tag-name.<name>
>
> This does not look like SQL standard.
>
> Why do we introduce this `tag-name` prefix?
>
> ## Tag class
>
> Why not just use the Snapshot class? It looks like we don't need to
> introduce a Tag class. We can just copy the snapshot file to tag/.
>
> ## Expiring Snapshot
>
> We should note that "record it in `DataFileMeta`" should be done
> before "tag". And document version compatibility.
> And why not record it in ManifestEntry?
>
> Best,
> Jingsong
>
> On Fri, May 26, 2023 at 11:15 AM yu zelin <[email protected]> wrote:
>>
>> Hi, all,
>>
>> FYI, I have updated the PIP [1].
>>
>> Main changes:
>> - Use new name `tag`
>> - Enrich Motivation
>> - New section `Data Files Handling` to describe how to determine whether a
>>   data file can be deleted.
>>
>> Best,
>> Yu Zelin
>>
>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
>>
>>> On May 24, 2023, at 17:18, yu zelin <[email protected]> wrote:
>>>
>>> Hi, Guojun,
>>>
>>> I'd like to share my thoughts about your questions.
>>>
>>> 1. Expiration of savepoints
>>> In my opinion, savepoints are created at long intervals, so there will not
>>> be too many of them. If users create one savepoint per day, there are 365
>>> savepoints a year. So I didn't consider expiration, and I think providing
>>> a Flink action like `delete-savepoint id = 1` is enough for now. But if it
>>> is really important, we can introduce table options to do so. I think we
>>> can do it like expiring snapshots.
>>>
>>> 2. > id of compacted snapshot picked by the savepoint
>>> My initial idea was to pick a compacted snapshot or do a compaction before
>>> creating the savepoint. But after discussing with Jingsong, I found it's
>>> difficult. So now I propose to create the savepoint directly from the
>>> given snapshot. Maybe we can optimize it later. The changes will be
>>> updated soon.
>>>
>>> > manifest file list in system-table
>>> I think the manifest file is not very important for users. Users can find
>>> when a savepoint was created, get the savepoint id, and then query the
>>> savepoint by that id. I didn't see a scenario in which users need the
>>> manifest file information. What do you think?
>>>
>>> Best,
>>> Yu Zelin
>>>
>>>> On May 24, 2023, at 10:50, Guojun Li <[email protected]> wrote:
>>>>
>>>> Thanks zelin for bringing up the discussion. I'm thinking about:
>>>> 1. How to manage the savepoints if there is no expiration mechanism: by
>>>> the TTL management of storages or an external script?
>>>> 2.
>>>> I think the id of the compacted snapshot picked by the savepoint and the
>>>> manifest file list are also important information for users. Could this
>>>> information be stored in the system-table?
>>>>
>>>> Best,
>>>> Guojun
>>>>
>>>> On Mon, May 22, 2023 at 9:13 PM Jingsong Li <[email protected]> wrote:
>>>>
>>>>> FYI
>>>>>
>>>>> The PIP lacks a table to show Discussion thread & Vote thread & ISSUE...
>>>>>
>>>>> Best
>>>>> Jingsong
>>>>>
>>>>> On Mon, May 22, 2023 at 4:48 PM yu zelin <[email protected]> wrote:
>>>>>>
>>>>>> Hi, all,
>>>>>>
>>>>>> Thank you all for your suggestions and questions. After reading your
>>>>>> suggestions, I adopted some of them, and I want to share my opinions
>>>>>> here.
>>>>>>
>>>>>> To make my statements clearer, I will still use the word `savepoint`.
>>>>>> When we reach a consensus, the name may be changed.
>>>>>>
>>>>>> 1. The purposes of savepoint
>>>>>>
>>>>>> As Shammon mentioned, Flink and databases also have the concept of a
>>>>>> `savepoint`, so it's better to clarify the purposes of ours. Thanks to
>>>>>> Nicholas and Jingsong; I think your explanations are very clear. I'd
>>>>>> like to give my summary:
>>>>>>
>>>>>> (1) Fault recovery (or we can say disaster recovery). Users can ROLL
>>>>>> BACK to a savepoint if needed. If a user rolls back to a savepoint, the
>>>>>> table will hold the data in the savepoint, and the data committed after
>>>>>> the savepoint will be deleted. In this scenario we need savepoints
>>>>>> because snapshots may have expired; a savepoint can be kept longer and
>>>>>> preserves the user's old data.
>>>>>>
>>>>>> (2) Record versions of data at a longer interval (typically daily or
>>>>>> weekly). With a savepoint, users can query the old data in batch mode.
>>>>>> Compared to copying records to a new table or merging incremental
>>>>>> records with old records (like using MERGE INTO in Hive), the savepoint
>>>>>> is more lightweight because we don't copy data files; we just record
>>>>>> their metadata.
>>>>>>
>>>>>> As you can see, a savepoint is very similar to a snapshot. The
>>>>>> differences are:
>>>>>>
>>>>>> (1) A savepoint lives longer. In most cases, a snapshot's lifetime is
>>>>>> several minutes to hours. We expect a savepoint to live several days,
>>>>>> weeks, or even months.
>>>>>>
>>>>>> (2) A savepoint is mainly used for batch reading of historical data.
>>>>>> In this PIP, we don't introduce streaming reading for savepoints.
>>>>>>
>>>>>> 2. Candidates for the name
>>>>>>
>>>>>> I agree with Jingsong that we can use a new name. Since the purpose
>>>>>> and mechanism of savepoint (it is very similar to snapshot) are
>>>>>> similar to `tag` in Iceberg, maybe we can use `tag`.
>>>>>>
>>>>>> In my opinion, an alternative is `anchor`. All the snapshots are like
>>>>>> the navigation path of the streaming data, and an `anchor` can stop it
>>>>>> in a place.
>>>>>>
>>>>>> 3. Public table operations and options
>>>>>>
>>>>>> We propose to expose some operations and table options for users to
>>>>>> manage savepoints.
>>>>>>
>>>>>> (1) Operations (currently for Flink)
>>>>>> We provide Flink actions to manage savepoints:
>>>>>> create-savepoint: generate a savepoint from the latest snapshot;
>>>>>> creating from a specified snapshot is also supported.
>>>>>> delete-savepoint: delete a specified savepoint.
>>>>>> rollback-to: roll back to a specified savepoint.
>>>>>>
>>>>>> (2) Table options
>>>>>> We propose options for creating savepoints periodically:
>>>>>> savepoint.create-time: when to create the savepoint. Example: 00:00
>>>>>> savepoint.create-interval: interval between the creation of two
>>>>>> savepoints. Example: 2 d.
>>>>>> savepoint.time-retained: the maximum time savepoints are retained.
>>>>>>
>>>>>> (3) Procedures (future work)
>>>>>> Spark supports SQL extensions. After we support the Spark CALL
>>>>>> statement, we can provide procedures to create, delete or roll back to
>>>>>> a savepoint for Spark users.
>>>>>>
>>>>>> Support for CALL is on the roadmap of Flink. In a future version, we
>>>>>> can also support savepoint-related procedures for Flink users.
>>>>>>
>>>>>> 4. Expiration of data files
>>>>>>
>>>>>> Currently, when a snapshot expires, the data files that are not used
>>>>>> by other snapshots are deleted. After we introduce the savepoint, we
>>>>>> must make sure the data files kept by a savepoint will not be deleted.
>>>>>>
>>>>>> Conversely, when a savepoint is deleted, the data files that are not
>>>>>> used by existing snapshots and other savepoints will be deleted.
>>>>>>
>>>>>> I have written some POC code to implement this. I will update the
>>>>>> mechanism in the PIP soon.
>>>>>>
>>>>>> Best,
>>>>>> Yu Zelin
>>>>>>
>>>>>>> On May 21, 2023, at 20:54, Jingsong Li <[email protected]> wrote:
>>>>>>>
>>>>>>> Thanks Yun for your information.
>>>>>>>
>>>>>>> We need to be careful to avoid confusion between Paimon and Flink
>>>>>>> concepts about "savepoint".
>>>>>>>
>>>>>>> Maybe we don't have to insist on using this "savepoint"; for example,
>>>>>>> TAG is also a candidate, just like Iceberg [1].
>>>>>>>
>>>>>>> [1] https://iceberg.apache.org/docs/latest/branching/
>>>>>>>
>>>>>>> Best,
>>>>>>> Jingsong
>>>>>>>
>>>>>>> On Sun, May 21, 2023 at 8:51 PM Jingsong Li <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Thanks Nicholas for your detailed requirements.
>>>>>>>>
>>>>>>>> We need to supplement user requirements in the FLIP, which is mainly
>>>>>>>> aimed at two purposes:
>>>>>>>> 1. Fault recovery for data errors (named: restore or rollback-to)
>>>>>>>> 2. Recording versions at the day level (for example), targeting
>>>>>>>> batch queries
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jingsong
>>>>>>>>
>>>>>>>> On Sat, May 20, 2023 at 2:55 PM Yun Tang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Guys,
>>>>>>>>>
>>>>>>>>> Since we use Paimon with Flink in most cases, I think we need to
>>>>>>>>> disambiguate the same word "savepoint" in the different systems.
>>>>>>>>>
>>>>>>>>> For Flink, savepoint means:
>>>>>>>>>
>>>>>>>>> 1. Triggered by users, not periodically triggered by the system
>>>>>>>>> itself. However, this FLIP wants to support creating it
>>>>>>>>> periodically.
>>>>>>>>> 2. Even the so-called incremental native savepoint [1] will not
>>>>>>>>> depend on the previous checkpoints or savepoints; it will still copy
>>>>>>>>> files on DFS to the self-contained savepoint folder. However, from
>>>>>>>>> the description of this FLIP about the deletion of expired snapshot
>>>>>>>>> files, Paimon's savepoint will refer to the previously existing
>>>>>>>>> files directly.
>>>>>>>>>
>>>>>>>>> I don't think we need to make the semantics of Paimon totally the
>>>>>>>>> same as Flink's. However, we need to introduce a table to show the
>>>>>>>>> difference compared with Flink and discuss that difference.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> Yun Tang
>>>>>>>>> ________________________________
>>>>>>>>> From: Nicholas Jiang <[email protected]>
>>>>>>>>> Sent: Friday, May 19, 2023 17:40
>>>>>>>>> To: [email protected] <[email protected]>
>>>>>>>>> Subject: Re: [DISCUSS] PIP-4 Support savepoint
>>>>>>>>>
>>>>>>>>> Hi Guys,
>>>>>>>>>
>>>>>>>>> Thanks Zelin for driving the savepoint proposal. I'd like to offer
>>>>>>>>> some opinions on savepoint:
>>>>>>>>>
>>>>>>>>> -- About "introduce savepoint for Paimon to persist full data at a
>>>>>>>>> time point"
>>>>>>>>>
>>>>>>>>> The motivation of the savepoint proposal reads more like snapshot
>>>>>>>>> TTL management. Actually, disaster recovery is mission critical for
>>>>>>>>> any software. Especially when it comes to data systems, the impact
>>>>>>>>> could be very serious, leading to delayed or even wrong business
>>>>>>>>> decisions at times. Savepoint is proposed to assist users in
>>>>>>>>> recovering data from a previous state: "savepoint" and "restore".
>>>>>>>>>
>>>>>>>>> "savepoint" saves the Paimon table as of the commit time; therefore,
>>>>>>>>> if there is a savepoint, the data generated in the corresponding
>>>>>>>>> commit cannot be cleaned. Meanwhile, a savepoint lets users restore
>>>>>>>>> the table to that savepoint at a later point in time if need be. On
>>>>>>>>> similar lines, a savepoint cannot be triggered on a commit that is
>>>>>>>>> already cleaned up. Savepoint is synonymous with taking a backup,
>>>>>>>>> just that we don't make a new copy of the table, but just save the
>>>>>>>>> state of the table elegantly so that we can restore it later when in
>>>>>>>>> need.
>>>>>>>>>
>>>>>>>>> "restore" lets you restore your table to one of the savepoint
>>>>>>>>> commits. Meanwhile, it cannot be undone (or reversed), so care
>>>>>>>>> should be taken before doing a restore. At this time, Paimon would
>>>>>>>>> delete all data files and commit files (timeline files) greater than
>>>>>>>>> the savepoint commit to which the table is being restored.
>>>>>>>>>
>>>>>>>>> BTW, it's better to introduce a snapshot view based on savepoint,
>>>>>>>>> which could improve query performance on historical data for a
>>>>>>>>> Paimon table.
>>>>>>>>>
>>>>>>>>> -- About the Public API of savepoint
>>>>>>>>>
>>>>>>>>> The currently introduced savepoint interfaces in the Public API are
>>>>>>>>> not enough for users, for example, deleteSavepoint,
>>>>>>>>> restoreSavepoint, etc.
>>>>>>>>>
>>>>>>>>> -- About "Paimon's savepoint needs to be combined with Flink's
>>>>>>>>> savepoint":
>>>>>>>>>
>>>>>>>>> If Paimon supports a savepoint mechanism and provides savepoint
>>>>>>>>> interfaces, the integration with Flink's savepoint is not blocked by
>>>>>>>>> this proposal.
>>>>>>>>>
>>>>>>>>> In summary, savepoint is not only used to improve the query
>>>>>>>>> performance of historical data, but also used for disaster recovery
>>>>>>>>> processing.
>>>>>>>>>
>>>>>>>>> On 2023/05/17 09:53:11 Jingsong Li wrote:
>>>>>>>>>> What Shammon mentioned is interesting.
>>>>>>>>>> I agree with what he said about
>>>>>>>>>> the differences in savepoints between databases and stream
>>>>>>>>>> computing.
>>>>>>>>>>
>>>>>>>>>> About "Paimon's savepoint needs to be combined with Flink's
>>>>>>>>>> savepoint":
>>>>>>>>>>
>>>>>>>>>> I think it is possible, but we may need to deal with this via
>>>>>>>>>> another mechanism, because the snapshots after a savepoint may
>>>>>>>>>> expire. We need to compare data between two savepoints to generate
>>>>>>>>>> incremental data for streaming read.
>>>>>>>>>>
>>>>>>>>>> But this may not need to block the FLIP; it looks like the current
>>>>>>>>>> design does not break the future combination?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jingsong
>>>>>>>>>>
>>>>>>>>>> On Wed, May 17, 2023 at 5:33 PM Shammon FY <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Caizhi,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your comments. As you mentioned, I think we may need
>>>>>>>>>>> to discuss the role of savepoint in Paimon.
>>>>>>>>>>>
>>>>>>>>>>> If I understand correctly, the main feature of savepoint in the
>>>>>>>>>>> current PIP is that the savepoint will not expire, and users can
>>>>>>>>>>> perform a query on the savepoint via time-travel. Besides that,
>>>>>>>>>>> there are savepoints in databases and in Flink.
>>>>>>>>>>>
>>>>>>>>>>> 1. Savepoint in a database. The database can roll back table data
>>>>>>>>>>> to the specified 'version' based on a savepoint. So the key point
>>>>>>>>>>> of savepoint in the database is rolling back data.
>>>>>>>>>>>
>>>>>>>>>>> 2. Savepoint in Flink. Users can trigger a savepoint with a
>>>>>>>>>>> specific 'path' and save all the state data of a job to the
>>>>>>>>>>> savepoint. Then users can create a new job based on the savepoint
>>>>>>>>>>> to continue consuming incremental data. I think the core
>>>>>>>>>>> capabilities are: backing up a job, and resuming a job based on
>>>>>>>>>>> the savepoint.
>>>>>>>>>>>
>>>>>>>>>>> In addition to the above, Paimon may also face data write
>>>>>>>>>>> corruption and need to recover data based on a specified
>>>>>>>>>>> savepoint. So we may need to consider what abilities Paimon's
>>>>>>>>>>> savepoint needs besides the ones mentioned in the current PIP.
>>>>>>>>>>>
>>>>>>>>>>> Additionally, as mentioned above, Flink also has a savepoint
>>>>>>>>>>> mechanism. During the process of streaming data from Flink to
>>>>>>>>>>> Paimon, does Paimon's savepoint need to be combined with Flink's
>>>>>>>>>>> savepoint?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Shammon FY
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 17, 2023 at 4:02 PM Caizhi Weng <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi developers!
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks Zelin for bringing up the discussion. The proposal seems
>>>>>>>>>>>> good to me overall. However, I'd also like to bring up a few
>>>>>>>>>>>> points.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. As Jingsong mentioned, the Savepoint class should not become
>>>>>>>>>>>> a public API, at least for now. What we need to discuss for the
>>>>>>>>>>>> public API is how users can create or delete savepoints. For
>>>>>>>>>>>> example, what the table option looks like, what commands and
>>>>>>>>>>>> options are provided for the Flink action, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Currently most Flink actions are related to streaming
>>>>>>>>>>>> processing, so only Flink can support them. However, savepoint
>>>>>>>>>>>> creation and deletion seems like a feature for batch processing.
>>>>>>>>>>>> So aside from Flink actions, shall we also provide something
>>>>>>>>>>>> like Spark actions for savepoints?
>>>>>>>>>>>>
>>>>>>>>>>>> I would also like to comment on Shammon's views.
>>>>>>>>>>>>
>>>>>>>>>>>>> Should we introduce an option for savepoint path which may be
>>>>>>>>>>>>> different from 'warehouse'? Then users can back up the data of
>>>>>>>>>>>>> the savepoint.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't see that this is necessary. To back up a table, the user
>>>>>>>>>>>> just needs to copy all files from the table directory. Savepoint
>>>>>>>>>>>> in Paimon, as far as I understand, is mainly for users to review
>>>>>>>>>>>> historical data, not for backing up tables.
>>>>>>>>>>>>
>>>>>>>>>>>>> Will the savepoint copy data files from the snapshot or only
>>>>>>>>>>>>> save meta files?
>>>>>>>>>>>>
>>>>>>>>>>>> It would be a heavy burden if a savepoint copied all its files.
>>>>>>>>>>>> As I mentioned above, savepoint is not for backing up tables.
>>>>>>>>>>>>
>>>>>>>>>>>>> How can users create a new table and restore data from the
>>>>>>>>>>>>> specified savepoint?
>>>>>>>>>>>>
>>>>>>>>>>>> This reminds me of savepoints in Flink. Still, savepoint is not
>>>>>>>>>>>> for backing up tables, so I guess we don't need to support
>>>>>>>>>>>> "restoring data" from a savepoint.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wednesday, May 17, 2023 at 10:32, Shammon FY <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Zelin for initiating this discussion. I have some
>>>>>>>>>>>>> comments:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Should we introduce an option for the savepoint path which
>>>>>>>>>>>>> may be different from 'warehouse'? Then users can back up the
>>>>>>>>>>>>> data of the savepoint.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Will the savepoint copy data files from the snapshot or only
>>>>>>>>>>>>> save meta files? The description in the PIP "After we introduce
>>>>>>>>>>>>> savepoint, we should also check if the data files are used by
>>>>>>>>>>>>> savepoints." looks like we only save meta files for the
>>>>>>>>>>>>> savepoint.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 3. How can users create a new table and restore data from the
>>>>>>>>>>>>> specified savepoint?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Shammon FY
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, May 17, 2023 at 10:19 AM Jingsong Li <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Zelin for driving.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some comments:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I think it's possible to move `Proposed Changes` before the
>>>>>>>>>>>>>> Public API; the Public API has no meaning if I don't know how
>>>>>>>>>>>>>> it is done.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Public API: Savepoint and SavepointManager are not public
>>>>>>>>>>>>>> API; only the Flink action or configuration options should be
>>>>>>>>>>>>>> public API.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3. Maybe we can have a separate chapter to describe
>>>>>>>>>>>>>> `savepoint.create-interval`, maybe 'Periodic savepoint'? It is
>>>>>>>>>>>>>> not just an interval, because the true use case is a savepoint
>>>>>>>>>>>>>> after 0:00.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4. About 'Interaction with Snapshot', to be continued ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jingsong
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, May 16, 2023 at 7:07 PM yu zelin <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, Paimon Devs,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to start a discussion about PIP-4 [1]. In this PIP,
>>>>>>>>>>>>>>> I want to talk about why we need savepoint, and some thoughts
>>>>>>>>>>>>>>> about managing and using savepoint. Looking forward to your
>>>>>>>>>>>>>>> questions and suggestions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Yu Zelin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
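To make the "Expiring Snapshot" point at the top of this thread concrete: below is a minimal sketch, using hypothetical simplified types (not Paimon's real `DataFileMeta` or expiration code), of why recording the committing snapshot id directly on the file meta is needed. Once the file meta carries its creation snapshot, expiration can decide whether a retained tag still pins the file; with only ADD/DELETE manifest entries, as explained above, that creation snapshot is unrecoverable.

```java
import java.util.Set;

// Hypothetical sketch, not Paimon's real API: a simplified file meta
// that records the snapshot at which the file was first committed,
// as the proposal suggests adding to DataFileMeta.
public class TagExpiration {

    // Stand-in for DataFileMeta, extended with the creating snapshot id.
    record DataFileMeta(String name, long creationSnapshotId) {}

    // A file removed by `deletionSnapshotId` may only be physically
    // deleted if no retained tag was taken while the file was live,
    // i.e. at a snapshot in [creationSnapshotId, deletionSnapshotId).
    static boolean safeToDelete(DataFileMeta file, Set<Long> tagSnapshotIds, long deletionSnapshotId) {
        for (long tag : tagSnapshotIds) {
            if (file.creationSnapshotId() <= tag && tag < deletionSnapshotId) {
                return false; // that tag's snapshot still references the file
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // File A committed at snapshot #1, deleted by snapshot #5.
        DataFileMeta a = new DataFileMeta("A", 1L);
        // A tag taken at snapshot #3 still pins A.
        System.out.println(safeToDelete(a, Set.of(3L), 5L)); // false
        // A tag taken at snapshot #7 (after the delete) never saw A.
        System.out.println(safeToDelete(a, Set.of(7L), 5L)); // true
    }
}
```

The `creationSnapshotId <= tag` comparison is exactly the step that fails with manifest entries alone: the `[DELETE, A, ?]` entry from the scenario above gives the deletion side but not the creation side of the interval.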
