Re: [DISCUSS] PIP-4 Support savepoint

yu zelin Mon, 22 May 2023 01:48:12 -0700

Hi, all,

Thank you all for your suggestions and questions. After reading them, I have 
adopted some and would like to share my opinions here.


To keep my statements clear, I will continue to use the word `savepoint`. Once 
we reach a consensus, the name may change.

1. The purposes of savepoint

As Shammon mentioned, Flink and databases also have the concept of `savepoint`, 
so it is better to clarify the purposes of our savepoint. Thanks to Nicholas 
and Jingsong, whose explanations are very clear. I’d like to give my 
summary:

(1) Fault recovery (or disaster recovery). Users can ROLL BACK to a savepoint 
if needed. If a user rolls back to a savepoint, the table will hold the data 
in the savepoint, and the data committed after the savepoint will be deleted. 
We need savepoints in this scenario because snapshots may have expired, 
whereas a savepoint can be kept longer and preserve the user’s old data.

(2) Recording versions of data at a longer interval (typically daily or 
weekly). With a savepoint, users can query the old data in batch mode. 
Compared to copying records to a new table or merging incremental records with 
old records (like using MERGE INTO in Hive), a savepoint is more lightweight 
because we don’t copy data files; we only record their metadata.

As you can see, a savepoint is very similar to a snapshot. The differences are:

(1) A savepoint lives longer. In most cases, a snapshot’s lifetime ranges from 
several minutes to hours. We expect a savepoint to live for several days, 
weeks, or even months.

(2) A savepoint is mainly used for batch reading of historical data. In this 
PIP, we don’t introduce streaming reading for savepoints.

2. Candidates of name

I agree with Jingsong that we can use a new name. Since the purpose and 
mechanism of savepoint (which is very similar to snapshot) are similar to 
`tag` in Iceberg, maybe we can use `tag`.

In my opinion, an alternative is `anchor`. All the snapshots are like the 
navigation path of the streaming data, and an `anchor` can hold it in place.

3. Public table operations and options

We plan to expose some operations and table options for users to manage 
savepoints.

(1) Operations (currently for Flink)
We provide Flink actions to manage savepoints:
    create-savepoint: Generates a savepoint from the latest snapshot. Creating 
one from a specified snapshot is also supported.
    delete-savepoint: Deletes the specified savepoint.
    rollback-to: Rolls back to the specified savepoint.
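
To make the intended semantics of the three actions concrete, here is a 
minimal in-memory sketch. It is not the actual implementation: the `Table` 
class, its fields, and the method names are hypothetical, and dropping later 
savepoints on rollback is an assumption that the PIP has not settled.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass
class Table:
    # snapshot id -> data files referenced by that snapshot (toy model)
    snapshots: Dict[int, FrozenSet[str]] = field(default_factory=dict)
    # savepoint id (the snapshot id it was taken from) -> referenced data files
    savepoints: Dict[int, FrozenSet[str]] = field(default_factory=dict)

    def create_savepoint(self, snapshot_id: Optional[int] = None) -> int:
        """Create a savepoint from the latest snapshot, or from a given one."""
        sid = max(self.snapshots) if snapshot_id is None else snapshot_id
        # Only metadata is recorded; no data file is copied.
        self.savepoints[sid] = self.snapshots[sid]
        return sid

    def delete_savepoint(self, savepoint_id: int) -> None:
        """Delete the specified savepoint (file cleanup is not modelled here)."""
        del self.savepoints[savepoint_id]

    def rollback_to(self, savepoint_id: int) -> None:
        """Roll back: drop everything committed after the savepoint."""
        files = self.savepoints[savepoint_id]
        for sid in [s for s in self.snapshots if s > savepoint_id]:
            del self.snapshots[sid]
        # Assumption: savepoints taken after the target are dropped as well.
        for sid in [s for s in self.savepoints if s > savepoint_id]:
            del self.savepoints[sid]
        # The snapshot itself may already have expired, so restore it.
        self.snapshots[savepoint_id] = files
```

For example, with snapshots 1, 2 and 3, calling `create_savepoint(2)` and then 
`rollback_to(2)` leaves snapshots 1 and 2 and the single savepoint 2.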

(2) Table options
We plan to provide options for creating savepoints periodically:
    savepoint.create-time: When to create the savepoint. Example: 00:00
    savepoint.create-interval: Interval between the creation of two savepoints. 
Example: 2 d
    savepoint.time-retained: The maximum time to retain savepoints.
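
One possible reading of how these three options interact is sketched below. 
The exact semantics are still open, so treat this as an assumption: the first 
savepoint is created at the next occurrence of `savepoint.create-time`, later 
ones follow every `savepoint.create-interval`, and a savepoint expires once it 
is older than `savepoint.time-retained`. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta
from typing import List, Optional


def next_creation(now: datetime, create_time: time, interval: timedelta,
                  last: Optional[datetime] = None) -> datetime:
    """Return when the next periodic savepoint should be created."""
    if last is not None:
        # Assumption: later savepoints are spaced by the configured interval.
        return last + interval
    # First savepoint: the next occurrence of create_time (today or tomorrow).
    candidate = datetime.combine(now.date(), create_time)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


def expired(savepoint_times: List[datetime], now: datetime,
            time_retained: timedelta) -> List[datetime]:
    """Return the savepoints that are older than the retention time."""
    return [t for t in savepoint_times if now - t > time_retained]
```

With `create_time=time(0, 0)` and `interval=timedelta(days=2)`, a table 
configured on 2023-05-22 at 01:48 would get its first savepoint at 2023-05-23 
00:00 and the next one at 2023-05-25 00:00.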

(3) Procedures (future work)
Spark supports SQL extensions. After we support the Spark CALL statement, we 
can provide procedures to create, delete, or roll back to a savepoint for 
Spark users.

Support for CALL is on the roadmap of Flink. In a future version, we can also 
support savepoint-related procedures for Flink users.

4. Expiration of data files

Currently, when a snapshot expires, the data files that are not used by other 
snapshots are deleted. After we introduce savepoints, we must make sure that 
the data files referenced by a savepoint are not deleted.

Conversely, when a savepoint is deleted, the data files that are not used by 
existing snapshots or other savepoints will be deleted.
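
The rule above can be sketched as a set difference. This is only an 
illustration, not the POC code; the function name and the in-memory maps from 
id to file names are assumptions.

```python
from typing import Dict, Iterable, Set


def deletable_files(removed_files: Iterable[str],
                    live_snapshots: Dict[int, Set[str]],
                    live_savepoints: Dict[int, Set[str]]) -> Set[str]:
    """Files of an expired snapshot (or deleted savepoint) that are safe to delete.

    A file is safe to delete only if no remaining snapshot and no remaining
    savepoint still references it.
    """
    still_used: Set[str] = set()
    for files in live_snapshots.values():
        still_used |= set(files)
    for files in live_savepoints.values():
        still_used |= set(files)
    return set(removed_files) - still_used
```

If snapshot 1 (files `f0`, `f1`) expires while a savepoint still references 
both files, nothing is deleted. Once that savepoint is deleted and snapshot 2 
still uses `f1`, only `f0` becomes deletable.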

I have written some POC code to implement this. I will update the mechanism in 
the PIP soon.

Best,
Yu Zelin

> On May 21, 2023, at 20:54, Jingsong Li <[email protected]> wrote:
> 
> Thanks Yun for your information.
> 
> We need to be careful to avoid confusion between Paimon and Flink
> concepts about "savepoint"
> 
> Maybe we don't have to insist on using this "savepoint", for example,
> TAG is also a candidate just like Iceberg [1]
> 
> [1] https://iceberg.apache.org/docs/latest/branching/
> 
> Best,
> Jingsong
> 
> On Sun, May 21, 2023 at 8:51 PM Jingsong Li <[email protected]> wrote:
>> 
>> Thanks Nicholas for your detailed requirements.
>> 
>> We need to supplement user requirements in FLIP, which is mainly aimed
>> at two purposes:
>> 1. Fault recovery for data errors (named: restore or rollback-to)
>> 2. Used to record versions at the day level (such as), targeting batch 
>> queries
>> 
>> Best,
>> Jingsong
>> 
>> On Sat, May 20, 2023 at 2:55 PM Yun Tang <[email protected]> wrote:
>>> 
>>> Hi Guys,
>>> 
>>> Since we use Paimon with Flink in most cases, I think we need to identify 
>>> the same word "savepoint" in different systems.
>>> 
>>> For Flink, savepoint means:
>>> 
>>>  1.  Triggered by users, not periodically triggered by the system itself. 
>>> However, this FLIP wants to support it created periodically.
>>>  2.  Even the so-called incremental native savepoint [1], it will not 
>>> depend on the previous checkpoints or savepoints, it will still copy files 
>>> on DFS to the self-contained savepoint folder. However, from the 
>>> description of this FLIP about the deletion of expired snapshot files, 
>>> Paimon savepoint will refer to the previously existing files directly.
>>> 
>>> I don't think we need to make the semantics of Paimon totally the same as 
>>> Flink's. However, we need to introduce a table to tell the difference 
>>> compared with Flink and discuss about the difference.
>>> 
>>> [1] 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Semantic
>>> 
>>> Best
>>> Yun Tang
>>> ________________________________
>>> From: Nicholas Jiang <[email protected]>
>>> Sent: Friday, May 19, 2023 17:40
>>> To: [email protected] <[email protected]>
>>> Subject: Re: [DISCUSS] PIP-4 Support savepoint
>>> 
>>> Hi Guys,
>>> 
>>> Thanks Zelin for driving the savepoint proposal. I propose some opinions 
>>> for savepoint:
>>> 
>>> -- About "introduce savepoint for Paimon to persist full data in a time 
>>> point"
>>> 
>>> The motivation of savepoint proposal is more like snapshot TTL management. 
>>> Actually, disaster recovery is very much mission critical for any software. 
>>> Especially when it comes to data systems, the impact could be very serious 
>>> leading to delay in business decisions or even wrong business decisions at 
>>> times. Savepoint is proposed to assist users in recovering data from a 
>>> previous state: "savepoint" and "restore".
>>> 
>>> "savepoint" saves the Paimon table as of the commit time, therefore if 
>>> there is a savepoint, the data generated in the corresponding commit could 
>>> not be clean. Meanwhile, savepoint could let user restore the table to this 
>>> savepoint at a later point in time if need be. On similar lines, savepoint 
>>> cannot be triggered on a commit that is already cleaned up. Savepoint is 
>>> synonymous to taking a backup, just that we don't make a new copy of the 
>>> table, but just save the state of the table elegantly so that we can 
>>> restore it later when in need.
>>> 
>>> "restore" lets you restore your table to one of the savepoint commits. 
>>> Meanwhile, it cannot be undone (or reversed) and so care should be taken 
>>> before doing a restore. At this time, Paimon would delete all data files 
>>> and commit files (timeline files) greater than the savepoint commit to 
>>> which the table is being restored.
>>> 
>>> BTW, it's better to introduce snapshot view based on savepoint, which could 
>>> improve query performance of historical data for Paimon table.
>>> 
>>> -- About Public API of savepoint
>>> 
>>> Current introduced savepoint interfaces in Public API are not enough for 
>>> users, for example, deleteSavepoint, restoreSavepoint etc.
>>> 
>>> -- About "Paimon's savepoint need to be combined with Flink's savepoint":
>>> 
>>> If paimon supports savepoint mechanism and provides savepoint interfaces, 
>>> the integration with Flink's savepoint is not blocked for this proposal.
>>> 
>>> In summary, savepoint is not only used to improve the query performance of 
>>> historical data, but also used for disaster recovery processing.
>>> 
>>> On 2023/05/17 09:53:11 Jingsong Li wrote:
>>>> What Shammon mentioned is interesting. I agree with what he said about
>>>> the differences in savepoints between databases and stream computing.
>>>> 
>>>> About "Paimon's savepoint need to be combined with Flink's savepoint":
>>>> 
>>>> I think it is possible, but we may need to deal with this in another
>>>> mechanism, because the snapshots after savepoint may expire. We need
>>>> to compare data between two savepoints to generate incremental data
>>>> for streaming read.
>>>> 
>>>> But this may not need to block FLIP, it looks like the current design
>>>> does not break the future combination?
>>>> 
>>>> Best,
>>>> Jingsong
>>>> 
>>>> On Wed, May 17, 2023 at 5:33 PM Shammon FY <[email protected]> wrote:
>>>>> 
>>>>> Hi Caizhi,
>>>>> 
>>>>> Thanks for your comments. As you mentioned, I think we may need to discuss
>>>>> the role of savepoint in Paimon.
>>>>> 
>>>>> If I understand correctly, the main feature of savepoint in the current 
>>>>> PIP
>>>>> is that the savepoint will not be expired, and users can perform a query 
>>>>> on
>>>>> the savepoint according to time-travel. Besides that, there is savepoint 
>>>>> in
>>>>> the database and Flink.
>>>>> 
>>>>> 1. Savepoint in database. The database can roll back table data to the
>>>>> specified 'version' based on savepoint. So the key point of savepoint in
>>>>> the database is to rollback data.
>>>>> 
>>>>> 2. Savepoint in Flink. Users can trigger a savepoint with a specific
>>>>> 'path', and save all data of state to the savepoint for job. Then users 
>>>>> can
>>>>> create a new job based on the savepoint to continue consuming incremental
>>>>> data. I think the core capabilities are: backup for a job, and resume a 
>>>>> job
>>>>> based on the savepoint.
>>>>> 
>>>>> In addition to the above, Paimon may also face data write corruption and
>>>>> need to recover data based on the specified savepoint. So we may need to
>>>>> consider what abilities should Paimon savepoint need besides the ones
>>>>> mentioned in the current PIP?
>>>>> 
>>>>> Additionally, as mentioned above, Flink also has
>>>>> savepoint mechanism. During the process of streaming data from Flink to
>>>>> Paimon, does Paimon's savepoint need to be combined with Flink's 
>>>>> savepoint?
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Shammon FY
>>>>> 
>>>>> 
>>>>> On Wed, May 17, 2023 at 4:02 PM Caizhi Weng <[email protected]> wrote:
>>>>> 
>>>>>> Hi developers!
>>>>>> 
>>>>>> Thanks Zelin for bringing up the discussion. The proposal seems good to 
>>>>>> me
>>>>>> overall. However I'd also like to bring up a few options.
>>>>>> 
>>>>>> 1. As Jingsong mentioned, Savepoint class should not become a public API,
>>>>>> at least for now. What we need to discuss for the public API is how the
>>>>>> users can create or delete savepoints. For example, what the table option
>>>>>> looks like, what commands and options are provided for the Flink action,
>>>>>> etc.
>>>>>> 
>>>>>> 2. Currently most Flink actions are related to streaming processing, so
>>>>>> only Flink can support them. However, savepoint creation and deletion 
>>>>>> seems
>>>>>> like a feature for batch processing. So aside from Flink actions, shall 
>>>>>> we
>>>>>> also provide something like Spark actions for savepoints?
>>>>>> 
>>>>>> I would also like to comment on Shammon's views.
>>>>>> 
>>>>>> Should we introduce an option for savepoint path which may be different
>>>>>>> from 'warehouse'? Then users can backup the data of savepoint.
>>>>>>> 
>>>>>> 
>>>>>> I don't think this is necessary. To back up a table, the user just needs to 
>>>>>> copy
>>>>>> all files from the table directory. Savepoint in Paimon, as far as I
>>>>>> understand, is mainly for users to review historical data, not for 
>>>>>> backing
>>>>>> up tables.
>>>>>> 
>>>>>> Will the savepoint copy data files from snapshot or only save meta files?
>>>>>>> 
>>>>>> 
>>>>>> It would be a heavy burden if a savepoint copies all its files. As I
>>>>>> mentioned above, savepoint is not for backing up tables.
>>>>>> 
>>>>>> How can users create a new table and restore data from the specified
>>>>>>> savepoint?
>>>>>> 
>>>>>> 
>>>>>> This reminds me of savepoints in Flink. Still, savepoint is not for 
>>>>>> backing
>>>>>> up tables so I guess we don't need to support "restoring data" from a
>>>>>> savepoint.
>>>>>> 
>>>>>> On Wed, May 17, 2023 at 10:32 AM Shammon FY <[email protected]> wrote:
>>>>>> 
>>>>>>> Thanks Zelin for initiating this discussion. I have some comments:
>>>>>>> 
>>>>>>> 1. Should we introduce an option for savepoint path which may be
>>>>>> different
>>>>>>> from 'warehouse'? Then users can backup the data of savepoint.
>>>>>>> 
>>>>>>> 2. Will the savepoint copy data files from snapshot or only save meta
>>>>>>> files? The description in the PIP "After we introduce savepoint, we
>>>>>> should
>>>>>>> also check if the data files are used by savepoints." looks like we only
>>>>>>> save meta files for savepoint.
>>>>>>> 
>>>>>>> 3. How can users create a new table and restore data from the specified
>>>>>>> savepoint?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Shammon FY
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, May 17, 2023 at 10:19 AM Jingsong Li <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks Zelin for driving.
>>>>>>>> 
>>>>>>>> Some comments:
>>>>>>>> 
>>>>>>>> 1. I think it's possible to advance `Proposed Changes` to the top,
>>>>>>>> Public API has no meaning if I don't know how to do it.
>>>>>>>> 
>>>>>>>> 2. Public API, Savepoint and SavepointManager are not Public API, only
>>>>>>>> Flink action or configuration option should be public API.
>>>>>>>> 
>>>>>>>> 3.Maybe we can have a separate chapter to describe
>>>>>>>> `savepoint.create-interval`, maybe 'Periodically savepoint'? It is not
>>>>>>>> just an interval, because the true user case is savepoint after 0:00.
>>>>>>>> 
>>>>>>>> 4.About 'Interaction with Snapshot', to be continued ...
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Jingsong
>>>>>>>> 
>>>>>>>> On Tue, May 16, 2023 at 7:07 PM yu zelin <[email protected]>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi, Paimon Devs,
>>>>>>>>>     I’d like to start a discussion about PIP-4[1]. In this PIP, I
>>>>>> want
>>>>>>>> to talk about why we need savepoint, and some thoughts about managing
>>>>>> and
>>>>>>>> using savepoint. Look forward to your question and suggestions.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Yu Zelin
>>>>>>>>> 
>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/NxE0Dw
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>
