[
https://issues.apache.org/jira/browse/HUDI-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jian Feng updated HUDI-4677:
----------------------------
Description:
!image-2022-08-22-02-03-31-588.png!
For the snapshot view scenario, Hudi already provides two key features:
Time travel: the user provides a timestamp to query a specific snapshot view of
a Hudi table.
Savepoint/restore: a savepoint saves the table as of a commit time so that the
table can be restored to that point later if need be. In this scenario, however,
users typically rely on a savepoint to keep the cleaner from removing the files
of a snapshot view at a specific timestamp, so that only unused files are
cleaned.
Using these features directly is inconvenient for users in two ways:
Users usually prefer a meaningful name over a timestamp when querying a Hudi
table, and a timestamp written in SQL can easily select the wrong snapshot view.
For example, we could announce that a new tag of a Hudi table named
table_nameYYYYMMDD has been released, and users could then query it by that
name.
Savepoint was not originally designed for the snapshot view scenario; it was
designed for disaster recovery. If, say, a new snapshot view is created every
day with a 7-day retention, Hudi should support lifecycle management on top of
it.
What I plan to do is let Hudi support releasing snapshot views and managing
their lifecycle out of the box.
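To make the idea concrete, here is a minimal sketch of the bookkeeping such a feature would need: a named view pinned to a commit time, plus retention-based expiry. It is an illustration only, not Hudi code. The names SnapshotView, SnapshotViewRegistry, release, resolve and expire are hypothetical, and the table_nameYYYYMMDD naming follows the example in the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SnapshotView:
    name: str          # meaningful name, e.g. "orders20220822"
    instant: str       # commit time the view is pinned to (yyyyMMddHHmmss)
    created: datetime  # when the view was released


class SnapshotViewRegistry:
    """Hypothetical in-memory registry of named snapshot views (not a Hudi API)."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self._views = {}

    def release(self, table: str, instant: str, now: datetime) -> str:
        # Name the view after the table and the date part of the commit time.
        name = f"{table}{instant[:8]}"
        self._views[name] = SnapshotView(name, instant, now)
        return name

    def resolve(self, name: str) -> str:
        # Map a name back to the commit time a time travel query would use.
        return self._views[name].instant

    def expire(self, now: datetime) -> list:
        # Drop views older than the retention window and report their names.
        expired = [n for n, v in self._views.items()
                   if now - v.created > self.retention]
        for n in expired:
            del self._views[n]
        return sorted(expired)
```

In a real implementation, releasing a view would presumably also have to protect the underlying files from cleaning (as a savepoint does today), and expiring a view would release that protection; the sketch leaves both out.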
> Snapshot view management
> ------------------------
>
> Key: HUDI-4677
> URL: https://issues.apache.org/jira/browse/HUDI-4677
> Project: Apache Hudi
> Issue Type: Epic
> Reporter: Jian Feng
> Priority: Major
> Attachments: image-2022-08-22-02-03-31-588.png
>
>