Hi team,
For the snapshot view scenario, Hudi already provides two key features
to support it:

   - Time travel: the user provides a timestamp to query a specific snapshot
   view of a Hudi table.
   - Savepoint/restore: a savepoint saves the table as of a commit time, so
   you can restore the table to that savepoint later if needed. In this
   scenario, however, users mostly use it to keep the snapshot view at a
   specific timestamp from being cleaned, so that only unused files get
   cleaned.

However, using these features directly is inconvenient for users:

   - Users usually prefer a meaningful name over querying a Hudi table by
   timestamp, and typing a timestamp in SQL can lead to the wrong snapshot
   view being queried. For example, we could announce that a new tag of the
   Hudi table named table_nameYYYYMMDD has been released, and users could
   then query it by that table name.
   - Savepoint was not originally designed for this "snapshot view"
   scenario; it was designed for disaster recovery. Say a new snapshot view
   is created every day with a 7-day retention: we need lifecycle management
   on top of it to create new snapshot views and expire the old ones.
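As a rough illustration of the daily snapshot view with 7-day retention
case, here is a minimal Python sketch of just the expiry decision. The
<table_name><YYYYMMDD> naming convention and the helper names (tag_name,
parse_tag_date, expired_tags) are hypothetical; the sketch does not call any
Hudi API, and actually dropping an expired tag (e.g. removing the savepoint
behind it) is out of scope.

```python
from datetime import date, datetime, timedelta


def tag_name(table_name: str, day: date) -> str:
    # Hypothetical naming convention: <table_name><YYYYMMDD>
    return f"{table_name}{day:%Y%m%d}"


def parse_tag_date(table_name: str, tag: str) -> date:
    # Strip the exact table-name prefix, then parse the YYYYMMDD suffix.
    if not tag.startswith(table_name):
        raise ValueError(f"tag {tag!r} does not belong to table {table_name!r}")
    suffix = tag[len(table_name):]
    return datetime.strptime(suffix, "%Y%m%d").date()


def expired_tags(table_name: str, tags: list[str], today: date,
                 retention_days: int = 7) -> list[str]:
    # A tag expires once it is retention_days or more days old, so with a
    # 7-day retention the views from today and the previous 6 days are kept.
    expired = []
    for tag in tags:
        age_days = (today - parse_tag_date(table_name, tag)).days
        if age_days >= retention_days:
            expired.append(tag)
    return sorted(expired)


# Usage: ten daily snapshot views, evaluated on 2024-01-10.
today = date(2024, 1, 10)
tags = [tag_name("orders", today - timedelta(days=d)) for d in range(10)]
expired = expired_tags("orders", tags, today)
print(expired)  # ['orders20240101', 'orders20240102', 'orders20240103']
```

Keeping the retention rule as a pure function of the tag name and the
current date makes it easy to run from a scheduled job, independent of how
the snapshot views themselves are created.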

What I plan to do is make Hudi support releasing snapshot views and
managing their lifecycle out of the box. We have already done some work on
this while supporting customers' snapshot view requirements at my company,
and I hope to land this feature in the community too.

Please feel free to let me know if you have any ideas about this.

Thanks,

Jian Feng
