Hi Han,

Thanks for driving this!

The FLIP is in good shape. Here are my comments:

1. The FLIP introduces file reuse during snapshot and recovery. Could
you please provide some common use cases from the user's perspective, e.g.
periodic checkpoints or native savepoints?
2. Does the current design depend on incremental checkpoints? If we
force a full checkpoint, what happens?
3. Will all the proposed changes be under the ForStStateBackend? It would
be better to emphasize this in 'Proposed Changes'.
4. Is there any special file handling for checkpoint failure?


Best,
Zakelly


On Fri, Feb 14, 2025 at 6:35 PM Han Yin <alexyin...@gmail.com> wrote:

> Hi everyone,
>
> I would like to open a discussion on implementing faster checkpoint &
> recovery for disaggregated state[1].
>
> This is an improvement to ForSt's disaggregated state management,
> so you may want to read FLIP-423[2] and FLIP-428[3] for background.
>
> Currently, ForSt copies or fast-duplicates files between the working
> directory and the checkpoint directory during checkpointing and
> restoration. However, in a disaggregated environment, there is no need to
> maintain multiple copies of files since they typically reside within the
> same remote file system. Therefore, we propose an approach for reusing
> files when ForSt generates snapshots or restores from checkpoints and for
> managing the file ownership between Flink & ForSt. By eliminating the
> overhead of file copying, checkpointing, restoration, and rescaling can
> become significantly faster for disaggregated state.
>
> Looking forward to your comments or feedback.
>
> Best regards,
> Han Yin
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046898
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046855
> [3]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046865
