Hi Till,

Thanks for drafting the FLIP; it looks really good. I did a quick pass over
the PR and it seems to be heading in the right direction.

> It might be required to introduce a graceful shutdown of the TaskExecutor
> in order to support proper cleanup of resources.
>

This is actively being worked on by Niklas in FLINK-25277 [1].

In the PR, I've seen that you're also replacing the directories for storing
the local state with the working directory. Could this be a concern? Is
RocksDB, for example, able to leverage multiple mount paths to spread the
load?

I'd also be in favor of introducing a proper (evolving) serialization
format right away instead of the Java serialization, but no hard feelings
if we don't.

[1] https://issues.apache.org/jira/browse/FLINK-25277

Best,
D.

On Wed, Dec 29, 2021 at 4:58 PM Till Rohrmann <trohrm...@apache.org> wrote:

> I've created a draft PR for the desired changes [1]. It might be easier to
> take a look at than the branch.
>
> [1] https://github.com/apache/flink/pull/18237
>
> Cheers,
> Till
>
> On Tue, Dec 28, 2021 at 3:22 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Hi everyone,
> >
> > I would like to start a discussion about using the working directory to
> > persist local state for faster recovery (FLIP-201) [1]. Persisting the
> > local state will be beneficial if a crashed process is restarted with the
> > same working directory. In this case, Flink does not have to download the
> > state artifacts again and can recover locally.
> >
> > A POC can be found here [2].
> >
> > Looking forward to your feedback.
> >
> > [1] https://cwiki.apache.org/confluence/x/wJuqCw
> > [2] https://github.com/tillrohrmann/flink/tree/FLIP-201
> >
> > Cheers,
> > Till
> >
>
