Thanks, Roman, for publishing this design.

There seems to be quite a bit of overlap with FLIP-158 (generalized
incremental checkpoints).

I would give a +1 to the effort if it is a fairly self-contained and
closed effort, meaning we don't expect it to need a ton of follow-ups
beyond common maintenance and small bug fixes. If we expect that it
requires a lot of follow-ups, then we end up splitting our work between
this FLIP and FLIP-158, which seems a bit inefficient.
Which other committers would be involved to ensure the community can
maintain this?


The design looks fine, in general, with one question:

When persisting changes, you include all changes that have a newer
version than the latest one confirmed by the JM.

Can you explain why exactly it is done that way? Alternatively, you could
keep the latest checkpoint ID for which the state backend successfully
persisted the diff to the checkpoint storage and created a state handle.
For each checkpoint, the state backend would then include the state
handles of all involved chunks. That would be similar to the log-based
approach in FLIP-158.
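
To make this concrete, something like the following is what I mean (a
minimal, self-contained sketch; the names are invented, not actual Flink
classes):

import java.util.ArrayList;
import java.util.List;

class ChunkTrackingBackend {
    interface StateHandle {}                      // stand-in for a real state handle

    private long lastPersistedCheckpointId = -1;  // last diff written successfully
    private final List<StateHandle> chunks = new ArrayList<>();

    List<StateHandle> snapshot(long checkpointId) {
        // Persist only what changed since the last successfully persisted
        // diff - not since the last checkpoint confirmed by the JM.
        StateHandle diff = writeDiffSince(lastPersistedCheckpointId);
        chunks.add(diff);
        lastPersistedCheckpointId = checkpointId;
        // The checkpoint metadata then references all involved chunks,
        // similar to the log-based approach in FLIP-158.
        return new ArrayList<>(chunks);
    }

    private StateHandle writeDiffSince(long sinceCheckpointId) {
        return new StateHandle() {};              // placeholder for the real IO
    }
}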

I have a suspicion that this is because the JM may have released the state
handle (and discarded the diff) for a checkpoint that succeeded on the task
but didn't succeed globally. So we cannot reference any state handle that
has been handed over to the JobManager, but is not yet confirmed.

This characteristic seems to be at the heart of much of the complexity;
the handling of removed keys also seems to be caused by it.
If we could change that assumption, the design would become simpler.

(Side note: I am wondering if this also impacts the FLIP-158 DSTL design.)

Best,
Stephan


On Sun, Nov 15, 2020 at 8:51 AM Khachatryan Roman <
khachatryan.ro...@gmail.com> wrote:

> Hi Stefan,
>
> Thanks for your reply. Very interesting ideas!
> If I understand correctly, the SharedStateRegistry will still be
> responsible for pruning the old state; for that, it will maintain some
> (ordered) mapping between StateMaps and their versions, per key group.
> I think one modification to this approach is needed to support journaling:
> for each entry, maintain the version at which it was last fully
> snapshotted, and use this version to find the minimum as you described
> above.
> I'm considering better state cleanup and optimization of removals as the
> next step. Anyway, I will add it to the FLIP document.
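>
> Roughly, the modification I have in mind (a hypothetical, self-contained
> sketch; names invented for illustration):
>
> import java.util.HashMap;
> import java.util.Map;
>
> class VersionedStateMap<K, V> {
>     static final class Entry<V> {
>         V value;
>         long modifiedVersion;  // version of the last modification
>         long snapshotVersion;  // version at which the entry was last
>                                // fully written to a snapshot
>     }
>
>     private final Map<K, Entry<V>> entries = new HashMap<>();
>
>     // The minimum snapshotVersion over all entries tells the
>     // SharedStateRegistry which older incremental snapshots are no
>     // longer referenced and can therefore be pruned.
>     long minReferencedSnapshotVersion() {
>         return entries.values().stream()
>                 .mapToLong(e -> e.snapshotVersion)
>                 .min().orElse(Long.MAX_VALUE);
>     }
> }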
>
> Thanks!
>
> Regards,
> Roman
>
>
> On Tue, Nov 10, 2020 at 12:04 AM Stefan Richter <stefanrichte...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Very happy to see that the incremental checkpoint idea is finally
> > becoming a reality for the heap backend! Overall the proposal looks
> > pretty good to me. Just wanted to point out one possible improvement
> > from what I can still remember from my ideas back then: I think you can
> > avoid doing periodic full snapshots for consolidation. Instead, my
> > suggestion would be to track the version numbers you encounter while
> > you iterate a snapshot for writing it - and then you should be able to
> > prune all incremental snapshots that were performed with a version
> > number smaller than the minimum you find.
> >
> > To avoid the problem of very old entries that never get modified, you
> > could start spilling entries with a certain age-difference compared to
> > the current map version, so that eventually all entries for an old
> > version are re-written to newer snapshots. You can track the version up
> > to which this was done in the map, and then you can again let go of
> > their corresponding snapshots after a guaranteed time. So instead of
> > having the burden of periodic large snapshots, you can make every
> > snapshot work a little bit on the cleanup, and if you are lucky it
> > might happen mostly by itself if most entries are frequently updated.
> >
> > I would also consider making map clearing a special event in your log
> > and unticking the versions on this event - this allows you to let go of
> > old snapshots and saves you from writing a log of antimatter entries.
> > Maybe the ideas are still useful to you.
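> >
> > To illustrate the consolidation idea (a hypothetical, self-contained
> > sketch - all names invented):
> >
> > final class ConsolidationSketch {
> >     static final long MAX_AGE = 100;            // age threshold, in map versions
> >     static final class Entry { long version; }
> >
> >     // While writing a snapshot, rewrite entries older than MAX_AGE map
> >     // versions and track the minimum entry version seen.
> >     long writeSnapshot(Iterable<Entry> stateMap, long currentMapVersion) {
> >         long minVersionSeen = Long.MAX_VALUE;
> >         for (Entry e : stateMap) {
> >             if (currentMapVersion - e.version > MAX_AGE) {
> >                 rewriteIntoCurrentSnapshot(e);  // spill the old entry
> >                 e.version = currentMapVersion;
> >             }
> >             minVersionSeen = Math.min(minVersionSeen, e.version);
> >         }
> >         // Every incremental snapshot taken at a version smaller than
> >         // minVersionSeen is no longer referenced and can be pruned.
> >         return minVersionSeen;
> >     }
> >
> >     void rewriteIntoCurrentSnapshot(Entry e) { /* real IO goes here */ }
> > }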
> >
> > Best,
> > Stefan
> >
> > On 2020/11/04 01:54:25, Khachatryan Roman <k...@gmail.com> wrote:
> > > Hi devs,
> > >
> > > I'd like to start a discussion of FLIP-151: Incremental snapshots for
> > > heap-based state backend [1].
> > >
> > > The heap backend, while limited to state sizes that fit into memory,
> > > also has some advantages compared to the RocksDB backend:
> > > 1. Serialization once per checkpoint, not per state modification. This
> > > allows “squashing” updates to the same keys
> > > 2. Shorter synchronous phase (compared to RocksDB incremental)
> > > 3. No need for sorting and compaction, no IO amplification and JNI
> > > overhead
> > > This can potentially give higher throughput and efficiency.
> > >
> > > However, the heap backend currently lacks incremental checkpoints.
> > > This FLIP aims to add initial support for them.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-151%3A+Incremental+snapshots+for+heap-based+state+backend
> > >
> > > Any feedback highly appreciated.
> > >
> > > Regards,
> > > Roman
>
