Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

Yu Li Tue, 29 Sep 2020 08:01:22 -0700

Yes let's move on, already cast my vote in the voting thread.

Thanks all for the patience answering / addressing my belated questions!


Best Regards,
Yu


On Sun, 27 Sep 2020 at 20:00, Stephan Ewen <se...@apache.org> wrote:

> Good to see this FLIP moving.
>
> From what I understand, the remaining questions are mainly about how to
> express the roles of the CheckpointStorage and specifically the behavior of
> JMCheckpointStorage and FsCheckpointStorage in the docs.
> This sounds like details we should discuss over the concrete text proposals
> in the PR.
>
> On Sun, Sep 27, 2020 at 5:38 AM Yu Li <car...@gmail.com> wrote:
>
> > Thanks Seth, the updated FLIP overall LGTM, and I've left some inline
> > comments in the doc.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Fri, 25 Sep 2020 at 20:58, Seth Wiesman <sjwies...@gmail.com> wrote:
> >
> > > Done
> > >
> > > Seth
> > >
> > > On Fri, Sep 25, 2020 at 2:47 AM Yu Li <car...@gmail.com> wrote:
> > >
> > > > *bq. I think it might help to highlight specific stumbling blocks
> users
> > > > have today and why I believe this change addresses those issues.*
> > > > Thanks for adding more details, I believe adding these blocks to the
> > FLIP
> > > > doc could make the motivation more vivid and convincing.
> > > >
> > > > *bq. To be concrete I think the JavaDoc for setCheckpointStorage
> would
> > be
> > > > something like...*
> > > > I could see this definition extracts the existing description from
> the
> > > > current `StateBackend` interface, it's a valid option, and let me
> quote
> > > it
> > > > again:
> > > > - CheckpointStorage defines how checkpoint snapshots are persisted
> for
> > > > fault tolerance. Various implementations store their checkpoints in
> > > > different fashions and have different requirements and availability
> > > > guarantees.
> > > > - JobManagerCheckpointStorage stores checkpoints in the memory of the
> > > > JobManager. It is lightweight and without additional dependencies but
> > is
> > > > not highly available.
> > > > - FileSystemCheckpointStorage stores checkpoints in a file system.
> For
> > > > systems like HDFS, NFS Drives, S3, and GCS, this storage policy
> > supports
> > > > large state size, in the magnitude of many terabytes while providing
> a
> > > > highly available foundation for stateful applications. This
> checkpoint
> > > > storage policy is recommended for most production deployments.
> > > >
> > > > Sticking to this definition, I think we should have the below
> migration
> > > > plans for existing backends:
> > > > - `MemoryStateBackend(null, String savepointPath)` to
> > > > `HashMapStateBackend() + JobManagerCheckpointStorage()`
> > > > - `MemoryStateBackend(<non-null-checkpoint-path>, String
> > savepointPath)`
> > > to
> > > > `HashMapStateBackend() + FileSystemCheckpointStorage()`
> > > > in addition of the existing:
> > > > - `MemoryStateBackend()` to `HashMapStateBackend() +
> > > > JobManagerCheckpointStorage()`
> > > > and could be summarized as:
> > > > - MemoryStateBackend with checkpoint path: `HashMapStateBackend() +
> > > > FileSystemCheckpointStorage()`
> > > > - MemoryStateBackend w/o checkpoint path: `HashMapStateBackend() +
> > > > JobManagerCheckpointStorage()`
> > > >
> > > > And I believe adding the above highlighted blocks to the FLIP doc
> (the
> > > "New
> > > > StateBackend User API" and "Migration Plan" sections, separately)
> could
> > > > make it more complete.
> > > >
> > > > PS. Please note that although the current javadoc of `StateBackend`
> > > states
> > > > "MemoryStateBackend is not highly available", it actually supports
> > > > persisting the checkpoint data to DFS when checkpoint path is given,
> so
> > > the
> > > > mapping between old and new APIs are not that straight-forward and
> need
> > > > some clear clarifications, from my point of view.
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Fri, 25 Sep 2020 at 08:33, Seth Wiesman <sjwies...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Yu,
> > > > >
> > > > > bq* I thought the FLIP aims at resolving some *existing* confusion,
> > > i.e.
> > > > > the durability mystery to users.
> > > > >
> > > > > I think it might help to highlight specific stumbling blocks users
> > have
> > > > > today and why I believe this change addresses those issues. Some
> > > frequent
> > > > > things I've heard over the past several years include:
> > > > >
> > > > > 1) "We use RocksDB because we don't need fault tolerance."
> > > > > 2) "We don't use RocksDB because we don't want to manage an
> external
> > > > > database."
> > > > > 3) Believing RocksDB is reading and writing directly with S3 or
> HDFS
> > > (vs.
> > > > > local disk)
> > > > > 4) Believing FsStateBackend spills to disk or has anything to do
> with
> > > the
> > > > > local filesystem
> > > > > 5) Pointing RocksDB at network-attached storage, believing that the
> > > state
> > > > > backend needs to be fault-tolerant
> > > > >
> > > > > This question from the ml is very representative of where users are
> > > > > struggling[1]. Many of these questions were not from new users but
> > from
> > > > > organizations that were in production! Just yesterday I was on the
> > > phone
> > > > > with a company that didn't realize they were in production without
> > > > > checkpointing; honestly, you would be shocked how often this
> happens.
> > > The
> > > > > current state backend abstraction is to complex for many of our
> > users.
> > > > What
> > > > > all these questions have in common is misunderstanding the
> > relationship
> > > > > between how data is stored locally on TMs vs how checkpoints make
> > that
> > > > > state durable.
> > > > >
> > > > > The FLIP aims actively help users by allowing them to reason about
> > > state
> > > > > backends separately from checkpoint durability. In the future, a
> > state
> > > > > backend only defines where and how state is stored locally on the
> TM
> > > > while
> > > > > checkpoint storage defines where and how checkpoints are stored for
> > > > > recovery. To be concrete I think the JavaDoc for
> setCheckpointStorage
> > > > would
> > > > > be something like:
> > > > >
> > > > > ```java
> > > > > /**
> > > > >  * CheckpointStorage defines how checkpoint snapshots are persisted
> > for
> > > > > fault tolerance
> > > > > *. Various implementations  store their checkpoints in different
> > > fashions
> > > > > and have different requirements and
> > > > >  * availability guarantees.
> > > > >  *
> > > > >  *<p>For example, JobManagerCheckpointStorage stores checkpoints in
> > the
> > > > > memory of the JobManager.
> > > > >  * It is lightweight and without additional dependencies but is not
> > > > highly
> > > > > available
> > > > >  * and only supports small state sizes. This checkpoint storage
> > policy
> > > is
> > > > > convenient for
> > > > >  * local testing and development.
> > > > >  *
> > > > >  *<p>FileSystemCheckpointStorage stores checkpoints in a
> filesystem.
> > > For
> > > > > systems like
> > > > >  * HDFS, NFS Drives, S3, and GCS, this storage policy supports
> large
> > > > state
> > > > > size,
> > > > >  * in the magnitude of many terabytes while providing a highly
> > > available
> > > > > foundation
> > > > >  * for stateful applications. This checkpoint storage policy is
> > > > recommended
> > > > > for most
> > > > >  * production deployments.
> > > > >  */
> > > > > void setCheckpointStorage(CheckpointStorage storage) {}
> > > > > ```
> > > > >
> > > > > Seth
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/State-Storage-Questions-td37919.html
> > > > > [2] Also naming, but we're aligned here
> > > > >
> > > > > On Thu, Sep 24, 2020 at 10:24 AM Yu Li <car...@gmail.com> wrote:
> > > > >
> > > > > > And to make it clear, I'm +1 on the idea of decoupling state
> > backends
> > > > > with
> > > > > > checkpointing. I don't have any question about making it clear
> that
> > > > > > heap/RocksDB is where we serve the routine state read/write and
> > where
> > > > to
> > > > > > put the checkpoint data is another story. My only concern lies in
> > the
> > > > > newly
> > > > > > introduced setCheckpointStorage API and how we define its
> > semantics,
> > > > and
> > > > > > not sure whether it's due to my ignorance.
> > > > > >
> > > > > > Best Regards,
> > > > > > Yu
> > > > > >
> > > > > >
> > > > > > On Thu, 24 Sep 2020 at 23:11, Yu Li <car...@gmail.com> wrote:
> > > > > >
> > > > > > > *bq. What new confusion would be introduced here?*
> > > > > > > No *new* confusion introduced, but as mentioned at the very
> > > beginning
> > > > > of
> > > > > > > the motivation ("Apache Flink's durability story is a mystery
> to
> > > many
> > > > > > > users"), I thought the FLIP aims at resolving some *existing*
> > > > > > > confusions, i.e. the durability mystery to users.
> > > > > > >
> > > > > > > For me, I'm not 100% clear about how to write the javadoc of
> the
> > > > > > > setCheckpointStorage API. Would it be like "specify where the
> > > > > checkpoint
> > > > > > > data is stored"? If so, do we need to explain the fact that
> when
> > a
> > > > > > > checkpoint path is given, JM will also persist the checkpoint
> > data
> > > to
> > > > > > DFS?
> > > > > > > It's true that such confusion also exists today, but would the
> > > > > > introduction
> > > > > > > of the new API expose it further?
> > > > > > >
> > > > > > > IMHO we need to document the newly introduced API / classes and
> > > their
> > > > > > > semantics clearly in the FLIP to make sure everyone is on the
> > same
> > > > > page,
> > > > > > > but if we feel such work / discussions are all details and only
> > > need
> > > > to
> > > > > > > happen at the documenting and release note phase, it's also
> fine
> > to
> > > > me.
> > > > > > >
> > > > > > > And if I'm the only one who has such questions / concerns on
> the
> > > new
> > > > > > > `setCheckpointStorage` API and most of others feel its semantic
> > is
> > > > > sound
> > > > > > > and clear, then please just ignore me and move on.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Yu
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 23 Sep 2020 at 17:08, Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > > > >
> > > > > > >> I am confused now with the concerns here. This is very much
> from
> > > the
> > > > > > user
> > > > > > >> perspective (which is partially also the developer perspective
> > > which
> > > > > is
> > > > > > >> the
> > > > > > >> sign of an intuitive abstraction).
> > > > > > >>
> > > > > > >> Of course, there will be docs describing what
> > JMCheckpointStorage
> > > > and
> > > > > > >> FsCheckpointStorage are.
> > > > > > >> And having release notes that describe that
> > > > > > >> RocksDBStateBackend("s3://...")
> > > > > > >> now corresponds to a combination of "RocksDBBackend" and
> > > > > > >> "FsCheckpointStorage" is also straightforward.
> > > > > > >>
> > > > > > >> We said to keep the old RocksDBStateBackend class and let it
> > > > implement
> > > > > > >> both
> > > > > > >> interfaces such that the old code still works exactly as
> before.
> > > > > > >>
> > > > > > >> What new confusion would be introduced here?
> > > > > > >> Understanding the difference between JMCheckpointStorage and
> > > > > > >> FsCheckpointStorage was always necessary when one needed to
> > > > understand
> > > > > > the
> > > > > > >> difference between MemoryStateBackend and FsStateBackend. It
> > > should
> > > > be
> > > > > > >> easier to define this after this change, because it is the
> only
> > > > thing
> > > > > > that
> > > > > > >> we describe when explaining what checkpoint storage to use
> > (rather
> > > > > than
> > > > > > >> also having the choice of index structure coupled to that).
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Sep 23, 2020 at 10:39 AM Aljoscha Krettek <
> > > > > aljos...@apache.org>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > On 23.09.20 04:40, Yu Li wrote:
> > > > > > >> > > To be specific, with the old API users don't need to set
> > > > > checkpoint
> > > > > > >> > > storage, instead they only need to pass the checkpoint
> path
> > > w/o
> > > > > > caring
> > > > > > >> > > about the storage. The new APIs are forcing users to set
> the
> > > > > storage
> > > > > > >> so
> > > > > > >> > > they have to know the difference between different
> storages.
> > > > It's
> > > > > > not
> > > > > > >> an
> > > > > > >> > > implementation change, but an API change that users have
> to
> > > > > > understand
> > > > > > >> > and
> > > > > > >> > > follow-up.
> > > > > > >> >
> > > > > > >> > I think the main point of the FLIP is to make it more
> obvious
> > to
> > > > > users
> > > > > > >> > what is happening.
> > > > > > >> >
> > > > > > >> > With current Flink, they would do a `setStateBackend(new
> > > > > > >> > FsStateBackend(<path>))`. What the user is actually "saying"
> > > with
> > > > > this
> > > > > > >> > is: I want to keep state on heap but store checkpoints in
> DFS.
> > > > They
> > > > > > are
> > > > > > >> > not actually changing the "State Backend", the thing that
> > keeps
> > > > > state
> > > > > > in
> > > > > > >> > operators, but only where state is checkpointed. The thing
> > that
> > > is
> > > > > > used
> > > > > > >> > for local state storage in operators is still the "Heap
> > > Backend".
> > > > > > >> >
> > > > > > >> > With the proposed FLIP, a user would do a
> > > > `setCheckpointStorage(new
> > > > > > >> > FsStorage(<path>))`. Which makes it obvious that they're
> > > changing
> > > > > > where
> > > > > > >> > checkpoints are stored but not the actual "State Backend",
> > which
> > > > is
> > > > > > >> > still "Heap Backend" (the default).
> > > > > > >> >
> > > > > > >> > I do understand Yu's point, though, that this will be
> > confusing
> > > > for
> > > > > > >> > current Flink users. They are used to setting a "State
> > Backend"
> > > > > > if/when
> > > > > > >> > they want to change the storage location. To fit the new
> model
> > > > they
> > > > > > >> > would have to change the call from `setStateBackend()` to
> > > > > > >> > `setCheckpointStorage()`.
> > > > > > >> >
> > > > > > >> > I think we need to life with this short-term confusion
> because
> > > in
> > > > > the
> > > > > > >> > long run the proposed split between checkpoint location and
> > > state
> > > > > > >> > backend makes sense and will be more straightforward for
> users
> > > to
> > > > > > >> > understand.
> > > > > > >> >
> > > > > > >> > Best,
> > > > > > >> > Aljoscha
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

Reply via email to