Here is an example for you:

Parallel streaming k-means: the state we keep is the current cluster
centers, and we use iterations to sync the centers across parallel
instances.
We can afford to lose model updates in the loop, but we need to checkpoint
the models.

https://github.com/gyfora/stream-clustering/blob/master/src/main/scala/stream/clustering/StreamClustering.scala

(checkpointing is not turned on but you will get the point)
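
To make the trade-off concrete, here is a minimal plain-Python sketch of the idea. It is not Flink code and not the code in the linked repository: the class KMeansWorker, the helper run_demo, the 1-D points, and the "average the two centers" merge rule are all assumptions made for illustration. It only shows that restoring the checkpointed centers gives a usable model even though the updates that were travelling in the loop are gone.

```python
import copy
from collections import deque


class KMeansWorker:
    """One parallel instance; its only state is the cluster centers (1-D here)."""

    def __init__(self, centers):
        self.centers = list(centers)
        self.counts = [1] * len(centers)

    def process(self, point):
        # Assign the point to the nearest center and move that center towards it.
        i = min(range(len(self.centers)), key=lambda k: abs(self.centers[k] - point))
        self.counts[i] += 1
        self.centers[i] += (point - self.centers[i]) / self.counts[i]
        return (i, self.centers[i])  # model update sent into the loop

    def apply_feedback(self, update):
        # Hypothetical merge rule: average the local center with the remote one.
        i, center = update
        self.centers[i] = (self.centers[i] + center) / 2.0

    def snapshot(self):
        return copy.deepcopy((self.centers, self.counts))

    def restore(self, snap):
        self.centers, self.counts = copy.deepcopy(snap)


def run_demo():
    workers = [KMeansWorker([0.0, 10.0]), KMeansWorker([0.0, 10.0])]
    loop = deque()  # stands in for the feedback edge of the iteration

    def feed(worker_id, point):
        loop.append((worker_id, workers[worker_id].process(point)))

    def drain():
        # Deliver every in-flight update to all instances except the sender.
        while loop:
            sender, update = loop.popleft()
            for wid, worker in enumerate(workers):
                if wid != sender:
                    worker.apply_feedback(update)

    feed(0, 1.0)
    feed(1, 9.0)
    drain()

    # Checkpoint operator state only; the loop contents are not persisted.
    checkpoint = [w.snapshot() for w in workers]

    feed(0, 2.0)
    feed(1, 8.0)
    lost = len(loop)  # updates still in flight when the failure hits

    # Simulated failure: in-flight updates vanish, operator state is restored.
    loop.clear()
    for worker, snap in zip(workers, checkpoint):
        worker.restore(snap)

    return checkpoint, workers, lost
```

Calling run_demo() returns the checkpoint, the restored workers, and the number of updates that were lost in the loop. After recovery each instance simply continues from the checkpointed centers, so in this sketch the lost updates cost some progress but do not leave the model in a broken state.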



Gyula Fóra <gyula.f...@gmail.com> wrote (on Wed, Jun 10, 2015,
12:47):

> You are right, to have consistent results we would need to persist the
> records.
>
> But since we cannot do that right now, we can still checkpoint all
> operator states and understand that inflight records in the loop are lost
> on failure.
>
> This is acceptable for most of the use cases that we have developed so far
> for iterations (machine learning, graph updates, etc.). What is not
> acceptable is to not have checkpointing at all.
>
> Aljoscha Krettek <aljos...@apache.org> wrote (on Wed, Jun 10, 2015,
> 12:43):
>
>> The elements that are in-flight in an iteration are also state of the
>> job. I'm wondering whether the state inside iterations still makes
>> sense without these in-flight elements. But I also don't know the King
>> use-case, that's why I thought an example could be helpful.
>>
>> On Wed, Jun 10, 2015 at 12:37 PM, Gyula Fóra <gyula.f...@gmail.com>
>> wrote:
>> > I don't understand the question. I vote for checkpointing all state in
>> > the job, even inside iterations (it's more of a loop).
>> >
>> > Aljoscha Krettek <aljos...@apache.org> wrote (on Wed, Jun 10, 2015,
>> > 12:34):
>> >
>> >> I don't understand why having the state inside an iteration but not
>> >> the elements that correspond to this state or created this state is
>> >> desirable. Maybe an example could help understand this better?
>> >>
>> >> On Wed, Jun 10, 2015 at 11:27 AM, Gyula Fóra <gyula.f...@gmail.com>
>> >> wrote:
>> >> > The other tests verify that the checkpointing algorithm runs
>> >> > properly. That also ensures that it runs for iterations, because a
>> >> > loop is just an extra source and sink in the jobgraph (so it is the
>> >> > same for the algorithm).
>> >> >
>> >> > Fabian Hueske <fhue...@gmail.com> wrote (on Wed, Jun 10, 2015,
>> >> > 11:19):
>> >> >
>> >> >> Without going into the details, how well tested is this feature?
>> >> >> The PR only extends one test by a few lines.
>> >> >>
>> >> >> Is that really enough to ensure that
>> >> >> 1) the change does not cause trouble
>> >> >> 2) it works as expected?
>> >> >>
>> >> >> If this feature should go into the release, it must be thoroughly
>> >> >> checked and we must take the time for that.
>> >> >> Including code and hoping for the best because time is scarce is
>> >> >> not an option IMO.
>> >> >>
>> >> >> Fabian
>> >> >>
>> >> >>
>> >> >> 2015-06-10 11:05 GMT+02:00 Gyula Fóra <gyula.f...@gmail.com>:
>> >> >>
>> >> >> > And also I would like to remind everyone that any fault tolerance
>> >> >> > we provide is only as good as the fault tolerance of the master
>> >> >> > node, which is nonexistent at the moment.
>> >> >> >
>> >> >> > So I don't see a reason why a user should not be able to choose
>> >> >> > whether he wants state checkpoints for iterations as well.
>> >> >> >
>> >> >> > In any case, this will be used by King, for instance, so making it
>> >> >> > part of the release would save a lot of work for everyone.
>> >> >> >
>> >> >> > Paris Carbone <par...@kth.se> wrote (on Wed, Jun 10, 2015,
>> >> >> > 10:29):
>> >> >> >
>> >> >> > >
>> >> >> > > To continue Gyula's point, for consistent snapshots we need to
>> >> >> > > persist the records in transit within the loop and also slightly
>> >> >> > > change the current protocol, since it works only for DAGs. Before
>> >> >> > > going in that direction, though, I would propose we first see
>> >> >> > > whether there is a nice way to make iterations more structured.
>> >> >> > >
>> >> >> > > Paris
>> >> >> > > ________________________________________
>> >> >> > > From: Gyula Fóra <gyula.f...@gmail.com>
>> >> >> > > Sent: Wednesday, June 10, 2015 10:19 AM
>> >> >> > > To: dev@flink.apache.org
>> >> >> > > Subject: Re: Force enabling checkpoints for iterative streaming
>> >> >> > > jobs
>> >> >> > >
>> >> >> > > I disagree. Not having checkpointed operators inside the
>> >> >> > > iteration still breaks the guarantees.
>> >> >> > >
>> >> >> > > It is not about the states; it is about the loop itself.
>> >> >> > > On Wed, Jun 10, 2015 at 10:12 AM Aljoscha Krettek
>> >> >> > > <aljos...@apache.org> wrote:
>> >> >> > >
>> >> >> > > > This is the answer I gave on the PR (we should have one place
>> >> >> > > > for discussing this, though):
>> >> >> > > >
>> >> >> > > > I would be against merging this in the current form. What I
>> >> >> > > > propose is to analyse the topology to verify that there are no
>> >> >> > > > checkpointed operators inside iterations. Operators before and
>> >> >> > > > after iterations can be checkpointed and we can safely allow
>> >> >> > > > the user to enable checkpointing.
>> >> >> > > >
>> >> >> > > > If we have the code to analyse which operators are inside
>> >> >> > > > iterations we could also disallow windows inside iterations. I
>> >> >> > > > think windows inside iterations don't make sense since elements
>> >> >> > > > in different "iterations" would end up in the same window. Maybe
>> >> >> > > > I'm wrong here though, then please correct me.
>> >> >> > > >
>> >> >> > > > On Wed, Jun 10, 2015 at 10:08 AM, Márton Balassi
>> >> >> > > > <balassi.mar...@gmail.com> wrote:
>> >> >> > > > > I agree that for the sake of the above-mentioned use cases it
>> >> >> > > > > is reasonable to add this to the release with the right
>> >> >> > > > > documentation. For machine learning, potentially losing one
>> >> >> > > > > round of feedback data should not matter.
>> >> >> > > > >
>> >> >> > > > > Let us not block prominent users until the next release on
>> >> >> > > > > this.
>> >> >> > > > >
>> >> >> > > > > On Wed, Jun 10, 2015 at 8:09 AM, Gyula Fóra
>> >> >> > > > > <gyula.f...@gmail.com> wrote:
>> >> >> > > > >
>> >> >> > > > >> As for people currently suffering from it:
>> >> >> > > > >>
>> >> >> > > > >> An application King is developing requires iterations, and
>> >> >> > > > >> they need checkpoints. Practically all SAMOA programs would
>> >> >> > > > >> need this.
>> >> >> > > > >>
>> >> >> > > > >> It is very likely that the state interfaces will be changed
>> >> >> > > > >> after the release, so this is not something that we can just
>> >> >> > > > >> add later. I don't see a reason why we should not add it, as
>> >> >> > > > >> it is clearly documented. In this case, not having
>> >> >> > > > >> guarantees at all means people will never use it in any
>> >> >> > > > >> production system. Having limited guarantees means that it
>> >> >> > > > >> will depend on the application.
>> >> >> > > > >>
>> >> >> > > > >> On Wed, Jun 10, 2015 at 12:53 AM, Ufuk Celebi <u...@apache.org>
>> >> >> > > > >> wrote:
>> >> >> > > > >>
>> >> >> > > > >> > Hey Gyula,
>> >> >> > > > >> >
>> >> >> > > > >> > I understand your reasoning, but I don't think it's worth
>> >> >> > > > >> > rushing this into the release.
>> >> >> > > > >> >
>> >> >> > > > >> > As you've said, we cannot give precise guarantees. But this
>> >> >> > > > >> > is arguably one of the key requirements for any fault
>> >> >> > > > >> > tolerance mechanism. Therefore I disagree that this is
>> >> >> > > > >> > better than not having anything at all. I think it will
>> >> >> > > > >> > already go a long way to have the non-iterative case working
>> >> >> > > > >> > reliably.
>> >> >> > > > >> >
>> >> >> > > > >> > And as far as I know there are no users really suffering
>> >> >> > > > >> > from this at the moment (in the sense that someone has
>> >> >> > > > >> > complained on the mailing list).
>> >> >> > > > >> >
>> >> >> > > > >> > Hence, I vote to postpone this.
>> >> >> > > > >> >
>> >> >> > > > >> > – Ufuk
>> >> >> > > > >> >
>> >> >> > > > >> > On 10 Jun 2015, at 00:19, Gyula Fóra <gyf...@apache.org>
>> >> >> > > > >> > wrote:
>> >> >> > > > >> >
>> >> >> > > > >> > > Hey all,
>> >> >> > > > >> > >
>> >> >> > > > >> > > It is currently impossible to enable state checkpointing
>> >> >> > > > >> > > for iterative jobs, because an exception is thrown when
>> >> >> > > > >> > > creating the jobgraph. This behaviour is motivated by the
>> >> >> > > > >> > > lack of precise guarantees that we can give with the
>> >> >> > > > >> > > current fault-tolerance implementations for cyclic graphs.
>> >> >> > > > >> > >
>> >> >> > > > >> > > This PR <https://github.com/apache/flink/pull/812> adds an
>> >> >> > > > >> > > optional flag to force checkpoints even in case of
>> >> >> > > > >> > > iterations. The algorithm will take checkpoints
>> >> >> > > > >> > > periodically as before, but records in transit inside the
>> >> >> > > > >> > > loop will be lost.
>> >> >> > > > >> > >
>> >> >> > > > >> > > However, even this guarantee is enough for most
>> >> >> > > > >> > > applications (Machine Learning for instance) and certainly
>> >> >> > > > >> > > much better than not having anything at all.
>> >> >> > > > >> > >
>> >> >> > > > >> > >
>> >> >> > > > >> > > I suggest we add this to the 0.9 release as currently many
>> >> >> > > > >> > > applications suffer from this limitation (SAMOA, ML
>> >> >> > > > >> > > pipelines, graph streaming etc.)
>> >> >> > > > >> > >
>> >> >> > > > >> > > Cheers,
>> >> >> > > > >> > >
>> >> >> > > > >> > > Gyula
>> >> >> > > > >> >
>> >> >> > > > >> >
>> >> >> > > > >>
>> >> >> > > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>>
>
