Re: [DISCUSS] FLIP-14: Loops API and Termination

Fabian Hueske Thu, 17 Nov 2016 04:59:56 -0800

Hi Paris,

just gave you the permissions (I hope).
Let me know if something does not work.


Cheers, Fabian

2016-11-17 13:48 GMT+01:00 Paris Carbone <par...@kth.se>:

> We do not have to schedule this for an early Flink release, just saying.
> I would just like to get the changes out and you people can review it and
> integrate it anytime at your own pace.
>
> Who is the admin of the wiki? It would be nice to get write access.
>
> > On 17 Nov 2016, at 13:45, Paris Carbone <par...@kth.se> wrote:
> >
> > Sounds like a plan!
> >
> > Can someone grant me access to write in the wiki please?
> > My username is “senorcarbone”.
> >
> > Paris
> >
> >> On 16 Nov 2016, at 14:30, Gyula Fóra <gyula.f...@gmail.com> wrote:
> >>
> >> I am not completely sure whether we should deprecate the old API for
> 1.2 or
> >> remove it completely. Personally I am in favor of removing it, I don't
> >> think it is a huge burden to move to the new one if it makes for a much
> >> nicer user experience.
> >>
> >> I think you can go ahead add the FLIP to the wiki and open the PR so we
> can
> >> start the review if you have it ready anyways.
> >>
> >> Gyula
> >>
> >> Paris Carbone <par...@kth.se> ezt írta (időpont: 2016. nov. 16., Sze,
> >> 11:55):
> >>
> >>> Thanks for reviewing, Gyula.
> >>>
> >>> One thing that is still up to discussion is whether we should remove
> >>> completely the old iterations API or simply mark it as deprecated till
> v2.0.
> >>> Also, not sure what is the best process now. We have the changes ready.
> >>> Should I copy the FLIP to the wiki and trigger the PRs or wait for a
> few
> >>> more days in case someone has objections?
> >>>
> >>> @Stephan, what is your take on our interpretation of the approach you
> >>> suggested? Should we proceed or is there anything that you do not find
> nice?
> >>>
> >>> Paris
> >>>
> >>>> On 15 Nov 2016, at 10:01, Gyula Fóra <gyf...@apache.org> wrote:
> >>>>
> >>>> Hi Paris,
> >>>>
> >>>> I like the proposed changes to the iteration API, this cleans up
> things
> >>> in
> >>>> the Java API without any strict restriction I think (it was never a
> >>> problem
> >>>> in the Scala API).
> >>>>
> >>>> The termination algorithm based on the proposed scoped loops seems to
> be
> >>>> fairly simple and looks good :)
> >>>>
> >>>> Cheers,
> >>>> Gyula
> >>>>
> >>>> Paris Carbone <par...@kth.se> ezt írta (időpont: 2016. nov. 14., H,
> >>> 8:50):
> >>>>
> >>>>> That would be great Shi! Let's take that offline.
> >>>>>
> >>>>> Anyone else interested in the iteration changes? It would be nice to
> >>>>> incorporate these to v1.2 if possible so I count on your review asap.
> >>>>>
> >>>>> cheers,
> >>>>> Paris
> >>>>>
> >>>>> On Nov 14, 2016, at 2:59 AM, xiaogang.sxg <
> xiaogang....@alibaba-inc.com
> >>>>> <mailto:xiaogang....@alibaba-inc.com>> wrote:
> >>>>>
> >>>>> Hi Paris
> >>>>>
> >>>>> Unfortunately, the project is not public yet.
> >>>>> But i can provide you a primitive implementation of the update
> protocol
> >>> in
> >>>>> the paper. It’s implemented in Storm. Since the protocol assumes the
> >>>>> communication channels between different tasks are dual, i think it’s
> >>> not
> >>>>> easy to adapt it to Flink.
> >>>>>
> >>>>> Regards
> >>>>> Xiaogang
> >>>>>
> >>>>>
> >>>>> 在 2016年11月12日，上午3:03，Paris Carbone <par...@kth.se<mailto:parisc@
> kth.se
> >>>>>
> >>>>> 写道：
> >>>>>
> >>>>> Hi Shi,
> >>>>>
> >>>>> Naiad/Timely Dataflow and other projects use global coordination
> which
> >>> is
> >>>>> very convenient for asynchronous progress tracking in general but it
> has
> >>>>> some downsides in a production systems that count on in-flight
> >>>>> transactional control mechanisms and rollback recovery guarantees.
> This
> >>> is
> >>>>> why we generally prefer decentralized approaches (despite their our
> >>>>> downsides).
> >>>>>
> >>>>> Regarding synchronous/structured iterations, this is a bit off topic
> and
> >>>>> they are a bit of a different story as you already know.
> >>>>> We maintain a graph streaming (gelly-streams) library on Flink that
> you
> >>>>> might find interesting [1]. Vasia, another Flink committer is also
> >>> working
> >>>>> on that among others.
> >>>>> You can keep an eye on it since we are planning to use this project
> as a
> >>>>> showcase for a new way of doing structured and fixpoint iterations on
> >>>>> streams in the future.
> >>>>>
> >>>>> P.S. many thanks for sharing your publication, it was an interesting
> >>> read.
> >>>>> Do you happen to have your source code public? We could most
> certainly
> >>> use
> >>>>> it in an benchmark soon.
> >>>>>
> >>>>> [1] https://github.com/vasia/gelly-streaming
> >>>>>
> >>>>>
> >>>>> On 11 Nov 2016, at 19:18, SHI Xiaogang <shixiaoga...@gmail.com<
> mailto:
> >>>>> shixiaoga...@gmail.com><mailto:shixiaoga...@gmail.com>> wrote:
> >>>>>
> >>>>> Hi, Fouad
> >>>>>
> >>>>> Thank you for the explanation. Now the centralized method seems
> correct
> >>> to
> >>>>> me.
> >>>>> The passing of StatusUpdate events will lead to synchronous
> iterations
> >>> and
> >>>>> we are using the information in each iterations to terminate the
> >>>>> computation.
> >>>>>
> >>>>> Actually, i prefer the centralized method because in many
> applications,
> >>> the
> >>>>> convergence may depend on some global statistics.
> >>>>> For example, a PageRank program may terminate the computation when
> 99%
> >>>>> vertices are converged.
> >>>>> I think those learning programs which cannot reach the fixed-point
> >>>>> (oscillating around the fixed-point) can benefit a lot from such
> >>> features.
> >>>>> The decentralized method makes it hard to support such convergence
> >>>>> conditions.
> >>>>>
> >>>>>
> >>>>> Another concern is that Flink cannot produce periodical results in
> the
> >>>>> iteration over infinite data streams.
> >>>>> Take a concrete example. Given an edge stream constructing a graph,
> the
> >>>>> user may need the PageRank weight of each vertex in the graphs
> formed at
> >>>>> certain instants.
> >>>>> Currently Flink does not provide any input or iteration information
> to
> >>>>> users, making users hard to implement such real-time iterative
> >>>>> applications.
> >>>>> Such features are supported in both Naiad and Tornado. I think Flink
> >>> should
> >>>>> support it as well.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> Regards
> >>>>> Xiaogang
> >>>>>
> >>>>>
> >>>>> 2016-11-11 19:27 GMT+08:00 Fouad ALi <fouad.alsay...@gmail.com<
> mailto:
> >>>>> fouad.alsay...@gmail.com><mailto:fouad.alsay...@gmail.com>>:
> >>>>>
> >>>>> Hi Shi,
> >>>>>
> >>>>> It seems that you are referring to the centralized algorithm which
> is no
> >>>>> longer the proposed version.
> >>>>> In the decentralized version (check last doc) there is no master
> node or
> >>>>> global coordination involved.
> >>>>>
> >>>>> Let us keep this discussion to the decentralized one if possible.
> >>>>>
> >>>>> To answer your points on the previous approach, there is a catch in
> your
> >>>>> trace at t7. Here is what is happening :
> >>>>> - Head,as well as RS, will receive  a 'BroadcastStatusUpdate' from
> >>>>> runtime (see 2.1 in the steps).
> >>>>> - RS and Heads will broadcast StatusUpdate  event and will not notify
> >>> its
> >>>>> status.
> >>>>> - When StatusUpdate event gets back to the head it will notify its
> >>>>> WORKING  status.
> >>>>>
> >>>>> Hope that answers your concern.
> >>>>>
> >>>>> Best,
> >>>>> Fouad
> >>>>>
> >>>>> On Nov 11, 2016, at 6:21 AM, SHI Xiaogang <shixiaoga...@gmail.com
> >>> <mailto:
> >>>>> shixiaoga...@gmail.com><mailto:shixiaoga...@gmail.com>>
> >>>>> wrote:
> >>>>>
> >>>>> Hi Paris
> >>>>>
> >>>>> I have several concerns about the correctness of the termination
> >>>>> protocol.
> >>>>> I think the termination protocol put an end to the computation even
> when
> >>>>> the computation has not converged.
> >>>>>
> >>>>> Suppose there exists a loop context constructed by a OP operator, a
> Head
> >>>>> operator and a Tail operator (illustrated in Figure 2 in the first
> >>>>> draft).
> >>>>> The stream only contains one record. OP will pass the record to its
> >>>>> downstream operators 10 times. In other words, the loop should
> iterate
> >>> 10
> >>>>> times.
> >>>>>
> >>>>> If I understood the protocol correctly, the following event sequence
> may
> >>>>> happen in the computation:
> >>>>> t1:  RS emits Record to OP. Since RS has reached the "end-of-stream",
> >>> the
> >>>>> system enters into Speculative Phase.
> >>>>> t2:  OP receives Record and emits it to TAIL.
> >>>>> t3:  HEAD receives the UpdateStatus event, and notifies with an IDLE
> >>>>> state.
> >>>>> t4. OP receives the UpdateStatus event from HEAD, and notifies with
> an
> >>>>> WORKING state.
> >>>>> t5. TAIL receives Record and emits it to HEAD.
> >>>>> t6. TAIL receives the UpdateStatus event from OP, and notifies with
> an
> >>>>> WORKING state.
> >>>>> t7. The system starts a new attempt. HEAD receives the UpdateStatus
> >>> event
> >>>>> and notifies with an IDLE state.  (Record is still in transition.)
> >>>>> t8. OP receives the UpdateStatus event from HEAD and notifies with an
> >>>>> IDLE
> >>>>> state.
> >>>>> t9. TAIL receives the UpdateStatus event from OP and notifies with an
> >>>>> IDLE
> >>>>> state.
> >>>>> t10. HEAD receives Record from TAIL and emits it to OP.
> >>>>> t11. System puts an end to the computation.
> >>>>>
> >>>>> Though the computation is expected to iterate 10 times, it ends
> earlier.
> >>>>> The cause is that the communication channels of MASTER=>HEAD and
> >>>>> TAIL=>HEAD
> >>>>> are not synchronized.
> >>>>>
> >>>>> I think the protocol follows the idea of the Chandy-Lamport
> algorithm to
> >>>>> determine a global state.
> >>>>> But the information of whether a node has processed any record to
> since
> >>>>> the
> >>>>> last request is not STABLE.
> >>>>> Hence i doubt the correctness of the protocol.
> >>>>>
> >>>>> To determine the termination correctly, we need some information
> that is
> >>>>> stable.
> >>>>> In timelyflow, Naiad collects the progress made in each iteration and
> >>>>> terminates the loop when a little progress is made in an iteration
> >>>>> (identified by the timestamp vector).
> >>>>> The information is stable because the result of an iteration cannot
> be
> >>>>> changed by the execution of later iterations.
> >>>>>
> >>>>> A similar method is also adopted in Tornado.
> >>>>> You may see my paper for more details about the termination of loops:
> >>>>> http://net.pku.edu.cn/~cuibin/Papers/2016SIGMOD.pdf <
> >>>>> http://net.pku.edu.cn/~cuibin/Papers/2016SIGMOD.pdf>
> >>>>>
> >>>>> Regards
> >>>>> Xiaogang
> >>>>>
> >>>>> 2016-11-11 3:19 GMT+08:00 Paris Carbone <par...@kth.se<mailto:
> >>>>> par...@kth.se><mailto:par...@kth.se> <mailto:
> >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>>>:
> >>>>>
> >>>>> Hi again Flink folks,
> >>>>>
> >>>>> Here is our new proposal that addresses Job Termination - the loop
> fault
> >>>>> tolerance proposal will follow shortly.
> >>>>> As Stephan hinted, we need operators to be aware of their scope
> level.
> >>>>>
> >>>>> Thus, it is time we make loops great again! :)
> >>>>>
> >>>>> Part of this FLIP basically introduces a new functional,
> compositional
> >>>>> API
> >>>>> for defining asynchronous loops for DataStreams.
> >>>>> This is coupled with a decentralized algorithm for job termination
> with
> >>>>> loops - along the lines of what Stephan described.
> >>>>> We are already working on the actual prototypes as you can observe in
> >>>>> the
> >>>>> links of the doc.
> >>>>>
> >>>>> Please let us know if you like (or don't like) it and why, in this
> mail
> >>>>> discussion.
> >>>>>
> >>>>> https://docs.google.com/document/d/1nzTlae0AFimPCTIV1LB3Z2y-
> >>>>> PfTHtq3173EhsAkpBoQ
> >>>>>
> >>>>> cheers
> >>>>> Paris and Fouad
> >>>>>
> >>>>> On 31 Oct 2016, at 12:53, Paris Carbone <par...@kth.se<mailto:
> >>>>> par...@kth.se> <mailto:
> >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto:
> >>> parisc@
> >>>>> kth.se<http://kth.se/><http://kth.se<http://kth.se/>> <
> http://kth.se/
> >>>>>>
> >>>>> wrote:
> >>>>>
> >>>>> Hey Stephan,
> >>>>>
> >>>>> Thanks for looking into it!
> >>>>>
> >>>>> +1 for breaking this up, will do that.
> >>>>>
> >>>>> I can see your point and maybe it makes sense to introduce part of
> >>>>> scoping
> >>>>> to incorporate support for nested loops (otherwise it can’t work).
> >>>>> Let us think about this a bit. We will share another draft for a more
> >>>>> detail description of the approach you are suggesting asap.
> >>>>>
> >>>>>
> >>>>> On 27 Oct 2016, at 10:55, Stephan Ewen <se...@apache.org<mailto:
> >>>>> se...@apache.org><mailto:se...@apache.org> <mailto:
> >>>>> se...@apache.org<mailto:se...@apache.org><mailto:se...@apache.org
> >>>>>>> <mailto:sewen
> >>>>> @apache.org<http://apache.org/>>> wrote:
> >>>>>
> >>>>> How about we break this up into two FLIPs? There are after all two
> >>>>> orthogonal problems (termination, fault tolerance) with quite
> different
> >>>>> discussion states.
> >>>>>
> >>>>> Concerning fault tolerance, I like the ideas.
> >>>>> For the termination proposal, I would like to iterate a bit more.
> >>>>>
> >>>>> *Termination algorithm:*
> >>>>>
> >>>>> My main concern here is the introduction of a termination coordinator
> >>>>> and
> >>>>> any involvement of RPC messages when deciding termination.
> >>>>> That would be such a fundamental break with the current runtime
> >>>>> architecture, and it would make the currently very elegant and simple
> >>>>> model
> >>>>> much more complicated and harder to maintain. Given that Flink's
> >>>>> runtime is
> >>>>> complex enough, I would really like to avoid that.
> >>>>>
> >>>>> The current runtime paradigm coordinates between operators strictly
> via
> >>>>> in-band events. RPC calls happen between operators and the master for
> >>>>> triggering and acknowledging execution and checkpoints.
> >>>>>
> >>>>> I was wondering whether we can keep following that paradigm and still
> >>>>> get
> >>>>> most of what you are proposing here. In some sense, all we need to
> do is
> >>>>> replace RPC calls with in-band events, and "decentralize" the
> >>>>> coordinator
> >>>>> such that every operator can make its own termination decision by
> >>>>> itself.
> >>>>>
> >>>>> This is only a rough sketch, you probably need to flesh it out more.
> >>>>>
> >>>>> - I assume that the OP in the diagram knows that it is in a loop and
> >>>>> that
> >>>>> it is the one connected to the head and tail
> >>>>>
> >>>>> - When OP receives and EndOfStream Event from the regular source
> (RS),
> >>>>> it
> >>>>> emits an "AttemptTermination" event downstream to the operators
> >>>>> involved in
> >>>>> the loop. It attaches an attempt sequence number and memorizes that
> >>>>> - Tail and Head forward these events
> >>>>> - When OP receives the event back with the same attempt sequence
> number,
> >>>>> and no records came in the meantime, it shuts down and emits
> EndOfStream
> >>>>> downstream
> >>>>> - When other records came back between emitting the
> AttemptTermination
> >>>>> event and receiving it back, then it emits a new AttemptTermination
> >>>>> event
> >>>>> with the next sequence number.
> >>>>> - This should terminate as soon as the loop is empty.
> >>>>>
> >>>>> Might this model even generalize to nested loops, where the
> >>>>> "AttemptTermination" event is scoped by the loop's nesting level?
> >>>>>
> >>>>> Let me know what you think!
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>> Stephan
> >>>>>
> >>>>>
> >>>>> On Thu, Oct 27, 2016 at 10:19 AM, Stephan Ewen <se...@apache.org
> >>> <mailto:
> >>>>> se...@apache.org><mailto:se...@apache.org>
> >>>>> <mailto:se...@apache.org><mailto:
> >>>>> se...@apache.org<mailto:se...@apache.org><mailto:se...@apache.org>
> >>>>> <mailto:se...@apache.org>>> wrote:
> >>>>>
> >>>>> Hi!
> >>>>>
> >>>>> I am still scanning it and compiling some comments. Give me a bit ;-)
> >>>>>
> >>>>> Stephan
> >>>>>
> >>>>>
> >>>>> On Wed, Oct 26, 2016 at 6:13 PM, Paris Carbone <par...@kth.se
> <mailto:
> >>>>> par...@kth.se><mailto:par...@kth.se> <mailto:
> >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto:
> >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se> <mailto:
> >>>>> par...@kth.se>>> wrote:
> >>>>>
> >>>>> Hey all,
> >>>>>
> >>>>> Now that many of you have already scanned the document (judging from
> the
> >>>>> views) maybe it is time to give back some feedback!
> >>>>> Did you like it? Would you suggest an improvement?
> >>>>>
> >>>>> I would suggest not to leave this in the void. It has to do with
> >>>>> important properties that the system promises to provide.
> >>>>> Me and Fouad will do our best to answer your questions and discuss
> this
> >>>>> further.
> >>>>>
> >>>>> cheers
> >>>>> Paris
> >>>>>
> >>>>> On 21 Oct 2016, at 08:54, Paris Carbone <par...@kth.se<mailto:
> >>>>> par...@kth.se><mailto:par...@kth.se> <mailto:
> >>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto:
> >>> parisc@
> >>>>> kth.se<http://kth.se/><http://kth.se<http://kth.se/>> <
> http://kth.se/
> >>>>>>> <mailto:parisc@k
> >>>>> th.se<http://th.se/><http://th.se<http://th.se/>> <http://th.se/><
> >>>>> http://th.se<http://th.se/> <http://th.se/>>>> wrote:
> >>>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> Loops in Apache Flink have a good potential to become a much more
> >>>>> powerful thing in future version of Apache Flink.
> >>>>> There is generally high demand to make them usable and first of all
> >>>>> production-ready for upcoming releases.
> >>>>>
> >>>>> As a first commitment we would like to propose FLIP-13 for consistent
> >>>>> processing with Loops.
> >>>>> We are also working on scoped loops for Q1 2017 which we can share if
> >>>>> there is enough interest.
> >>>>>
> >>>>> For now, that is an improvement proposal that solves two pending
> major
> >>>>> issues:
> >>>>>
> >>>>> 1) The (not so trivial) problem of correct termination of jobs with
> >>>>> iterations
> >>>>> 2) The applicability of the checkpointing algorithm to iterative
> >>>>> dataflow
> >>>>> graphs.
> >>>>>
> >>>>> We would really appreciate it if you go through the linked draft
> >>>>> (motivation and proposed changes) for FLIP-13 and point out comments,
> >>>>> preferably publicly in this devlist discussion before we go ahead and
> >>>>> update the wiki.
> >>>>>
> >>>>> https://docs.google.com/document/d/1M6ERj-TzlykMLHzPSwW5L9b0
> >>>>> BhDbtoYucmByBjRBISs/edit?usp=sharing
> >>>>>
> >>>>> cheers
> >>>>>
> >>>>> Paris and Fouad
> >>>>>
> >>>>>
> >>>
> >>>
> >
>
>

Re: [DISCUSS] FLIP-14: Loops API and Termination

Reply via email to