Re: [DISCUSS] FLIP-14: Loops API and Termination

Paris Carbone Thu, 17 Nov 2016 05:01:28 -0800
That was fast!  Seems to be working. 
Thank you Fabian! 

> On 17 Nov 2016, at 13:58, Fabian Hueske <fhue...@gmail.com> wrote:
> 
> Hi Paris,
> 
> just gave you the permissions (I hope).
> Let me know if something does not work.
> 
> Cheers, Fabian
> 
> 2016-11-17 13:48 GMT+01:00 Paris Carbone <par...@kth.se>:
> 
>> We do not have to schedule this for an early Flink release, just saying.
>> I would just like to get the changes out and you people can review it and
>> integrate it anytime at your own pace.
>> 
>> Who is the admin of the wiki? It would be nice to get write access.
>> 
>>> On 17 Nov 2016, at 13:45, Paris Carbone <par...@kth.se> wrote:
>>> 
>>> Sounds like a plan!
>>> 
>>> Can someone grant me access to write in the wiki please?
>>> My username is “senorcarbone”.
>>> 
>>> Paris
>>> 
>>>> On 16 Nov 2016, at 14:30, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>> 
>>>> I am not completely sure whether we should deprecate the old API for
>> 1.2 or
>>>> remove it completely. Personally I am in favor of removing it, I don't
>>>> think it is a huge burden to move to the new one if it makes for a much
>>>> nicer user experience.
>>>> 
>>>> I think you can go ahead add the FLIP to the wiki and open the PR so we
>> can
>>>> start the review if you have it ready anyways.
>>>> 
>>>> Gyula
>>>> 
>>>> Paris Carbone <par...@kth.se> ezt írta (időpont: 2016. nov. 16., Sze,
>>>> 11:55):
>>>> 
>>>>> Thanks for reviewing, Gyula.
>>>>> 
>>>>> One thing that is still up to discussion is whether we should remove
>>>>> completely the old iterations API or simply mark it as deprecated till
>> v2.0.
>>>>> Also, not sure what is the best process now. We have the changes ready.
>>>>> Should I copy the FLIP to the wiki and trigger the PRs or wait for a
>> few
>>>>> more days in case someone has objections?
>>>>> 
>>>>> @Stephan, what is your take on our interpretation of the approach you
>>>>> suggested? Should we proceed or is there anything that you do not find
>> nice?
>>>>> 
>>>>> Paris
>>>>> 
>>>>>> On 15 Nov 2016, at 10:01, Gyula Fóra <gyf...@apache.org> wrote:
>>>>>> 
>>>>>> Hi Paris,
>>>>>> 
>>>>>> I like the proposed changes to the iteration API, this cleans up
>> things
>>>>> in
>>>>>> the Java API without any strict restriction I think (it was never a
>>>>> problem
>>>>>> in the Scala API).
>>>>>> 
>>>>>> The termination algorithm based on the proposed scoped loops seems to
>> be
>>>>>> fairly simple and looks good :)
>>>>>> 
>>>>>> Cheers,
>>>>>> Gyula
>>>>>> 
>>>>>> Paris Carbone <par...@kth.se> ezt írta (időpont: 2016. nov. 14., H,
>>>>> 8:50):
>>>>>> 
>>>>>>> That would be great Shi! Let's take that offline.
>>>>>>> 
>>>>>>> Anyone else interested in the iteration changes? It would be nice to
>>>>>>> incorporate these to v1.2 if possible so I count on your review asap.
>>>>>>> 
>>>>>>> cheers,
>>>>>>> Paris
>>>>>>> 
>>>>>>> On Nov 14, 2016, at 2:59 AM, xiaogang.sxg <
>> xiaogang....@alibaba-inc.com
>>>>>>> <mailto:xiaogang....@alibaba-inc.com>> wrote:
>>>>>>> 
>>>>>>> Hi Paris
>>>>>>> 
>>>>>>> Unfortunately, the project is not public yet.
>>>>>>> But i can provide you a primitive implementation of the update
>> protocol
>>>>> in
>>>>>>> the paper. It’s implemented in Storm. Since the protocol assumes the
>>>>>>> communication channels between different tasks are dual, i think it’s
>>>>> not
>>>>>>> easy to adapt it to Flink.
>>>>>>> 
>>>>>>> Regards
>>>>>>> Xiaogang
>>>>>>> 
>>>>>>> 
>>>>>>> 在 2016年11月12日，上午3:03，Paris Carbone <par...@kth.se<mailto:parisc@
>> kth.se
>>>>>>> 
>>>>>>> 写道：
>>>>>>> 
>>>>>>> Hi Shi,
>>>>>>> 
>>>>>>> Naiad/Timely Dataflow and other projects use global coordination
>> which
>>>>> is
>>>>>>> very convenient for asynchronous progress tracking in general but it
>> has
>>>>>>> some downsides in a production systems that count on in-flight
>>>>>>> transactional control mechanisms and rollback recovery guarantees.
>> This
>>>>> is
>>>>>>> why we generally prefer decentralized approaches (despite their our
>>>>>>> downsides).
>>>>>>> 
>>>>>>> Regarding synchronous/structured iterations, this is a bit off topic
>> and
>>>>>>> they are a bit of a different story as you already know.
>>>>>>> We maintain a graph streaming (gelly-streams) library on Flink that
>> you
>>>>>>> might find interesting [1]. Vasia, another Flink committer is also
>>>>> working
>>>>>>> on that among others.
>>>>>>> You can keep an eye on it since we are planning to use this project
>> as a
>>>>>>> showcase for a new way of doing structured and fixpoint iterations on
>>>>>>> streams in the future.
>>>>>>> 
>>>>>>> P.S. many thanks for sharing your publication, it was an interesting
>>>>> read.
>>>>>>> Do you happen to have your source code public? We could most
>> certainly
>>>>> use
>>>>>>> it in an benchmark soon.
>>>>>>> 
>>>>>>> [1] https://github.com/vasia/gelly-streaming
>>>>>>> 
>>>>>>> 
>>>>>>> On 11 Nov 2016, at 19:18, SHI Xiaogang <shixiaoga...@gmail.com<
>> mailto:
>>>>>>> shixiaoga...@gmail.com><mailto:shixiaoga...@gmail.com>> wrote:
>>>>>>> 
>>>>>>> Hi, Fouad
>>>>>>> 
>>>>>>> Thank you for the explanation. Now the centralized method seems
>> correct
>>>>> to
>>>>>>> me.
>>>>>>> The passing of StatusUpdate events will lead to synchronous
>> iterations
>>>>> and
>>>>>>> we are using the information in each iterations to terminate the
>>>>>>> computation.
>>>>>>> 
>>>>>>> Actually, i prefer the centralized method because in many
>> applications,
>>>>> the
>>>>>>> convergence may depend on some global statistics.
>>>>>>> For example, a PageRank program may terminate the computation when
>> 99%
>>>>>>> vertices are converged.
>>>>>>> I think those learning programs which cannot reach the fixed-point
>>>>>>> (oscillating around the fixed-point) can benefit a lot from such
>>>>> features.
>>>>>>> The decentralized method makes it hard to support such convergence
>>>>>>> conditions.
>>>>>>> 
>>>>>>> 
>>>>>>> Another concern is that Flink cannot produce periodical results in
>> the
>>>>>>> iteration over infinite data streams.
>>>>>>> Take a concrete example. Given an edge stream constructing a graph,
>> the
>>>>>>> user may need the PageRank weight of each vertex in the graphs
>> formed at
>>>>>>> certain instants.
>>>>>>> Currently Flink does not provide any input or iteration information
>> to
>>>>>>> users, making users hard to implement such real-time iterative
>>>>>>> applications.
>>>>>>> Such features are supported in both Naiad and Tornado. I think Flink
>>>>> should
>>>>>>> support it as well.
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> Regards
>>>>>>> Xiaogang
>>>>>>> 
>>>>>>> 
>>>>>>> 2016-11-11 19:27 GMT+08:00 Fouad ALi <fouad.alsay...@gmail.com<
>> mailto:
>>>>>>> fouad.alsay...@gmail.com><mailto:fouad.alsay...@gmail.com>>:
>>>>>>> 
>>>>>>> Hi Shi,
>>>>>>> 
>>>>>>> It seems that you are referring to the centralized algorithm which
>> is no
>>>>>>> longer the proposed version.
>>>>>>> In the decentralized version (check last doc) there is no master
>> node or
>>>>>>> global coordination involved.
>>>>>>> 
>>>>>>> Let us keep this discussion to the decentralized one if possible.
>>>>>>> 
>>>>>>> To answer your points on the previous approach, there is a catch in
>> your
>>>>>>> trace at t7. Here is what is happening :
>>>>>>> - Head,as well as RS, will receive  a 'BroadcastStatusUpdate' from
>>>>>>> runtime (see 2.1 in the steps).
>>>>>>> - RS and Heads will broadcast StatusUpdate  event and will not notify
>>>>> its
>>>>>>> status.
>>>>>>> - When StatusUpdate event gets back to the head it will notify its
>>>>>>> WORKING  status.
>>>>>>> 
>>>>>>> Hope that answers your concern.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Fouad
>>>>>>> 
>>>>>>> On Nov 11, 2016, at 6:21 AM, SHI Xiaogang <shixiaoga...@gmail.com
>>>>> <mailto:
>>>>>>> shixiaoga...@gmail.com><mailto:shixiaoga...@gmail.com>>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi Paris
>>>>>>> 
>>>>>>> I have several concerns about the correctness of the termination
>>>>>>> protocol.
>>>>>>> I think the termination protocol put an end to the computation even
>> when
>>>>>>> the computation has not converged.
>>>>>>> 
>>>>>>> Suppose there exists a loop context constructed by a OP operator, a
>> Head
>>>>>>> operator and a Tail operator (illustrated in Figure 2 in the first
>>>>>>> draft).
>>>>>>> The stream only contains one record. OP will pass the record to its
>>>>>>> downstream operators 10 times. In other words, the loop should
>> iterate
>>>>> 10
>>>>>>> times.
>>>>>>> 
>>>>>>> If I understood the protocol correctly, the following event sequence
>> may
>>>>>>> happen in the computation:
>>>>>>> t1:  RS emits Record to OP. Since RS has reached the "end-of-stream",
>>>>> the
>>>>>>> system enters into Speculative Phase.
>>>>>>> t2:  OP receives Record and emits it to TAIL.
>>>>>>> t3:  HEAD receives the UpdateStatus event, and notifies with an IDLE
>>>>>>> state.
>>>>>>> t4. OP receives the UpdateStatus event from HEAD, and notifies with
>> an
>>>>>>> WORKING state.
>>>>>>> t5. TAIL receives Record and emits it to HEAD.
>>>>>>> t6. TAIL receives the UpdateStatus event from OP, and notifies with
>> an
>>>>>>> WORKING state.
>>>>>>> t7. The system starts a new attempt. HEAD receives the UpdateStatus
>>>>> event
>>>>>>> and notifies with an IDLE state.  (Record is still in transition.)
>>>>>>> t8. OP receives the UpdateStatus event from HEAD and notifies with an
>>>>>>> IDLE
>>>>>>> state.
>>>>>>> t9. TAIL receives the UpdateStatus event from OP and notifies with an
>>>>>>> IDLE
>>>>>>> state.
>>>>>>> t10. HEAD receives Record from TAIL and emits it to OP.
>>>>>>> t11. System puts an end to the computation.
>>>>>>> 
>>>>>>> Though the computation is expected to iterate 10 times, it ends
>> earlier.
>>>>>>> The cause is that the communication channels of MASTER=>HEAD and
>>>>>>> TAIL=>HEAD
>>>>>>> are not synchronized.
>>>>>>> 
>>>>>>> I think the protocol follows the idea of the Chandy-Lamport
>> algorithm to
>>>>>>> determine a global state.
>>>>>>> But the information of whether a node has processed any record to
>> since
>>>>>>> the
>>>>>>> last request is not STABLE.
>>>>>>> Hence i doubt the correctness of the protocol.
>>>>>>> 
>>>>>>> To determine the termination correctly, we need some information
>> that is
>>>>>>> stable.
>>>>>>> In timelyflow, Naiad collects the progress made in each iteration and
>>>>>>> terminates the loop when a little progress is made in an iteration
>>>>>>> (identified by the timestamp vector).
>>>>>>> The information is stable because the result of an iteration cannot
>> be
>>>>>>> changed by the execution of later iterations.
>>>>>>> 
>>>>>>> A similar method is also adopted in Tornado.
>>>>>>> You may see my paper for more details about the termination of loops:
>>>>>>> http://net.pku.edu.cn/~cuibin/Papers/2016SIGMOD.pdf <
>>>>>>> http://net.pku.edu.cn/~cuibin/Papers/2016SIGMOD.pdf>
>>>>>>> 
>>>>>>> Regards
>>>>>>> Xiaogang
>>>>>>> 
>>>>>>> 2016-11-11 3:19 GMT+08:00 Paris Carbone <par...@kth.se<mailto:
>>>>>>> par...@kth.se><mailto:par...@kth.se> <mailto:
>>>>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>>>:
>>>>>>> 
>>>>>>> Hi again Flink folks,
>>>>>>> 
>>>>>>> Here is our new proposal that addresses Job Termination - the loop
>> fault
>>>>>>> tolerance proposal will follow shortly.
>>>>>>> As Stephan hinted, we need operators to be aware of their scope
>> level.
>>>>>>> 
>>>>>>> Thus, it is time we make loops great again! :)
>>>>>>> 
>>>>>>> Part of this FLIP basically introduces a new functional,
>> compositional
>>>>>>> API
>>>>>>> for defining asynchronous loops for DataStreams.
>>>>>>> This is coupled with a decentralized algorithm for job termination
>> with
>>>>>>> loops - along the lines of what Stephan described.
>>>>>>> We are already working on the actual prototypes as you can observe in
>>>>>>> the
>>>>>>> links of the doc.
>>>>>>> 
>>>>>>> Please let us know if you like (or don't like) it and why, in this
>> mail
>>>>>>> discussion.
>>>>>>> 
>>>>>>> https://docs.google.com/document/d/1nzTlae0AFimPCTIV1LB3Z2y-
>>>>>>> PfTHtq3173EhsAkpBoQ
>>>>>>> 
>>>>>>> cheers
>>>>>>> Paris and Fouad
>>>>>>> 
>>>>>>> On 31 Oct 2016, at 12:53, Paris Carbone <par...@kth.se<mailto:
>>>>>>> par...@kth.se> <mailto:
>>>>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto:
>>>>> parisc@
>>>>>>> kth.se<http://kth.se/><http://kth.se<http://kth.se/>> <
>> http://kth.se/
>>>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hey Stephan,
>>>>>>> 
>>>>>>> Thanks for looking into it!
>>>>>>> 
>>>>>>> +1 for breaking this up, will do that.
>>>>>>> 
>>>>>>> I can see your point and maybe it makes sense to introduce part of
>>>>>>> scoping
>>>>>>> to incorporate support for nested loops (otherwise it can’t work).
>>>>>>> Let us think about this a bit. We will share another draft for a more
>>>>>>> detail description of the approach you are suggesting asap.
>>>>>>> 
>>>>>>> 
>>>>>>> On 27 Oct 2016, at 10:55, Stephan Ewen <se...@apache.org<mailto:
>>>>>>> se...@apache.org><mailto:se...@apache.org> <mailto:
>>>>>>> se...@apache.org<mailto:se...@apache.org><mailto:se...@apache.org
>>>>>>>>> <mailto:sewen
>>>>>>> @apache.org<http://apache.org/>>> wrote:
>>>>>>> 
>>>>>>> How about we break this up into two FLIPs? There are after all two
>>>>>>> orthogonal problems (termination, fault tolerance) with quite
>> different
>>>>>>> discussion states.
>>>>>>> 
>>>>>>> Concerning fault tolerance, I like the ideas.
>>>>>>> For the termination proposal, I would like to iterate a bit more.
>>>>>>> 
>>>>>>> *Termination algorithm:*
>>>>>>> 
>>>>>>> My main concern here is the introduction of a termination coordinator
>>>>>>> and
>>>>>>> any involvement of RPC messages when deciding termination.
>>>>>>> That would be such a fundamental break with the current runtime
>>>>>>> architecture, and it would make the currently very elegant and simple
>>>>>>> model
>>>>>>> much more complicated and harder to maintain. Given that Flink's
>>>>>>> runtime is
>>>>>>> complex enough, I would really like to avoid that.
>>>>>>> 
>>>>>>> The current runtime paradigm coordinates between operators strictly
>> via
>>>>>>> in-band events. RPC calls happen between operators and the master for
>>>>>>> triggering and acknowledging execution and checkpoints.
>>>>>>> 
>>>>>>> I was wondering whether we can keep following that paradigm and still
>>>>>>> get
>>>>>>> most of what you are proposing here. In some sense, all we need to
>> do is
>>>>>>> replace RPC calls with in-band events, and "decentralize" the
>>>>>>> coordinator
>>>>>>> such that every operator can make its own termination decision by
>>>>>>> itself.
>>>>>>> 
>>>>>>> This is only a rough sketch, you probably need to flesh it out more.
>>>>>>> 
>>>>>>> - I assume that the OP in the diagram knows that it is in a loop and
>>>>>>> that
>>>>>>> it is the one connected to the head and tail
>>>>>>> 
>>>>>>> - When OP receives and EndOfStream Event from the regular source
>> (RS),
>>>>>>> it
>>>>>>> emits an "AttemptTermination" event downstream to the operators
>>>>>>> involved in
>>>>>>> the loop. It attaches an attempt sequence number and memorizes that
>>>>>>> - Tail and Head forward these events
>>>>>>> - When OP receives the event back with the same attempt sequence
>> number,
>>>>>>> and no records came in the meantime, it shuts down and emits
>> EndOfStream
>>>>>>> downstream
>>>>>>> - When other records came back between emitting the
>> AttemptTermination
>>>>>>> event and receiving it back, then it emits a new AttemptTermination
>>>>>>> event
>>>>>>> with the next sequence number.
>>>>>>> - This should terminate as soon as the loop is empty.
>>>>>>> 
>>>>>>> Might this model even generalize to nested loops, where the
>>>>>>> "AttemptTermination" event is scoped by the loop's nesting level?
>>>>>>> 
>>>>>>> Let me know what you think!
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Stephan
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Oct 27, 2016 at 10:19 AM, Stephan Ewen <se...@apache.org
>>>>> <mailto:
>>>>>>> se...@apache.org><mailto:se...@apache.org>
>>>>>>> <mailto:se...@apache.org><mailto:
>>>>>>> se...@apache.org<mailto:se...@apache.org><mailto:se...@apache.org>
>>>>>>> <mailto:se...@apache.org>>> wrote:
>>>>>>> 
>>>>>>> Hi!
>>>>>>> 
>>>>>>> I am still scanning it and compiling some comments. Give me a bit ;-)
>>>>>>> 
>>>>>>> Stephan
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Oct 26, 2016 at 6:13 PM, Paris Carbone <par...@kth.se
>> <mailto:
>>>>>>> par...@kth.se><mailto:par...@kth.se> <mailto:
>>>>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto:
>>>>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se> <mailto:
>>>>>>> par...@kth.se>>> wrote:
>>>>>>> 
>>>>>>> Hey all,
>>>>>>> 
>>>>>>> Now that many of you have already scanned the document (judging from
>> the
>>>>>>> views) maybe it is time to give back some feedback!
>>>>>>> Did you like it? Would you suggest an improvement?
>>>>>>> 
>>>>>>> I would suggest not to leave this in the void. It has to do with
>>>>>>> important properties that the system promises to provide.
>>>>>>> Me and Fouad will do our best to answer your questions and discuss
>> this
>>>>>>> further.
>>>>>>> 
>>>>>>> cheers
>>>>>>> Paris
>>>>>>> 
>>>>>>> On 21 Oct 2016, at 08:54, Paris Carbone <par...@kth.se<mailto:
>>>>>>> par...@kth.se><mailto:par...@kth.se> <mailto:
>>>>>>> par...@kth.se<mailto:par...@kth.se><mailto:par...@kth.se>><mailto:
>>>>> parisc@
>>>>>>> kth.se<http://kth.se/><http://kth.se<http://kth.se/>> <
>> http://kth.se/
>>>>>>>>> <mailto:parisc@k
>>>>>>> th.se<http://th.se/><http://th.se<http://th.se/>> <http://th.se/><
>>>>>>> http://th.se<http://th.se/> <http://th.se/>>>> wrote:
>>>>>>> 
>>>>>>> Hello everyone,
>>>>>>> 
>>>>>>> Loops in Apache Flink have a good potential to become a much more
>>>>>>> powerful thing in future version of Apache Flink.
>>>>>>> There is generally high demand to make them usable and first of all
>>>>>>> production-ready for upcoming releases.
>>>>>>> 
>>>>>>> As a first commitment we would like to propose FLIP-13 for consistent
>>>>>>> processing with Loops.
>>>>>>> We are also working on scoped loops for Q1 2017 which we can share if
>>>>>>> there is enough interest.
>>>>>>> 
>>>>>>> For now, that is an improvement proposal that solves two pending
>> major
>>>>>>> issues:
>>>>>>> 
>>>>>>> 1) The (not so trivial) problem of correct termination of jobs with
>>>>>>> iterations
>>>>>>> 2) The applicability of the checkpointing algorithm to iterative
>>>>>>> dataflow
>>>>>>> graphs.
>>>>>>> 
>>>>>>> We would really appreciate it if you go through the linked draft
>>>>>>> (motivation and proposed changes) for FLIP-13 and point out comments,
>>>>>>> preferably publicly in this devlist discussion before we go ahead and
>>>>>>> update the wiki.
>>>>>>> 
>>>>>>> https://docs.google.com/document/d/1M6ERj-TzlykMLHzPSwW5L9b0
>>>>>>> BhDbtoYucmByBjRBISs/edit?usp=sharing
>>>>>>> 
>>>>>>> cheers
>>>>>>> 
>>>>>>> Paris and Fouad
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 
>>
Re: [DISCUSS] FLIP-14: Loops API and Termination

Reply via email to