Re: Stale Synchronous Parallel iterations in Flink

Kostas Tzoumas Mon, 23 Feb 2015 01:13:41 -0800

Hi Nam-Luc,

Several of the observations in your blog post are on point. Iterations
are already pipelined, and the distributed state that delta iterations
access could possibly be lifted to a parameter server API.
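
To make the parameter server idea a bit more concrete, here is a minimal
sketch of the kind of interface I have in mind. This is not Flink code:
the names ParameterServer, pull and push are hypothetical, and the sketch
assumes the shared state is a map of numeric values updated by additive
deltas.

```python
from collections import defaultdict


class ParameterServer:
    """Hypothetical in-memory stand-in for shared iteration state."""

    def __init__(self):
        # Unknown keys default to 0.0 (an assumption of this sketch).
        self._params = defaultdict(float)

    def pull(self, keys):
        """Return the current values for the requested keys."""
        return {k: self._params[k] for k in keys}

    def push(self, deltas):
        """Apply additive updates, one per key."""
        for key, delta in deltas.items():
            self._params[key] += delta


ps = ParameterServer()
ps.push({"w0": 0.5})
ps.push({"w0": -0.2, "w1": 1.0})
snapshot = ps.pull(["w0", "w1", "w2"])
```

In a real deployment the state would be partitioned across machines; the
sketch only shows the pull/push contract.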

We need to work through the details of how fault tolerance and
termination would work, and of how the recent refactoring of the runtime
to include intermediate results would interact with this. This is
feasible, but will require changes to the runtime code.

Simply removing the dams from the iterations would make them
asynchronous, but it introduces two issues:

(1) How to checkpoint such computations in order to recover them upon
failures.

(2) How to check for the termination of the computation without a sync
barrier. Can the SSP model help with that?
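
For reference, the bounded-staleness rule behind SSP can be sketched as
follows. This is only an illustration under my own assumptions: SSPClock
and its methods are hypothetical names, not part of Flink, and a worker's
clock here counts its completed iterations. A worker may begin a new
iteration only while it is at most `staleness` completed iterations ahead
of the slowest worker; with staleness 0 this degenerates to a full sync
barrier.

```python
class SSPClock:
    """Hypothetical tracker for per-worker iteration clocks under SSP."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers  # completed iterations per worker
        self.staleness = staleness

    def min_clock(self):
        """Last iteration that every worker has completed."""
        return min(self.clocks)

    def may_start_next(self, worker):
        """True if the worker is within the staleness bound of the slowest."""
        return self.clocks[worker] - self.min_clock() <= self.staleness

    def advance(self, worker):
        """Run one iteration for the worker if allowed; report whether it ran."""
        if not self.may_start_next(worker):
            return False  # the worker has to wait for slower workers
        self.clocks[worker] += 1
        return True


clock = SSPClock(num_workers=2, staleness=1)
results = [
    clock.advance(0),  # worker 0 completes iteration 1
    clock.advance(0),  # worker 0 completes iteration 2
    clock.advance(0),  # blocked: 2 iterations ahead of worker 1
    clock.advance(1),  # worker 1 completes iteration 1
    clock.advance(0),  # allowed again
]
```

The minimum clock is the last iteration that all workers have completed,
so it might be a place to evaluate a convergence criterion. Whether that
is enough for termination detection is exactly the open question in (2).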

Kostas

On Fri, Feb 20, 2015 at 5:27 PM, Nam-Luc Tran <[email protected]>
wrote:

> Hello Everyone,
>
> I am Nam-Luc Tran, research engineer at EURA NOVA [1]. Our research
> covers distributed machine learning, and we have been working
> on dataflow graph processing for a while now. We have been following
> your work since Stratosphere :-)
>
> Our current research focuses on Stale Synchronous Parallelism, and
> among the existing processing solutions we consider Apache Flink a
> good candidate for implementing it and delivering the best results.
> I have written a post about it here:
>
> https://www.linkedin.com/pulse/stale-synchronous-parallelism-new-frontier-apache-flink-nam-luc-tran
>
>
> What do you guys think about the approach? Does it seem feasible, or
> do you have anything similar in your roadmap?
>
> Best regards,
>
> Tran Nam-Luc
>
>
>
> Links:
> ------
> [1] http://euranova.eu
>
>
