Hi Nam-Luc,

Several of your observations in the blog post are on point. Iterations are already pipelined, and the distributed state that delta iterations access could possibly be lifted to a parameter server API.
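To make the staleness bound that SSP relies on concrete, here is a minimal, hypothetical Python sketch. `SSPClock`, `can_advance` and `tick` are illustrative names only, not a Flink or parameter server API. Each worker counts its completed iterations, and a worker may start its next iteration only while it is at most `staleness` iterations ahead of the slowest worker. With `staleness=0` the check degenerates to a full barrier between supersteps.

```python
class SSPClock:
    """Hypothetical sketch of the stale synchronous parallel (SSP) condition.

    clocks[w] is the number of iterations worker w has completed. A worker
    may start its next iteration only if it is at most `staleness`
    iterations ahead of the slowest worker.
    """

    def __init__(self, num_workers: int, staleness: int) -> None:
        if num_workers <= 0:
            raise ValueError("num_workers must be positive")
        if staleness < 0:
            raise ValueError("staleness must be non-negative")
        self.clocks = [0] * num_workers
        self.staleness = staleness

    def can_advance(self, worker: int) -> bool:
        # staleness == 0: nobody may run ahead of the slowest worker (BSP).
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def tick(self, worker: int) -> None:
        # Record that `worker` finished an iteration; refuse if the bound
        # would have required it to wait for slower workers first.
        if not self.can_advance(worker):
            raise RuntimeError(f"worker {worker} must wait for slower workers")
        self.clocks[worker] += 1


if __name__ == "__main__":
    clock = SSPClock(num_workers=2, staleness=1)
    clock.tick(0)
    clock.tick(0)
    print(clock.can_advance(0))  # False: worker 0 is 2 iterations ahead
    clock.tick(1)
    print(clock.can_advance(0))  # True: the gap is back to 1
```

Note that the bound only throttles fast workers. It does not by itself say when a checkpoint is consistent or when the computation has converged.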
We need to work through the details of how fault tolerance and termination would work, and how the recent refactoring of the runtime to include intermediate results would play into this. This is feasible, but it will require changes to the runtime code. Simply removing the dams from the iterations can make them asynchronous, but it introduces two issues:

(1) How to checkpoint such computations so that they can be recovered upon failures.
(2) How to check for termination of the computation without a sync barrier.

Can the SSP model help with that?

Kostas

On Fri, Feb 20, 2015 at 5:27 PM, Nam-Luc Tran <[email protected]> wrote:

> Hello Everyone,
>
> I am Nam-Luc Tran, research Engineer at EURA NOVA [1]. Our research
> subjects cover distributed machine learning and we have been working
> on dataflow graph processing for a while now. We have been reading
> from you since Stratosphere :-)
>
> Our current research focuses on Stale Synchronous Parallelism and we
> are currently considering Apache Flink as a good candidate for
> implementing and delivering the best results among the existing
> processing solutions. I have written a post about it here:
>
> https://www.linkedin.com/pulse/stale-synchronous-parallelism-new-frontier-apache-flink-nam-luc-tran
>
> What do you guys think about the approach? Does it seem feasible, or
> do you have anything similar in your roadmap?
>
> Best regards,
>
> Tran Nam-Luc
>
> Links:
> ------
> [1] http://euranova.eu
