Just to be clear, Spark actually *does* support general task graphs, similar to
Dryad (though a bit simpler, in that there's a notion of "stages" and a fixed
set of connection patterns between them). However, MBrace goes a step further,
in that the graph can be modified dynamically by user code. It's also not
clear what the granularity of spawned tasks in MBrace is: can you spawn
something that runs for 1 millisecond, or 1 second, or 1 hour? The choice
there greatly affects system design.
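To make the distinction concrete, here is a minimal local sketch in Python, using asyncio as a stand-in for a cluster scheduler. This is an assumption for illustration only: it is neither Spark's nor MBrace's API, and the names run_stages, dyn_sum and grain are hypothetical. run_stages executes a graph whose shape is fixed before it starts, with a barrier between stages, roughly the way Spark stages work. dyn_sum decides at run time, from its input, whether to compute directly or to spawn two child tasks, and grain sets how much work each leaf task does.

```python
import asyncio


async def _run_task(fn, partition):
    # One task: apply a stage function to one partition.
    return fn(partition)


async def run_stages(partitions, stages):
    """Static graph (hypothetical helper): the list of stages is fixed before
    execution starts. Every partition runs through stage i, and all of them
    must finish (a barrier) before any partition starts stage i + 1."""
    data = list(partitions)
    for stage in stages:
        data = await asyncio.gather(*(_run_task(stage, p) for p in data))
    return list(data)


async def dyn_sum(xs, grain):
    """Dynamic graph (hypothetical helper): each task looks at its input and
    decides at run time whether to compute directly or to spawn two child
    tasks. Returns (sum, number_of_tasks_run)."""
    if grain < 1:
        raise ValueError("grain must be >= 1")
    if len(xs) <= grain:
        # Leaf task: `grain` bounds how much work it does.
        return sum(xs), 1
    mid = len(xs) // 2
    # The graph grows here; gather wraps each coroutine in its own task.
    (left_sum, left_tasks), (right_sum, right_tasks) = await asyncio.gather(
        dyn_sum(xs[:mid], grain),
        dyn_sum(xs[mid:], grain),
    )
    return left_sum + right_sum, left_tasks + right_tasks + 1


if __name__ == "__main__":
    parts = [[1, 2], [3, 4], [5, 6]]
    # Stage 1 doubles each element, stage 2 sums each partition.
    print(asyncio.run(run_stages(parts, [lambda p: [x * 2 for x in p], sum])))
    # -> [6, 14, 22]
    print(asyncio.run(dyn_sum(list(range(1000)), 1000)))  # -> (499500, 1)
    print(asyncio.run(dyn_sum(list(range(1000)), 100)))   # -> (499500, 31)
```

The same sum over range(1000) runs as 1 task with grain=1000 and as 31 tasks with grain=100. Locally a coroutine spawn is cheap; on a cluster each spawn would also pay scheduling and serialization costs, which is why the task granularity a system supports shapes its design.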

Matei

On Oct 23, 2013, at 6:54 PM, Christopher Nguyen <c...@adatao.com> wrote:

> Re MBrace: very interesting work. I'm a bit surprised though that the paper
> makes no mention of DryadLINQ
> (http://research.microsoft.com/en-us/projects/dryadlinq/dryadlinq.pdf).
> 
> Architecturally, it's a lot easier to see an MBrace implementation
> specialized to a MapReduce (or, more generally, a BSP) computation than
> to have Spark implement the fully async DAG model of an MBrace/Dryad
> engine.
> 
> More practically, as interesting as it might be as a side effort, I think
> it would be "off mission" for the core Spark effort to attempt something
> like that. Spark's success to date has been due more to a beautiful
> implementation of a known architecture than to a beautiful new
> architecture. Basically, Spark does MapReduce 10-100x faster than Hadoop,
> and by now more people understand how to get MapReduce to solve their
> problems than understand any other parallel model. Spark sits natively on
> HDFS, which makes adoption a lot easier to swallow. So at present, for
> Spark to mature quickly along that successful trajectory, the key problems
> to address are more practical "user interface" or "productivity" things
> like manageability, deployability, fault-tolerance improvements,
> multi-user access, a bigger library of pre-packaged algorithms, etc.
> 
> Whether MapReduce's own success is an accident of history or something more
> fundamental is subject to interesting debate. I remember being constantly
> amazed by the number of problems at Google that, when squinted at the right
> way, become MR-soluble (starting, ironically, with PageRank itself). Yes,
> apparently it sometimes does pay to see many things as a nail when you have
> invested in a powerful hammer.
> 
> Along those lines, here are some interesting perspectives on the beauty of
> Dryad/DryadLINQ, and at least one practical reason why it didn't succeed as
> an implementation.
> 
>   - http://blogs.msdn.com/b/dryad/archive/2010/02/15/some-dryad-and-dryadlinq-history.aspx
>   - http://geekswithblogs.net/johnsPerfBlog/archive/2011/12/12/rip-dryadlinq-or-long-live-linq-to-hadoop.aspx
> 
> --
> Christopher T. Nguyen
> Co-founder & CEO, Adatao <http://adatao.com>
> linkedin.com/in/ctnguyen
> 
> On Wed, Oct 23, 2013 at 2:33 PM, Alex Boisvert <alex.boisv...@gmail.com> wrote:
> 
>> (Resending to the @apache list instead of the old Google group)
>> 
>> A bit of a random question, but I was wondering whether there are efforts
>> underway to generalize/expand the Spark API toward something similar to
>> the MBrace [1] model. There's certainly already an overlap between the
>> features of the two systems, so I guess I'm thinking of an API that's less
>> centered on RDDs (as collections) and more on distributed dataflow, one
>> that would feel more like composing Promises/Futures, or that would even
>> generalize to support various sorts of container/context monads.
>> 
>> [1] "MBrace: Cloud Computing with Monads"
>> http://plosworkshop.org/2013/preprint/dzik.pdf
>> 