Re: Opinions stratosphere

2014-05-02 Thread Philip Ogren
Great reference!  I just skimmed through the results without reading 
much of the methodology - but it looks like Spark outperforms 
Stratosphere fairly consistently in the experiments.  It's too bad the 
data sources only range from 2GB to 8GB.  Who knows if the apparent 
pattern would extend out to 64GB, 128GB, 1TB, and so on...




On 05/01/2014 06:02 PM, Christopher Nguyen wrote:
Someone (Ze Ni, https://www.sics.se/people/ze-ni) has actually 
attempted such a comparative study as a Master's thesis:


http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf

According to this snapshot (c. 2013), Stratosphere is different from 
Spark in not having an explicit concept of an in-memory dataset (e.g., 
RDD).


In principle this could be argued to be an implementation detail; the 
operators and execution plan/data flow are of primary concern in the 
API, and the data representation/materializations are otherwise 
unspecified.


But in practice, for long-running interactive applications, I consider 
RDDs to be of fundamental, first-class citizen importance, and the key 
distinguishing feature of Spark's model vs other in-memory 
approaches that treat memory merely as an implicit cache.


--
Christopher T. Nguyen
Co-founder & CEO, Adatao http://adatao.com
linkedin.com/in/ctnguyen



On Tue, Nov 26, 2013 at 1:26 PM, Matei Zaharia 
matei.zaha...@gmail.com wrote:


I don’t know a lot about it except from the research side, where
the team has done interesting optimization stuff for these types
of applications. In terms of the engine, one thing I’m not sure of
is whether Stratosphere allows explicit caching of datasets
(similar to RDD.cache()) and interactive queries (similar to
spark-shell). But it’s definitely an interesting project to watch.

Matei

On Nov 22, 2013, at 4:17 PM, Ankur Chauhan
achau...@brightcove.com wrote:

 Hi,

 That's what I thought, but as per the slides on
 http://www.stratosphere.eu they seem to know about Spark, and the
 Scala API does look similar.
 I found the PACT model interesting. I would like to know if Matei
 or other core committers have something to weigh in on.

 -- Ankur
 On 22 Nov 2013, at 16:05, Patrick Wendell pwend...@gmail.com wrote:

 I've never seen that project before; it would be interesting to get a
 comparison. It seems to offer a much lower-level API. For instance,
 this is a wordcount program:



https://github.com/stratosphere/stratosphere/blob/master/pact/pact-examples/src/main/java/eu/stratosphere/pact/example/wordcount/WordCount.java

 On Thu, Nov 21, 2013 at 3:15 PM, Ankur Chauhan
 achau...@brightcove.com wrote:
 Hi,

 I was just curious about
 https://github.com/stratosphere/stratosphere
 and how Spark compares to it. Does anyone have any experience
 with it and any comments to share?

 -- Ankur







Re: Opinions stratosphere

2014-05-02 Thread Michael Malak
"looks like Spark outperforms Stratosphere fairly consistently in the 
experiments"

There was one exception the paper noted, which was when memory resources were 
constrained. In that case, Stratosphere seemed to degrade more gracefully 
than Spark, but the author did not explore it further. The author did, though, 
include this in the conclusion section: "However, in our experiments, for 
iterative algorithms, the Spark programs may show the poor results in 
performance in the environment of limited memory resources."

I recently blogged a fuller list of alternatives/competitors to Spark:
http://datascienceassn.org/content/alternatives-spark-memory-distributed-computing

 

Re: Opinions stratosphere

2014-05-01 Thread Christopher Nguyen
Someone (Ze Ni, https://www.sics.se/people/ze-ni) has actually attempted
such a comparative study as a Master's thesis:

http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf

According to this snapshot (c. 2013), Stratosphere is different from Spark
in not having an explicit concept of an in-memory dataset (e.g., RDD).

In principle this could be argued to be an implementation detail; the
operators and execution plan/data flow are of primary concern in the API,
and the data representation/materializations are otherwise unspecified.

But in practice, for long-running interactive applications, I consider RDDs
to be of fundamental, first-class citizen importance, and the key
distinguishing feature of Spark's model vs other in-memory approaches
that treat memory merely as an implicit cache.
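
To make the distinction concrete, here is a minimal sketch in plain Python. It is only an illustration: the Dataset class, its cache()/collect()/map() methods, and expensive_source are hypothetical names invented here, not Spark's or Stratosphere's API. It assumes a lazily computed dataset that is recomputed on every query unless the user explicitly asks for it to be kept in memory.

```python
from typing import Callable, Iterable, List, Optional


class Dataset:
    """Toy stand-in for a lazily computed dataset (hypothetical; not Spark's RDD API)."""

    def __init__(self, compute: Callable[[], Iterable[int]]):
        self._compute = compute                    # recipe for producing the data
        self._cached: Optional[List[int]] = None   # materialized data, if kept
        self._keep = False                         # set only by an explicit cache() call
        self.computations = 0                      # how many times the recipe actually ran

    def cache(self) -> "Dataset":
        # Explicit request from the user: keep the data in memory after first use.
        self._keep = True
        return self

    def collect(self) -> List[int]:
        if self._cached is not None:
            return self._cached                    # served from memory, no recomputation
        self.computations += 1
        data = list(self._compute())
        if self._keep:
            self._cached = data
        return data

    def map(self, fn: Callable[[int], int]) -> "Dataset":
        # Derived dataset; it reads its parent through collect().
        return Dataset(lambda: (fn(x) for x in self.collect()))


def expensive_source() -> Iterable[int]:
    # Placeholder for a costly load or transformation.
    return range(5)


uncached = Dataset(expensive_source)
cached = Dataset(expensive_source).cache()

# Simulate three interactive queries against each dataset.
for _ in range(3):
    uncached.collect()
    cached.collect()

squares = cached.map(lambda x: x * x)

print(uncached.computations)  # 3
print(cached.computations)    # 1
print(squares.collect())      # [0, 1, 4, 9, 16]
```

In a long-running interactive session the second pattern is the one that matters: the user holds a named handle to data known to be in memory, instead of hoping an implicit cache still has it.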

--
Christopher T. Nguyen
Co-founder & CEO, Adatao http://adatao.com
linkedin.com/in/ctnguyen


