Re: Spark in-memory analytics in BigTop stack

MTG dev Tue, 25 Sep 2012 10:46:44 -0700

Hi there.

Obviously, I am not in a position to say what role Spark can play in Bigtop,
since I do not speak for either of those projects.

However, I can say that Spark provides a number of advantages compared to
the traditional MapReduce model: a stateful computational model without the
need to write everything back to the file system after each step, in-memory
calculations, higher-level primitives expressed in a functional language,
etc. These advantages, combined with a low-latency planner, result in a very
significant performance improvement. I'd suggest looking at spark-project.org
for more information.
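
As a rough illustration of the point above, here is a toy sketch in plain
Python. It is not Spark code and does not use the Spark API; the function
names and the sample data are made up for this example. It runs the same
three-step word count twice: once writing every intermediate result to disk
and reading it back (the MapReduce-style flow), and once chaining the steps
over data that stays in memory.

```python
import json
import os
import tempfile
from collections import Counter
from functools import reduce

# Hypothetical sample input, used only for illustration.
LINES = [
    "spark keeps data in memory",
    "mapreduce writes data to disk",
    "spark is fast",
]


def split_words(lines):
    """Step 1: split every line into words."""
    return [word for line in lines for word in line.split()]


def drop_short(words):
    """Step 2: keep only words longer than two characters."""
    return [word for word in words if len(word) > 2]


def count(words):
    """Step 3: count occurrences of each word."""
    return dict(Counter(words))


STEPS = [split_words, drop_short, count]


def run_with_disk(lines):
    """Run the steps with a disk round trip between each pair of steps."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"step{i}.json") for i in range(len(STEPS) + 1)]
        with open(paths[0], "w") as f:
            json.dump(lines, f)
        for i, step in enumerate(STEPS):
            # Read the previous step's output from disk ...
            with open(paths[i]) as f:
                data = json.load(f)
            # ... and write this step's output back to disk.
            with open(paths[i + 1], "w") as f:
                json.dump(step(data), f)
        with open(paths[-1]) as f:
            return json.load(f)


def run_in_memory(lines):
    """Run the same steps as one chain; intermediates never leave memory."""
    return reduce(lambda data, step: step(data), STEPS, lines)


if __name__ == "__main__":
    print(run_with_disk(LINES) == run_in_memory(LINES))
```

Both variants compute the same counts; the only difference is where the
intermediate data lives between steps, which is the cost the in-memory model
avoids.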

I am not an expert on Drill, but I'd say that Spark gives an immediate
benefit over it because it is already here and can be used by anyone ;)

As for integration with Bigtop: Spark doesn't require any special integration
with the rest of the stack - it might use HDFS as the underlying storage, but
that's about it.

It looks like there is ongoing development to allow Spark to use Hive's
SerDes, but I am not completely sure about its status.

On Mon, Sep 24, 2012 at 09:59PM, Roman Shaposhnik wrote:
> On Mon, Sep 24, 2012 at 8:52 PM, Anatoli Fomenko <afome...@yahoo.com> wrote:
> > Hi Alef,
> >
> > Great news!
> >
> > Spark developers are interested in developing Spark packages and
> > contributing them to open source. Since you already have them,
> > what would you think about contributing the source to BigTop?

We don't have any plans to hold back the sources of the packages, but we are
still working on the RPM packaging right now. Once that work is done, we
should be able to contribute it back to the community. Should there be a JIRA
ticket for that or something?

With regards,
  Alef
  MTG dev team

> This is very, very interesting indeed! I'd also like to hear a bit
> more about what role Spark can play in Bigtop project -- from
> just skimming the web it feels like it can be seen as an
> alternative to Apache Drill (incubating) or am I completely off
> base here?
> 
> Also, what level of integration is required between Spark and
> the rest of Hadoop ecosystem components (Hive, Pig, etc.)?
> 
> Thanks,
> Roman.
