Re: [DISCUSSION]: Adding GridGain component in Bigtop

Dmitriy Setrakyan Tue, 25 Mar 2014 00:10:26 -0700

I think the feature sets are pretty close, and GGFS would be a good contrast
to Tachyon for performance and reliability features.


I am not an expert on Tachyon, but I think the main differences are:

- GGFS allows read-through and write-through to/from the underlying HDFS or any
other Hadoop-compliant file system with zero code changes. Essentially, GGFS
removes the ETL step from integration entirely.

- GGFS has the ability to pick and choose which folders stay in memory, which
folders stay on disk, and which folders get synchronized with the underlying
(HD)FS, either synchronously or asynchronously.

- GridGain is working on adding a native MapReduce component, which will
provide complete native Hadoop integration without API changes of the kind
Spark currently forces you to make. Essentially, GridGain MR+GGFS will let you
bring Hadoop completely or partially in-memory in plug-and-play fashion
without any API changes.

There are probably other differences that I am forgetting right now, but I
think the above set lists the most significant ones.
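
As a rough illustration of the first two points, here is a toy sketch in
Python. It is not GridGain code and does not use the GGFS or Hadoop APIs; the
class names, mode names, and folder paths are all made up for the example. It
only shows the general idea of a caching layer that reads through to a backing
store on a miss and writes back either synchronously or asynchronously,
depending on a per-folder rule.

```python
from collections import deque


class BackingStore:
    """Hypothetical stand-in for the underlying file system (e.g. HDFS)."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class CachingLayer:
    """Toy in-memory layer with per-folder modes (illustrative names only)."""

    MEMORY_ONLY = "memory_only"    # data never reaches the backing store
    SYNC = "sync"                  # write-through on every write
    ASYNC = "async"                # write is queued and flushed later
    PASS_THROUGH = "pass_through"  # data stays on disk and is not cached

    def __init__(self, backing, modes, default=SYNC):
        self.backing = backing
        # Longest prefix first, so the most specific folder rule wins.
        self.modes = sorted(modes.items(), key=lambda kv: len(kv[0]), reverse=True)
        self.default = default
        self.cache = {}
        self.pending = deque()

    def mode_for(self, path):
        for prefix, mode in self.modes:
            if path.startswith(prefix):
                return mode
        return self.default

    def read(self, path):
        mode = self.mode_for(path)
        if mode == self.PASS_THROUGH:
            return self.backing.read(path)
        if path not in self.cache:
            if mode == self.MEMORY_ONLY:
                raise FileNotFoundError(path)
            # Read-through: load from the backing store on a cache miss.
            self.cache[path] = self.backing.read(path)
        return self.cache[path]

    def write(self, path, data):
        mode = self.mode_for(path)
        if mode == self.PASS_THROUGH:
            self.backing.write(path, data)
            return
        self.cache[path] = data
        if mode == self.SYNC:
            self.backing.write(path, data)
        elif mode == self.ASYNC:
            self.pending.append(path)

    def flush(self):
        """Push queued asynchronous writes down to the backing store."""
        while self.pending:
            path = self.pending.popleft()
            self.backing.write(path, self.cache[path])


if __name__ == "__main__":
    hdfs = BackingStore()
    hdfs.write("/data/in.txt", b"raw")
    fs = CachingLayer(hdfs, {
        "/tmp/": CachingLayer.MEMORY_ONLY,
        "/logs/": CachingLayer.ASYNC,
        "/archive/": CachingLayer.PASS_THROUGH,
    })
    print(fs.read("/data/in.txt"))    # loaded via read-through, then cached
    fs.write("/logs/job.log", b"ok")  # queued until flush() is called
    fs.flush()
    print(sorted(hdfs.files))
```

The caller uses the same read and write calls regardless of the mode, which is
the point of the "zero code change" claim above: the folder rules live in
configuration rather than in application code.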

Regards,
--
Dmitriy Setrakyan, EVP Engineering
*GridGain Systems*
www.gridgain.com


On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell <[email protected]> wrote:

> Dmitriy,
>
> Would it be possible to contrast GGFS with Tachyon (
> http://tachyon-project.org/)?
>
> Also, do you have any plans for Spark integration?
>
>
> On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan <[email protected]> wrote:
>
> > Hi Roman,
> >
> > At this point the integration is a pluggable in-memory file system, GGFS. It
> > works just like HDFS (same API), but in reality serves as a caching layer
> > on top of HDFS. GGFS caches the hottest file blocks and then synchronizes
> > them with the underlying HDFS either synchronously or asynchronously,
> > depending on configuration.
> >
> > Since GGFS implements the standard Hadoop File System API, it automatically
> > integrates with other Hadoop ecosystem pieces via the File System API as well.
> >
> > Going forward, we are planning to add the same native API integration for
> > the MapReduce component as well.
> >
> > Hope this answers your question.
> >
> > -Dmitriy
> >
> >
> >
> > On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik <[email protected]> wrote:
> >
> > > Hi Dmitriy!
> > >
> > > Welcome to the Bigtop community!
> > >
> > > On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <[email protected]> wrote:
> > > >> One of the main pieces of our platform is our In-Memory Apache Hadoop
> > > >> Accelerator which aims to accelerate HDFS and Map/Reduce by bringing
> > > >> both data and computations into memory. We do it with our GGFS - Hadoop
> > > >> compliant in-memory file system. For I/O intensive jobs GridGain GGFS
> > > >> offers performance close to 100x faster than standard HDFS. More
> > > >> information can be found here:
> > > >> http://www.gridgain.org/features/hadoop-acceleration/
> > > >>
> > > >> We would like to have an opportunity to integrate our Apache Hadoop
> > > >> Accelerator with Apache Bigtop. Please let us know if this is possible
> > > >> and what steps are required of us.
> > >
> > > I've actually been fascinated by in-memory analytics platforms lately.
> > > Things like Apache Spark seem to be a really good addition to the
> > > Hadoop ecosystem.
> > >
> > > Now, I understand that you've got a piece of technology that can
> > > essentially serve as a replacement for HDFS, but could you please
> > > elaborate on what other integration points you have between GridGain
> > > and the rest of the Hadoop ecosystem?
> > >
> > > That, I think, would be a much wider discussion.
> > >
> > > Thanks,
> > > Roman.
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
