Re: [DISCUSSION]: Adding GridGain component in Bigtop

Dmitriy Setrakyan Wed, 26 Mar 2014 14:39:36 -0700

I plan to be at ApacheCon on Monday, April 7th. I hear that Bigtop will
have a meetup there in the evening. Would it be OK if I spent about 20
minutes there presenting GridGain GGFS and our overall approach to Hadoop
acceleration? I think it would be interesting to go through a couple of
architectural diagrams, and it may spur a good discussion.


-Dmitriy

On Wed, Mar 26, 2014 at 8:35 AM, Jay Vyas <[email protected]> wrote:

> I love the fact that GridGain is going to be part of Bigtop! This will
> give us two new compute paradigms, all packaged and testable under the
> same umbrella. And now with our Vagrant recipes, people will be able to
> demo GridGain by simply typing "vagrant up" into the console.
>
> And I'm pretty sure GridGain and Spark will drive each other forward, just
> the same way Ceph, HDFS, and GlusterFS do.
>
> Dmitriy, will you be at ApacheCon? If so, why don't you come share your
> thoughts with us at the two Bigtop meetups on the 7th and the 8th?
>
>
>
>
>
> On Wed, Mar 26, 2014 at 10:26 AM, Dmitriy Setrakyan <[email protected]> wrote:
>
> > Andrew,
> >
> > I agree with you. All I meant to say is that currently users of Hadoop that
> > would like to improve performance of their deployments have to switch to
> > Spark and code to Spark APIs. GridGain, on the other hand, will provide an
> > option to accelerate existing Hadoop deployments without any changes in
> > code.
> >
> > Regards,
> > -Dmitriy
> >
> > On Tue, Mar 25, 2014 at 4:16 PM, Andrew Purtell <[email protected]>
> > wrote:
> >
> > > Thank you.
> > >
> > > On this part of your response:
> > >
> > > > GridGain is working on adding native MapReduce component which will
> > > > provide native complete Hadoop integration without changes in API, like
> > > > Spark currently forces you to do
> > >
> > > I'm not sure those flocking to Spark are doing so by force. Nor that the
> > > Spark API should be considered a liability when compared to Hadoop
> > > MapReduce. For your consideration.
> > >
> > >
> > >
> > > On Tue, Mar 25, 2014 at 12:08 AM, Dmitriy Setrakyan <[email protected]> wrote:
> > >
> > > > I think the feature set is pretty close and GGFS would be a good contrast
> > > > to Tachyon for performance and reliability features.
> > > >
> > > > I am not an expert on Tachyon, but I think the main differences are:
> > > >
> > > > - GGFS allows read-through and write-through to/from underlying HDFS or
> > > > any other Hadoop compliant file system with zero code change. Essentially
> > > > GGFS entirely removes the ETL step from integration.
> > > >
> > > > - GGFS has the ability to pick and choose what folders stay in memory, what
> > > > folders stay on disk, and what folders get synchronized with the underlying
> > > > (HD)FS either synchronously or asynchronously.
> > > >
> > > > - GridGain is working on adding native MapReduce component which will
> > > > provide native complete Hadoop integration without changes in API, like
> > > > Spark currently forces you to do. Essentially GridGain MR+GGFS will make it
> > > > possible to bring Hadoop completely or partially in-memory in Plug-n-Play
> > > > fashion without any API changes.
> > > >
> > > > There are probably other differences that I am forgetting right now, but I
> > > > think the above set lists the most significant ones.
> > > >
> > > > Regards,
> > > > --
> > > > Dmitriy Setrakyan, EVP Engineering
> > > > *GridGain Systems*
> > > > www.gridgain.com
> > > >
> > > >
> > > > On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell <[email protected]> wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > Would it be possible to contrast GGFS with Tachyon (
> > > > > http://tachyon-project.org/)?
> > > > >
> > > > > Also, do you have any plans for Spark integration?
> > > > >
> > > > >
> > > > > On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan <[email protected]> wrote:
> > > > >
> > > > > > Hi Roman,
> > > > > >
> > > > > > At this point the integration is a pluggable in-memory file system,
> > > > > > GGFS. It works just like HDFS (same API), but in reality serves as a
> > > > > > caching layer on top of HDFS. GGFS caches the hottest file blocks and
> > > > > > then synchronizes them with the underlying HDFS either synchronously or
> > > > > > asynchronously, depending on configuration.
> > > > > >
> > > > > > Since GGFS implements the standard Hadoop File System API, it
> > > > > > automatically integrates with other Hadoop ecosystem pieces via the
> > > > > > File System API as well.
> > > > > >
> > > > > > Going forward, we are planning to add the same native API integration
> > > > > > for the MapReduce component as well.
> > > > > >
> > > > > > Hope this answers your question.
> > > > > >
> > > > > > -Dmitriy
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Dmitriy!
> > > > > > >
> > > > > > > Welcome to the Bigtop community!
> > > > > > >
> > > > > > > On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <[email protected]> wrote:
> > > > > > > >> One of the main pieces of our platform is our In-Memory Apache
> > > > > > > >> Hadoop Accelerator which aims to accelerate HDFS and Map/Reduce by
> > > > > > > >> bringing both data and computations into memory. We do it with our
> > > > > > > >> GGFS - Hadoop compliant in-memory file system. For I/O intensive jobs
> > > > > > > >> GridGain GGFS offers performance close to 100x faster than standard
> > > > > > > >> HDFS. More information can be found here:
> > > > > > > >> http://www.gridgain.org/features/hadoop-acceleration/
> > > > > > > >>
> > > > > > > >> We would like to have an opportunity to integrate our Apache Hadoop
> > > > > > > >> Accelerator with Apache Bigtop. Please let us know if this is
> > > > > > > >> possible and what steps are required of us.
> > > > > > >
> > > > > > > I've been actually fascinated by the in-memory analytics platforms
> > > > > > > lately. Things like Apache Spark seem to be a really good addition to
> > > > > > > the Hadoop ecosystem.
> > > > > > >
> > > > > > > Now, I understand that you've got a piece of technology that can
> > > > > > > essentially serve as a replacement for HDFS, but could you please
> > > > > > > elaborate on what other integration points you have between GridGain
> > > > > > > and the rest of the Hadoop ecosystem?
> > > > > > >
> > > > > > > That, I think, would be a much wider discussion.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Roman.
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > >    - Andy
> > > > >
> > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > > > (via Tom White)
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
> > >
> >
>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
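
For readers following the thread, the caching behavior described above (read-through on a miss, then synchronizing writes with the underlying file system either synchronously or asynchronously) can be pictured with a small toy model. This is only a sketch under stated assumptions: the class `CachingLayer`, its methods, and the use of a plain dict as the "underlying file system" are all hypothetical names made up for illustration, and none of it is the GGFS or Hadoop File System API.

```python
import queue
import threading


class CachingLayer:
    """Toy read-through / write-through cache in front of a slower store.

    Hypothetical illustration only; not the GGFS or Hadoop FileSystem API.
    """

    def __init__(self, backing, synchronous=True):
        self.backing = backing        # dict standing in for the underlying file system
        self.cache = {}               # in-memory copies of recently used entries
        self.synchronous = synchronous
        self._pending = queue.Queue()
        if not synchronous:
            # Background thread that drains queued writes to the backing store.
            threading.Thread(target=self._flush_loop, daemon=True).start()

    def read(self, path):
        # Read-through: on a cache miss, load from the backing store and keep a copy.
        if path not in self.cache:
            self.cache[path] = self.backing[path]
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        if self.synchronous:
            # Synchronous mode: the backing store is updated before write() returns.
            self.backing[path] = data
        else:
            # Asynchronous mode: the write is queued and applied later.
            self._pending.put((path, data))

    def _flush_loop(self):
        while True:
            path, data = self._pending.get()
            self.backing[path] = data
            self._pending.task_done()

    def flush(self):
        # Block until every queued asynchronous write has reached the backing store.
        self._pending.join()
```

In synchronous mode the caller pays the cost of the slower store on every write; in asynchronous mode writes return immediately and the backing store catches up later, which is the trade-off the "depending on configuration" remark in the thread appears to refer to.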
