I plan to be at ApacheCon on Monday, April 7th. I hear that Bigtop will have a meetup there in the evening. Would it be OK if I spent about 20 minutes there presenting GridGain GGFS and our overall approach to Hadoop acceleration? I think it would be interesting to go through a couple of architectural diagrams, and it may spur a good discussion.
-Dmitriy

On Wed, Mar 26, 2014 at 8:35 AM, Jay Vyas <[email protected]> wrote:

> I love the fact that GridGain is going to be part of Bigtop! This will
> give us two new compute paradigms, all packaged and testable under the
> same umbrella. And now with our vagrant recipes, people will be able to
> demo GridGain by simply typing "vagrant up" into the console.
>
> And I'm pretty sure GridGain and Spark will drive each other forward, just
> the same way Ceph, HDFS, and GlusterFS do.
>
> Dmitriy, will you be at ApacheCon? If so, why don't you come share your
> thoughts with us at the two Bigtop meetups on the 7th and the 8th?
>
> On Wed, Mar 26, 2014 at 10:26 AM, Dmitriy Setrakyan <[email protected]>
> wrote:
>
> > Andrew,
> >
> > I agree with you. All I meant to say is that currently users of Hadoop
> > that would like to improve performance of their deployments have to
> > switch to Spark and code to Spark APIs. GridGain, on the other hand,
> > will provide an option to accelerate existing Hadoop deployments
> > without any changes in code.
> >
> > Regards,
> > -Dmitriy
> >
> > On Tue, Mar 25, 2014 at 4:16 PM, Andrew Purtell <[email protected]>
> > wrote:
> >
> > > Thank you.
> > >
> > > On this part of your response:
> > >
> > > > GridGain is working on adding native MapReduce component which will
> > > > provide native complete Hadoop integration without changes in API,
> > > > like Spark currently forces you to do
> > >
> > > I'm not sure those flocking to Spark are doing so by force. Nor that
> > > the Spark API should be considered a liability when compared to Hadoop
> > > MapReduce. For your consideration.
> > >
> > > On Tue, Mar 25, 2014 at 12:08 AM, Dmitriy Setrakyan
> > > <[email protected]> wrote:
> > >
> > > > I think the feature set is pretty close and GGFS would be a good
> > > > contrast to Tachyon for performance and reliability features.
> > > >
> > > > I am not an expert on Tachyon, but I think the main differences are:
> > > >
> > > > - GGFS allows read-through and write-through to/from underlying HDFS
> > > > or any other Hadoop compliant file system with zero code change.
> > > > Essentially GGFS entirely removes the ETL step from integration.
> > > >
> > > > - GGFS has the ability to pick and choose what folders stay in
> > > > memory, what folders stay on disk, and what folders get synchronized
> > > > with the underlying (HD)FS either synchronously or asynchronously.
> > > >
> > > > - GridGain is working on adding a native MapReduce component which
> > > > will provide native complete Hadoop integration without changes in
> > > > API, like Spark currently forces you to do. Essentially GridGain
> > > > MR+GGFS will allow you to bring Hadoop completely or partially
> > > > in-memory in Plug-n-Play fashion without any API changes.
> > > >
> > > > There are probably other differences that I am forgetting right now,
> > > > but I think the above set lists the most significant ones.
> > > >
> > > > Regards,
> > > > --
> > > > Dmitriy Setrakyan, EVP Engineering
> > > > *GridGain Systems*
> > > > www.gridgain.com
> > > >
> > > > On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell
> > > > <[email protected]> wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > Would it be possible to contrast GGFS with Tachyon
> > > > > (http://tachyon-project.org/)?
> > > > >
> > > > > Also, do you have any plans for Spark integration?
> > > > >
> > > > > On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Hi Roman,
> > > > > >
> > > > > > At this point the integration is a pluggable in-memory file
> > > > > > system, GGFS. It works just like HDFS (same API), but in reality
> > > > > > serves as a caching layer on top of HDFS. GGFS caches the
> > > > > > hottest file blocks and then synchronizes them with underlying
> > > > > > HDFS either synchronously or asynchronously, depending on
> > > > > > configuration.
> > > > > >
> > > > > > Since GGFS implements the standard Hadoop File System API, it
> > > > > > automatically integrates with other Hadoop ecosystem pieces via
> > > > > > the File System API as well.
> > > > > >
> > > > > > Going forward, we are planning to add the same native API
> > > > > > integration for the MapReduce component as well.
> > > > > >
> > > > > > Hope this answers your question.
> > > > > >
> > > > > > -Dmitriy
> > > > > >
> > > > > > On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Dmitriy!
> > > > > > >
> > > > > > > Welcome to the Bigtop community!
> > > > > > >
> > > > > > > On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik
> > > > > > > <[email protected]> wrote:
> > > > > > > >> One of the main pieces of our platform is our In-Memory
> > > > > > > >> Apache Hadoop Accelerator which aims to accelerate HDFS and
> > > > > > > >> Map/Reduce by bringing both data and computations into
> > > > > > > >> memory. We do it with our GGFS - Hadoop compliant in-memory
> > > > > > > >> file system. For I/O intensive jobs GridGain GGFS offers
> > > > > > > >> performance close to 100x faster than standard HDFS. More
> > > > > > > >> information can be found here:
> > > > > > > >> http://www.gridgain.org/features/hadoop-acceleration/
> > > > > > > >>
> > > > > > > >> We would like to have an opportunity to integrate our
> > > > > > > >> Apache Hadoop Accelerator with Apache Bigtop. Please let us
> > > > > > > >> know if this is possible and what steps are required of us.
> > > > > > >
> > > > > > > I've been actually fascinated by the in-memory analytics
> > > > > > > platforms lately. Things like Apache Spark seem to be a really
> > > > > > > good addition to the Hadoop ecosystem.
> > > > > > >
> > > > > > > Now, I understand that you've got a piece of technology that
> > > > > > > can essentially serve as a replacement for HDFS, but could you
> > > > > > > please elaborate on what other integration points do you have
> > > > > > > between GridGain and the rest of Hadoop ecosystem?
> > > > > > >
> > > > > > > That, I think, would be a much wider discussion.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Roman.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > > - Andy
> > > > >
> > > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > > > > Hein (via Tom White)
> > >
> > > --
> > > Best regards,
> > >
> > > - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> > > Hein (via Tom White)
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
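
For readers who want something concrete before the diagrams, the caching behaviour described in the thread (read-through on a miss, and per-folder choice between memory-only, synchronous, and asynchronous synchronization with the underlying file system) can be sketched roughly as below. This is only an illustrative toy, not GGFS code and not the GridGain or Hadoop API: the class names BackingStore and CachingLayer, the mode names, and the prefix-based folder matching are all assumptions made up for this example.

```python
import queue
import threading


class BackingStore:
    """Hypothetical stand-in for an underlying file system such as HDFS."""

    def __init__(self):
        self.files = {}
        self.reads = 0  # counts how often the slow store is actually read

    def read(self, path):
        self.reads += 1
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class CachingLayer:
    """Toy in-memory layer in front of a backing store (not the GGFS API)."""

    MEMORY_ONLY, SYNC, ASYNC = "memory_only", "sync", "async"

    def __init__(self, store, modes, default=SYNC):
        self.store = store
        self.modes = modes      # path prefix -> mode, e.g. {"/logs": ASYNC}
        self.default = default
        self.cache = {}
        self.pending = queue.Queue()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def _mode_for(self, path):
        # Simple string-prefix match; the longest matching prefix wins.
        matches = [p for p in self.modes if path.startswith(p)]
        return self.modes[max(matches, key=len)] if matches else self.default

    def read(self, path):
        # Read-through: on a cache miss, load from the store and keep a copy.
        if path not in self.cache:
            self.cache[path] = self.store.read(path)
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        mode = self._mode_for(path)
        if mode == self.SYNC:
            self.store.write(path, data)    # caller waits for the store
        elif mode == self.ASYNC:
            self.pending.put((path, data))  # written later in the background
        # MEMORY_ONLY: the data never reaches the backing store.

    def _flush_loop(self):
        # Background worker that drains asynchronous writes to the store.
        while True:
            path, data = self.pending.get()
            self.store.write(path, data)
            self.pending.task_done()

    def flush(self):
        # Block until every queued asynchronous write has been applied.
        self.pending.join()
```

A caller would construct it with something like CachingLayer(store, {"/tmp": CachingLayer.MEMORY_ONLY, "/logs": CachingLayer.ASYNC}) and then use read and write exactly as it would against the store itself, which is the "zero code change" idea in miniature. A real implementation would also need eviction, block-level caching, and failure handling, none of which is shown here.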
