Andrew, I agree with you. All I meant to say is that, currently, Hadoop users who would like to improve the performance of their deployments have to switch to Spark and code against the Spark APIs. GridGain, on the other hand, will provide an option to accelerate existing Hadoop deployments without any code changes.
Regards,
-Dmitriy

On Tue, Mar 25, 2014 at 4:16 PM, Andrew Purtell <[email protected]> wrote:

> Thank you.
>
> On this part of your response:
>
> > GridGain is working on adding native MapReduce component which will provide
> > native complete Hadoop integration without changes in API, like Spark
> > currently forces you to do
>
> I'm not sure those flocking to Spark are doing so by force. Nor that the
> Spark API should be considered a liability when compared to Hadoop
> MapReduce. For your consideration.
>
>
> On Tue, Mar 25, 2014 at 12:08 AM, Dmitriy Setrakyan <[email protected]> wrote:
>
> > I think the feature set is pretty close and GGFS would be a good contract
> > to Tachyon for performance and reliability features.
> >
> > I am not an expert on Tachyon, but I think the main differences are:
> >
> > - GGFS allows read-through and write-through to/from underlying HDFS or any
> > other Hadoop compliant file system with zero code change. Essentially GGFS
> > entirely removes ETL step from integration.
> >
> > - GGFS has ability to pick and choose what folders stay in memory, what
> > folders stay on disc, and what folders get synchronized with underlying
> > (HD)FS either synchronously or asynchronously.
> >
> > - GridGain is working on adding native MapReduce component which will
> > provide native complete Hadoop integration without changes in API, like
> > Spark currently forces you to do. Essentially GridGain MR+GGFS will allow
> > to bring Hadoop completely or partially in-memory in Plug-n-Play fashion
> > without any API changes.
> >
> > There are probably other differences that I am forgetting right now, but I
> > think the above set lists the most significant ones.
> >
> > Regards,
> > --
> > Dmitriy Setrakyan, EVP Engineering
> > *GridGain Systems*
> > www.gridgain.com
> >
> >
> > On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell <[email protected]> wrote:
> >
> > > Dmitriy,
> > >
> > > Would it be possible to contrast GGFS with Tachyon (http://tachyon-project.org/)?
> > >
> > > Also, do you have any plans for Spark integration?
> > >
> > >
> > > On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan <[email protected]> wrote:
> > >
> > > > Hi Roman,
> > > >
> > > > At this point the integration is pluggable in memory file system, GGFS. It
> > > > works just like HDFS (same API), but in reality serves as a caching layer
> > > > on top of HDFS. GGFS caches the hottest file blocks and then synchronizes
> > > > them with underlying HDFS either synchronously or asynchronously, depending
> > > > on configuration.
> > > >
> > > > Since, GGFS implements standard Hadoop File System API, it automatically
> > > > integrates with other Hadoop ecosystem pieces via File System API as well.
> > > >
> > > > Going forward, we are planning to add same native API integration for
> > > > MapReduce component as well.
> > > >
> > > > Hope this answers your question.
> > > >
> > > > -Dmitriy
> > > >
> > > >
> > > > On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik <[email protected]> wrote:
> > > >
> > > > > Hi Dmitriy!
> > > > >
> > > > > Welcome to the Bigtop community!
> > > > >
> > > > > On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <[email protected]> wrote:
> > > > > >> One of the main pieces of our platform is our In-Memory Apache Hadoop
> > > > > >> Accelerator which aims to accelerate HDFS and Map/Reduce by bringing both,
> > > > > >> data and computations into memory. We do it with our GGFS - Hadoop
> > > > > >> compliant in-memory file system.
> > > > > >> For I/O intensive jobs GridGain GGFS
> > > > > >> offers performance close to 100x faster than standard HDFS. More
> > > > > >> information can be found here:
> > > > > >> http://www.gridgain.org/features/hadoop-acceleration/
> > > > > >>
> > > > > >> We would like to have an opportunity to integrate our Apache Hadoop
> > > > > >> Accelerator with Apache Bigtop. Please let us know if this is possible and
> > > > > >> what steps are required of us.
> > > > >
> > > > > I've been actually fascinated by the in-memory analytics platforms lately.
> > > > > Things like Apache Spark seem to be a really good addition to the
> > > > > Hadoop ecosystem.
> > > > >
> > > > > Now, I understand that you've got a piece of technology that can essentially
> > > > > serve as a replacement for HDFS, but could you please elaborate on
> > > > > what other integration points do you have between GridGain and the rest
> > > > > of Hadoop ecosystem?
> > > > >
> > > > > That, I think, would be a much wider discussion.
> > > > >
> > > > > Thanks,
> > > > > Roman.
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
> > >
> >
>
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
