I think the feature set is pretty close, and GGFS would be a good contrast to Tachyon in terms of performance and reliability features.
I am not an expert on Tachyon, but I think the main differences are:

- GGFS allows read-through and write-through to/from the underlying HDFS, or any other Hadoop-compliant file system, with zero code changes. Essentially, GGFS removes the ETL step from integration entirely.
- GGFS can pick and choose which folders stay in memory, which folders stay on disk, and which folders get synchronized with the underlying (HD)FS, either synchronously or asynchronously.
- GridGain is working on adding a native MapReduce component, which will provide complete native Hadoop integration without API changes, unlike Spark, which currently forces you to adopt its API. Essentially, GridGain MR+GGFS will let you bring Hadoop completely or partially in-memory in a plug-and-play fashion, without any API changes.

There are probably other differences that I am forgetting right now, but I think the list above covers the most significant ones.

Regards,
-- 
Dmitriy Setrakyan, EVP Engineering
*GridGain Systems*
www.gridgain.com

On Mon, Mar 24, 2014 at 11:53 PM, Andrew Purtell <[email protected]> wrote:

> Dmitriy,
>
> Would it be possible to contrast GGFS with Tachyon (http://tachyon-project.org/)?
>
> Also, do you have any plans for Spark integration?
>
>
> On Mon, Mar 24, 2014 at 11:35 PM, Dmitriy Setrakyan <[email protected]> wrote:
>
> > Hi Roman,
> >
> > At this point the integration is a pluggable in-memory file system, GGFS.
> > It works just like HDFS (same API), but in reality serves as a caching
> > layer on top of HDFS. GGFS caches the hottest file blocks and then
> > synchronizes them with the underlying HDFS either synchronously or
> > asynchronously, depending on configuration.
> >
> > Since GGFS implements the standard Hadoop File System API, it automatically
> > integrates with other Hadoop ecosystem pieces via the File System API as well.
> >
> > Going forward, we are planning to add the same native API integration for
> > the MapReduce component as well.
> >
> > Hope this answers your question.
> >
> > -Dmitriy
> >
> >
> > On Mon, Mar 24, 2014 at 11:11 PM, Roman Shaposhnik <[email protected]> wrote:
> >
> > > Hi Dmitriy!
> > >
> > > Welcome to the Bigtop community!
> > >
> > > On Mon, Mar 24, 2014 at 10:43 PM, Konstantin Boudnik <[email protected]> wrote:
> > > >> One of the main pieces of our platform is our In-Memory Apache Hadoop
> > > >> Accelerator, which aims to accelerate HDFS and Map/Reduce by bringing
> > > >> both data and computations into memory. We do it with our GGFS, a
> > > >> Hadoop-compliant in-memory file system. For I/O-intensive jobs GridGain
> > > >> GGFS offers performance close to 100x faster than standard HDFS. More
> > > >> information can be found here:
> > > >> http://www.gridgain.org/features/hadoop-acceleration/
> > > >>
> > > >> We would like to have an opportunity to integrate our Apache Hadoop
> > > >> Accelerator with Apache Bigtop. Please let us know if this is possible
> > > >> and what steps are required of us.
> > >
> > > I've been actually fascinated by the in-memory analytics platforms lately.
> > > Things like Apache Spark seem to be a really good addition to the
> > > Hadoop ecosystem.
> > >
> > > Now, I understand that you've got a piece of technology that can
> > > essentially serve as a replacement for HDFS, but could you please
> > > elaborate on what other integration points you have between GridGain
> > > and the rest of the Hadoop ecosystem?
> > >
> > > That, I think, would be a much wider discussion.
> > >
> > > Thanks,
> > > Roman.
> >
> >
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
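The caching behaviour discussed in this thread (read-through on a cache miss, write-through or asynchronous write-behind to the underlying HDFS, and a per-folder choice of which mode applies) can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the `Mode` names, the `CachingFileSystem` class and its methods are all hypothetical, the underlying file system is modeled as a plain dict, and nothing here reflects the actual GGFS API or its configuration.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    # Hypothetical per-folder modes, named for illustration only (not the GGFS API).
    MEMORY_ONLY = "memory_only"  # data lives only in the in-memory layer
    DUAL_SYNC = "dual_sync"      # writes reach the underlying FS before write() returns
    DUAL_ASYNC = "dual_async"    # writes are queued and flushed to the underlying FS later


class CachingFileSystem:
    """Toy in-memory layer over an underlying file system (a dict of path -> bytes)."""

    def __init__(self, underlying, folder_modes, default_mode=Mode.DUAL_SYNC):
        self.underlying = underlying      # stand-in for HDFS
        self.folder_modes = folder_modes  # folder prefix -> Mode
        self.default_mode = default_mode
        self.cache = {}                   # in-memory copies: path -> bytes
        self.pending = deque()            # write-behind queue used by DUAL_ASYNC

    def mode_for(self, path):
        # The longest matching folder prefix wins, so a nested folder can override its parent.
        matches = [f for f in self.folder_modes
                   if path == f or path.startswith(f.rstrip("/") + "/")]
        if not matches:
            return self.default_mode
        return self.folder_modes[max(matches, key=len)]

    def read(self, path):
        if path in self.cache:
            return self.cache[path]
        if self.mode_for(path) is Mode.MEMORY_ONLY or path not in self.underlying:
            raise FileNotFoundError(path)
        # Read-through: load from the underlying FS on a miss and keep a copy in memory.
        data = self.underlying[path]
        self.cache[path] = data
        return data

    def write(self, path, data):
        self.cache[path] = data
        mode = self.mode_for(path)
        if mode is Mode.DUAL_SYNC:
            self.underlying[path] = data       # write-through
        elif mode is Mode.DUAL_ASYNC:
            self.pending.append((path, data))  # write-behind, applied by flush()
        # MEMORY_ONLY: nothing is written to the underlying FS

    def flush(self):
        # Drain the write-behind queue; a real system would do this in the background.
        while self.pending:
            path, data = self.pending.popleft()
            self.underlying[path] = data


# Usage: three folders, each configured with a different mode.
hdfs = {"/warehouse/events.log": b"historic"}
fs = CachingFileSystem(hdfs, {"/tmp": Mode.MEMORY_ONLY, "/logs": Mode.DUAL_ASYNC})
fs.read("/warehouse/events.log")       # miss, so it is read through from "HDFS" and cached
fs.write("/warehouse/report", b"r")    # default DUAL_SYNC: written through immediately
fs.write("/logs/app.log", b"l")        # DUAL_ASYNC: queued until flush() runs
fs.write("/tmp/scratch", b"s")         # MEMORY_ONLY: never reaches "HDFS"
```

Matching the longest folder prefix is what lets one folder be memory-only while a sibling is synchronized; a real caching layer that keeps only the hottest blocks in memory would also need eviction and a background flusher, both of which this sketch leaves out.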
