Yup, that's a great summary. A few more details: the HCFS wiki page will give you insight into some tests you can run against your FileSystem plugin class, which you will package in a jar file as described below.
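As a rough illustration of the core-site wiring mentioned below, the mapping from a URI scheme to the plugin class might look like this (the glusterfs scheme and class name here are just examples; substitute your plugin's actual scheme and class):

```xml
<configuration>
  <!-- Map the glusterfs:// URI scheme to the plugin's FileSystem class.
       Class name is illustrative; use the class your plugin jar provides. -->
  <property>
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>
  <!-- Optionally make it the default filesystem, so unqualified paths
       resolve against it (host/port are placeholders). -->
  <property>
    <name>fs.defaultFS</name>
    <value>glusterfs://server:9000</value>
  </property>
</configuration>
```

With that in place, paths qualified as glusterfs://... get routed through your class rather than HDFS.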
In general, Hadoop apps are written against the FileSystem interface, which is loaded at runtime. So as long as you configure the core-site file correctly, and qualify your file paths with the URI scheme you mapped to the corresponding Java classes (which live in a jar somewhere under hadoop/lib, for example), everything should work. We've tested Solr, HBase, Mahout, and many other systems that use the FileSystem interface in various different ways, and in general it works pretty well... with the exception of Impala, which is HDFS-specific (it checks at runtime that you're running HDFS, and if not it throws an error). A good suite of tests to run for HCFS compatibility is the BigTop smoke tests, which exercise Pig, Flume, MapReduce, and Mahout; we use those to validate GlusterFS.

> On Dec 10, 2014, at 3:50 PM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
>
>> On Wed, Dec 10, 2014 at 12:20 PM, Ari King <ari.brandeis.k...@gmail.com> wrote:
>> Hi,
>>
>> I'm doing a research paper on Hadoop -- specifically relating to its
>> dependency on HDFS. I need to determine if and how HDFS can be replaced. As
>> I understand it, there are a number of organizations that have produced
>> HDFS alternatives that support the Hadoop ecosystem, i.e. MapReduce, Hive,
>> HBase, etc.
>
> There's a difference between producing a storage solution with an
> on-the-wire protocol compatible with HDFS vs. an HCFS one (see
> below).
>
>> With the "if" part being answered, I'd appreciate insight/guidance on the
>> "how" part. Essentially, where can I find information on what MapReduce and
>> the other Hadoop subprojects require of the underlying file system and how
>> these subprojects expect to interact with the file system.
>
> It really boils down to the storage solution exposing a Hadoop Compatible
> Filesystem API. This should give you a sufficient overview of the details:
> https://wiki.apache.org/hadoop/HCFS
>
> A lot of open source (Ceph, GlusterFS, etc.) and closed source storage
> solutions (Isilon, etc.) do that and can be used as a replacement for HDFS.
>
> The real question, of course, is all the different tradeoffs that the
> implementations are making. That's where it gets fascinating.
>
> Thanks,
> Roman.