Yup, that's a great summary.  More details...

The HCFS wiki page describes tests you can run against your FileSystem plugin 
class, which you package in a jar file as described below. 

In general, Hadoop apps are written against the FileSystem interface, which is 
loaded at runtime. So as long as you configure core-site.xml correctly, and 
qualify your file paths with a URI scheme that maps to the right Java classes 
(in a jar somewhere under hadoop/lib, for example), everything should work.  
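As a sketch, the core-site.xml wiring might look like this. The property names 
(fs.defaultFS, fs.<scheme>.impl) are standard Hadoop configuration keys, but the 
glusterfs scheme, volume name, and implementation class shown here are just one 
illustrative plugin:

```xml
<!-- core-site.xml (sketch): map a URI scheme to a FileSystem class.
     The glusterfs scheme/class below are one example; substitute the
     scheme and class your plugin jar actually provides. -->
<configuration>
  <!-- Make the plugin filesystem the default for unqualified paths -->
  <property>
    <name>fs.defaultFS</name>
    <value>glusterfs://volume1/</value>
  </property>
  <!-- Tell Hadoop which class implements the glusterfs:// scheme -->
  <property>
    <name>fs.glusterfs.impl</name>
    <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
  </property>
</configuration>
```

With that in place, any path qualified as glusterfs://... (or any unqualified 
path, given the defaultFS above) resolves through the plugin class at runtime.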

We've tested Solr, HBase, Mahout, and many other systems that use the FileSystem 
interface in various ways, and in general it works pretty well... with the 
exception of Impala, which is HDFS-specific (it checks at runtime that you're 
running HDFS and throws an error if not).

A good suite of tests to run for HCFS compatibility is the BigTop smoke tests, 
which exercise Pig, Flume, MapReduce, and Mahout; we use those to validate 
GlusterFS.



> On Dec 10, 2014, at 3:50 PM, Roman Shaposhnik <ro...@shaposhnik.org> wrote:
> 
>> On Wed, Dec 10, 2014 at 12:20 PM, Ari King <ari.brandeis.k...@gmail.com> 
>> wrote:
>> Hi,
>> 
>> I'm doing a research paper on Hadoop -- specifically relating to its
>> dependency on HDFS. I need to determine if and how HDFS can be replaced. As
>> I understand it, there are a number of organizations that have produced
>> HDFS alternatives that support the Hadoop ecosystem, i.e. MapReduce, Hive,
>> HBase, etc.
> 
> There's a difference between producing a storage solution with
> on-the-wire-protocol compatible with HDFS vs. an HCFS one (see
> below).
> 
>> With the "if" part being answered, I'd appreciate insight/guidance on the
>> "how" part. Essentially, where can I find information on what MapReduce and
>> the other Hadoop subprojects require of the underlying file system and how
>> these subprojects expect to interact with the file system.
> 
> It really boils down for a storage solution to expose a Hadoop Compatible
> Filesystem API. This should give you a sufficient overview of the details:
>    https://wiki.apache.org/hadoop/HCFS
> 
> A lot of open source (Ceph, GlusterFS, etc.) and closed source storage 
> solutions
> (Isilon, etc.) do that and can be used as a replacement for HDFS.
> 
> The real question, of course, is all the different tradeoffs that the
> implementations
> are making. That's where it gets fascinating.
> 
> Thanks,
> Roman.
