On 11/05/2011 08:43, Andrew Purtell wrote:
From: Steve Loughran <[email protected]>
-I don't think you can claim to have a
Distribution/Fork/Version of Apache Hadoop if you swap out
big chunks of it for alternate filesystems, MR engines, etc.
Some description of this is needed, for example:
"Supports the Apache Hadoop MapReduce engine on top of
Filesystem XYZ"
This is also the case with Brisk, which replaces HDFS and the standard
JobTracker with Cassandra and a new JobTracker, and claims to be a Hadoop
distribution.
"Apache Hadoop TM Powered by Cassandra"
http://www.datastax.com/products/brisk
"DataStax’ Brisk is an enhanced open-source Apache Hadoop and
Hive distribution that utilizes Apache Cassandra for many of
its core services. [...]"
+1. It is something containing Hadoop interfaces and possibly
source/artifacts, but I'm not sure how to describe it. It is just
something that claims compatibility with Hadoop's filesystem and MR
runtime. If Google chose to add the same interfaces to their platform
within Google App Engine, it wouldn't be a Hadoop distro either.
I think it's important to set some definitions here *now* so that
confusion doesn't set in.