What is the "best practice" for HBase, MapReduce and HDFS deployment?  We are
interested in storing our data in HBase, and then run analytics on it using
MapReduce.  MapReduce will utilize data from HBase tables and HDFS files.

My first thought was to create a single HDFS cluster and point both the
MapReduce and HBase daemons at that common HDFS installation.  However,
Cloudera's Dos and Don'ts page
(http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) insists that
MapReduce and HBase should not share an HDFS cluster, and that each should
instead run on its own cluster.  I don't understand this recommendation, since
it seems to mean copying data from one HDFS cluster to the other every time we
run MapReduce over HBase.
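
Concretely, what I pictured was pointing HBase's root directory at the same
namenode that fs.default.name points the MapReduce jobs at.  Roughly, in
hbase-site.xml (hostname and port are placeholders):

<configuration>
  <property>
    <!-- Same HDFS namenode the MapReduce jobs use,
         i.e. one shared filesystem for everything. -->
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>

Is there something wrong with this picture?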

Any help/ideas would be appreciated.

Thanks!
