What is the "best practice" for HBase, MapReduce and HDFS deployment? We are interested in storing our data in HBase and then running analytics on it using MapReduce. The MapReduce jobs will read data from both HBase tables and HDFS files.
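For concreteness, this is roughly the kind of job we have in mind: a MapReduce job that scans an HBase table as its input and writes results to an HDFS path. The sketch below uses the standard TableMapReduceUtil/TableMapper API; the table name "my_table", the "cf"/"qualifier" column, the mapper class, and the output path are just placeholders, and it assumes a Hadoop 2-era client API.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HBaseScanJob {

    // Mapper that reads rows from an HBase table and emits (column value, 1) pairs.
    static class RowValueMapper extends TableMapper<Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
                throws IOException, InterruptedException {
            byte[] value = row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qualifier"));
            if (value != null) {
                context.write(new Text(Bytes.toString(value)), ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-scan-example");
        job.setJarByClass(HBaseScanJob.class);

        // Full-table scan; caching settings keep the scan efficient for MapReduce.
        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);

        // Wire the HBase table in as the job's input source.
        TableMapReduceUtil.initTableMapperJob(
                "my_table",              // placeholder table name
                scan,
                RowValueMapper.class,
                Text.class,
                IntWritable.class,
                job);

        // Job output goes to a plain HDFS path (placeholder).
        FileOutputFormat.setOutputPath(job, new Path("/tmp/hbase-scan-output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The question below is about where such a job should run relative to the HBase cluster's HDFS.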
My first thought was to create a single HDFS cluster and point both the MapReduce and HBase servers at that common HDFS installation. However, Cloudera's HBase Dos and Don'ts page (http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/) insists that MapReduce and HBase should not share an HDFS cluster; rather, each should have its own. I don't understand this recommendation, since it would mean copying data from one HDFS cluster to another whenever we run MapReduce over HBase. Any help/ideas would be appreciated. Thanks!
