Thanks to all who replied, especially Vladimir and Mathias! So if I understand this correctly, there is a physical resource contention problem, given that both MR and HBase are resource hungry. Therefore, when end-user SLAs are in place, performance guarantees may be compromised if HBase and MR share the same HDFS cluster (and other resources).
Following Mathias's suggestion, on a production HDFS cluster we could throttle the MR activity so that it has minimal impact on HBase's (realtime) performance. So far so good. Now my BIG question is about the BIG data itself (no pun intended). If I create two HDFS clusters (one for MR and one for HBase), and HBase acts as both the data source and the sink, would I not be forced to move LARGE amounts of data between the two HDFS clusters? Given the size of the data, this could congest the internal network on which the two independent HDFS clusters are deployed. Thoughts?
