Thanks Amandeep! What I was trying to say is that we want to support both types of workloads: realtime transactional workloads, and batch processing for data analysis. The big question is whether a single HDFS cluster should be shared between the two.
The point you are making (if I am understanding you correctly) is about data "locality":

> Amandeep Khurana wrote:
> Having a common HDFS cluster and using part of the nodes as HBase RS and
> part as the Hadoop TTs doesn't solve the problem of moving data from the
> HBase RS to the tasks you'll run as a part of your MR jobs if HBase is
> your source/sink. You will still be reading/writing over the network.

When running MR jobs over HBase, data locality is provided by HBase (see http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html, and also "HBase: The Definitive Guide" by Lars George, page 298, "MapReduce Locality"). In other words, the computation is shipped to where the data is, which limits the need to transfer data over the network, and proper data locality has a big impact on overall performance. So I believe a common HDFS cluster does not require logical segregation between HBase RS and Hadoop TTs. Your point therefore seems to contradict Lars George's statement. Thoughts?
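For illustration, here is a minimal sketch of an HBase-sourced MR job (the table name "usertable", the counter names, and the class names are placeholders of my own, not from this thread). The locality mechanism is TableInputFormat: it creates one input split per region and reports the hosting region server's hostname as the split location, so the JobTracker can schedule each map task on, or near, the node already serving that region:

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

  public class RowCountJob {

    // Each map task receives the rows of exactly one region; the split's
    // declared location is the region server hosting that region, which is
    // what lets the scheduler place the task next to the data.
    static class RowCountMapper
        extends TableMapper<NullWritable, NullWritable> {
      @Override
      protected void map(ImmutableBytesWritable rowKey, Result columns,
          Context context) throws IOException, InterruptedException {
        context.getCounter("Demo", "ROWS").increment(1);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = new Job(conf, "rowcount");
      job.setJarByClass(RowCountJob.class);

      Scan scan = new Scan();
      scan.setCaching(500);        // fetch rows in batches for scan throughput
      scan.setCacheBlocks(false);  // full scans shouldn't evict the block cache

      // Wires up TableInputFormat, which computes one locality-aware
      // split per region of "usertable".
      TableMapReduceUtil.initTableMapperJob(
          "usertable", scan, RowCountMapper.class,
          NullWritable.class, NullWritable.class, job);

      job.setOutputFormatClass(NullOutputFormat.class);
      job.setNumReduceTasks(0);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Note the locality is best-effort: as Lars' post explains, it holds when the region server has been running long enough for compactions to rewrite the region's HFiles, since the HDFS DataNode co-located with the RS then holds a full replica of those blocks.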
