Regarding locality, it's not just Lars' stuff, it's in the RefGuide (see section 9.7.3):
http://hbase.apache.org/book.html#regions.arch

Re: "You will still be reading/writing over the network": this is
definitely true as far as writes go, because of the replicas (see the
RefGuide for why), although I disagree on the read portion unless there
is an exceptional case (which is typically the result of an RS going
down).

On 6/6/12 4:27 PM, "Atif Khan" <[email protected]> wrote:

>Thanks Amandeep!
>
>I think what I was saying is that we are trying to support both types
>of workloads: real-time transactional workloads, and batch processing
>for data analysis. The big question is whether a single HDFS cluster
>should be shared between the two workloads.
>
>The point that you are trying to make (if I am understanding you
>correctly) is about data "locality".
>
>/Amandeep Khurana - "Having a common HDFS cluster and using part of the
>nodes as HBase RS and part as the Hadoop TTs doesn't solve the problem
>of moving data from the HBase RS to the tasks you'll run as a part of
>your MR jobs if HBase is your source/sink. You will still be
>reading/writing over the network."/
>
>When running MR jobs over HBase, data locality is provided by HBase
>(please see
>http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html, and
>also HBase: The Definitive Guide by Lars George, page 298, "MapReduce
>Locality"). In other words, the computation is shipped to where the
>data is, limiting the need to transfer data over the network. Proper
>data locality has a big impact on overall performance.
>
>So I believe that a common HDFS cluster does not imply logical
>segregation between HBase RS and Hadoop TTs. Your point therefore seems
>to contradict Lars George's statement.
>
>Thoughts?
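
For concreteness, here is roughly what that wiring looks like: a minimal
sketch against the 0.92/0.94-era mapreduce API (the table name "mytable"
and the class names are placeholders). TableMapReduceUtil.initTableMapperJob()
pulls in TableInputFormat, which creates one split per region and reports
the hosting RegionServer as the split's preferred location; that is what
lets the JobTracker schedule the map tasks node-locally:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class LocalityDemo {

  // Counts rows; output types are NullWritable since we only use a counter.
  static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      // TableInputFormat hands each map task exactly one region's rows and
      // reports the hosting RegionServer as the split location, so the
      // JobTracker tries to run this task on that same node.
      ctx.getCounter("demo", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "locality-demo");
    job.setJarByClass(LocalityDemo.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // bigger scanner batches for MR throughput
    scan.setCacheBlocks(false);  // don't churn the RS block cache from MR

    // Wires in TableInputFormat: one split per region, located at its RS.
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, RowCountMapper.class,
        NullWritable.class, NullWritable.class, job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Caveat: node-local scheduling only buys you local reads when the
region's HFile blocks actually sit on the colocated DataNode. Since the
RS writes its first replica locally, a major compaction tends to restore
that locality (this is the point of Lars' post). Writes, on the other
hand, always traverse the HDFS replica pipeline regardless.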
