Few questions about map reduce in Hbase

Nishant Khurana Sun, 16 Nov 2008 11:09:30 -0800

Hi,
I am new to Hadoop and HBase. I am trying to understand how to use
MapReduce with HBase as both source and sink, and I have the following
questions. I would appreciate it if someone could answer them and maybe
point me to some sample code:


-- As far as I understand, a table is stored as a set of regions in HBase,
which are spread across various nodes in HDFS. Is there a way to control
the replication factor of a particular table?

-- When we use a table scanner, it automatically switches between the
various regions of a table, which may be present on different nodes, and
returns a row handle to us. So it is a single process doing that. Am I
correct?

-- When we use TableMap to run MapReduce jobs on HBase, it automatically
creates several map tasks, i.e. one per region, and each performs the map
operation on the key range of its region. So if I use a table scanner
inside a map task, will I still be iterating through only the row range of
that particular region, or through the whole table again?

-- What is the best way to iterate through all the rows of a particular
region in a map task? This may be required to perform a select operation
in parallel.

Sorry for the long email. Many of the questions may be basic. I would
appreciate it if someone could answer them. Any suggestions on implementing
joins using MapReduce on HBase would also be welcome.
Thanks

-- 
Nishant Khurana
Candidate for Masters in Engineering (Dec 2009)
Computer and Information Science
School of Engineering and Applied Science
University of Pennsylvania
