Hi, I am new to Hadoop and HBase. I am trying to understand how to use MapReduce with HBase as both source and sink, and I have the following questions. I would appreciate it if someone could answer them and maybe point me to some sample code:
-- As far as I understand, a table is stored in different regions in HBase, which are split across various nodes in HDFS. Is there a way to control the level of replication for a particular table?

-- When we use a table scanner, it automatically moves between the regions of a table, which may be on different nodes, and returns the row handles. So a single process is doing all of that. Am I correct?

-- When we use TableMap to run MapReduce jobs on HBase, it automatically creates several map tasks, one per region, and each one performs the map operation on the key range of its region. So if I use a table scanner inside a map task, will I still be iterating through only the row range of that particular region, or through the whole table again?

-- What is the best way to iterate through all the rows of a particular region in a map task? This may be needed to perform a select operation in parallel.

Sorry for the long email; many of these questions may be basic. I would appreciate any answers. Also, any suggestions for implementing joins using MapReduce on HBase would be welcome.

Thanks,
--
Nishant Khurana
Candidate for Masters in Engineering (Dec 2009)
Computer and Information Science
School of Engineering and Applied Science
University of Pennsylvania
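The region and scanner questions can be pictured with a small toy model. This is plain Java using only the standard library, not HBase client code. In the model:

- A hypothetical "table" is a sorted `TreeMap` of row keys.
- Regions are half-open key ranges [startKey, endKey), cut at made-up split keys.
- An empty key marks an open boundary, the way HBase treats the start of the first region and the end of the last one.

Each simulated map task receives only its own region's range, while a scan with no start or stop key walks the whole map. In HBase terms the bounded case corresponds roughly to a scan restricted by start and stop rows. Treat this as a sketch of the setup being asked about, not as a description of HBase internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model, NOT HBase client code. A "table" is a sorted map of
// row key -> value; a "region" is a half-open key range [startKey, endKey).
// An empty string marks an open boundary (first region's start, last region's end).
public class RegionScanModel {

    // Returns the row keys in [startKey, endKey); an empty endKey means "to the end".
    public static List<String> scanRange(NavigableMap<String, String> table,
                                         String startKey, String endKey) {
        NavigableMap<String, String> view = endKey.isEmpty()
                ? table.tailMap(startKey, true)
                : table.subMap(startKey, true, endKey, false);
        return new ArrayList<>(view.keySet());
    }

    // Turns sorted, hypothetical split keys into consecutive [start, end) regions.
    public static List<String[]> regions(List<String> splitKeys) {
        List<String[]> result = new ArrayList<>();
        String start = "";
        for (String split : splitKeys) {
            result.add(new String[] {start, split});
            start = split;
        }
        result.add(new String[] {start, ""});
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String row : new String[] {"a1", "b2", "c3", "d4", "e5", "f6"}) {
            table.put(row, "value-" + row);
        }
        // Hypothetical split keys giving three regions: ["", "c"), ["c", "e"), ["e", "").
        List<String[]> regionList = regions(List.of("c", "e"));

        for (String[] region : regionList) {
            // One simulated map task per region: its input is only this key range.
            List<String> regionRows = scanRange(table, region[0], region[1]);
            // A scan with no start/stop key, by contrast, visits every row in the table.
            List<String> unbounded = scanRange(table, "", "");
            System.out.printf("region [%s, %s): bounded=%s, unbounded=%s%n",
                    region[0], region[1], regionRows, unbounded);
        }
    }
}
```

Keeping the ranges half-open means adjacent regions never share a row. As a result, the per-region scans together cover every row exactly once.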
