Re: Doubt in HBase

Jonathan Gray Thu, 20 Aug 2009 13:25:55 -0700

What Amandeep said.

Also, one clarification for you. You mentioned the reduce task moving map output across regionservers. Remember, HBase is just a MapReduce input source or output sink. The sort/shuffle/reduce is a part of Hadoop MapReduce and has nothing to do with HBase directly. It is utilizing the JobTracker/TaskTrackers, not the RegionServers.

Like AK said, you can increase the number of reducers, or reduce the amount of data you output from the maps.


JG

Amandeep Khurana wrote:

On Thu, Aug 20, 2009 at 9:42 AM, john smith <[email protected]> wrote:

Hi all,

I have one small doubt. Kindly answer it even if it sounds silly.


No question is silly. Don't worry.

I am using MapReduce with HBase in distributed mode. I have a table which
spans 5 region servers, and I am using TableInputFormat to read the data
from the table in the map. When I run the program, how many map tasks are
created by default? Is it one per region server, or more?


If you set the number of map tasks to a high number (more than the number of
regions in the table), it automatically spawns one map task for each region
(not each region server). Otherwise, it spawns the number of map tasks you
have explicitly specified in the job.
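A plain-Java sketch of the split rule described above, assuming a hypothetical table with 12 regions. `RegionSplitSketch` and `planSplits` are made-up names, not HBase classes: the sketch caps the number of map tasks at the number of regions and, when fewer tasks are requested, gives each task a contiguous run of regions. It only loosely models the older TableInputFormat; check the `getSplits` implementation in your HBase version for the exact rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a table scan might be divided into map tasks.
// Not HBase's TableInputFormat; illustration only.
public class RegionSplitSketch {

    /** Returns, for each map task, the indices of the regions it would scan. */
    public static List<List<Integer>> planSplits(int numRegions, int requestedMaps) {
        if (numRegions <= 0 || requestedMaps <= 0) {
            throw new IllegalArgumentException("need at least one region and one map");
        }
        // Never create more map tasks than there are regions.
        int numSplits = Math.min(numRegions, requestedMaps);
        int base = numRegions / numSplits;   // regions every task gets
        int extra = numRegions % numSplits;  // the first 'extra' tasks get one more
        List<List<Integer>> splits = new ArrayList<>();
        int next = 0;
        for (int i = 0; i < numSplits; i++) {
            int size = base + (i < extra ? 1 : 0);
            List<Integer> regions = new ArrayList<>();
            for (int j = 0; j < size; j++) {
                regions.add(next++);
            }
            splits.add(regions);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical: 12 regions spread over 5 region servers.
        System.out.println(planSplits(12, 100).size()); // one map task per region
        System.out.println(planSplits(12, 5));          // 5 tasks, regions grouped
    }
}
```

Note that the region servers do not appear anywhere in the calculation: the unit of work is the region, which is why the answer is "per region", not "per region server".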

Also, after the map tasks finish, the reduce tasks take a bit more time.
Is that due to moving the map output across the region servers, i.e. moving
all values of the same key to a particular reducer before it can start? Is
there any way I can optimize the code (e.g. by storing the data for the same
reducer nearby)?


Increase the number of reducers. Each reducer will then have less data to move.
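To illustrate why more reducers means less data per reducer, here is a plain-Java sketch of the key-to-reducer rule used by Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. It needs no Hadoop jars; `PartitionSketch` is a hypothetical class and the `row-N` keys are made-up sample data. In a real job the reducer count is set with `setNumReduceTasks(...)` on the job configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Models Hadoop's default hash partitioning in plain Java (illustration only).
public class PartitionSketch {

    public static int partitionFor(String key, int numReduceTasks) {
        // Clear the sign bit so negative hash codes still give a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many map output records each reducer would receive. */
    public static int[] recordsPerReducer(List<String> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    /** Load on the busiest reducer, which bounds how long the reduce phase takes. */
    public static int maxLoad(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            keys.add("row-" + i); // made-up sample keys
        }
        for (int reducers : new int[] {1, 4, 16}) {
            int[] counts = recordsPerReducer(keys, reducers);
            System.out.println(reducers + " reducer(s): busiest gets "
                    + maxLoad(counts) + " records");
        }
    }
}
```

All values for one key still go to a single reducer, so a single very hot key will not get faster by adding reducers; in that case cutting the map output is the other lever.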

Thanks :)
