On Thu, Aug 20, 2009 at 9:42 AM, john smith <[email protected]> wrote:
> Hi all,
>
> I have one small doubt. Kindly answer it even if it sounds silly.

No questions are silly. Don't worry.

> I am using MapReduce in HBase in distributed mode. I have a table which
> spans across 5 region servers. I am using TableInputFormat to read the
> data from the tables in the map. When I run the program, by default how
> many map tasks are created? Is it one per region server or more?

If you set the number of map tasks to a high number, it automatically spawns one map task for each region (not region server). Otherwise, it'll spawn the number you have explicitly specified in the job.

> Also, after the map task is over, the reduce task is taking a bit more
> time. Is it due to moving the map output across the region servers, i.e.
> moving the values of the same key to a particular reduce phase to start
> the reducer? Is there any way I can optimize the code (e.g. by storing
> the data of the same reducer nearby)?

Increase the number of reducers. Each reducer will then have less data to move.

> Thanks :)
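To make the two answers above concrete, here is a rough, self-contained sketch of the arithmetic involved. It does not call HBase or Hadoop at all. The class and method names (SplitSketch, effectiveMapTasks, reducerFor) and the region count of 12 are made up for illustration. The reducer formula follows the arithmetic of Hadoop's default HashPartitioner, with a plain String standing in for the real key type, so the exact reducer numbers on a real cluster will differ.

```java
import java.util.Arrays;
import java.util.List;

public class SplitSketch {

    // Rule described in the reply: asking for many map tasks gives one per
    // region; asking for fewer gives the number requested.
    public static int effectiveMapTasks(int requestedMapTasks, int regionCount) {
        return Math.min(requestedMapTasks, regionCount);
    }

    // Same arithmetic as Hadoop's default HashPartitioner. A String stands in
    // for the real key type (an assumption made for this sketch).
    public static int reducerFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int regions = 12; // hypothetical: regions, not region servers

        System.out.println("requested 100 maps -> " + effectiveMapTasks(100, regions));
        System.out.println("requested 4 maps   -> " + effectiveMapTasks(4, regions));

        // Hypothetical row keys, used only to show how keys spread out
        // when the number of reducers grows.
        List<String> keys = Arrays.asList("row-a", "row-b", "row-c", "row-d");
        for (int reducers : new int[] {1, 4}) {
            for (String key : keys) {
                System.out.println(reducers + " reducer(s): " + key
                        + " -> reducer " + reducerFor(key, reducers));
            }
        }
    }
}
```

With a single reducer every key lands on reducer 0, so all of the map output has to be copied to one place. With more reducers the same keys are divided among them, which is why each reducer has less to fetch.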
