Aamandeep, Gray and Purtell, thanks for your replies. I have found them
very useful.

You said to increase the number of reduce tasks. Suppose the number of
reduce tasks is greater than the number of distinct map output keys. Would
some of the reduce tasks then go to waste? Is that the case?
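
To make this first question concrete, here is a minimal sketch of how I understand keys being assigned to reducers. It assumes the default hash partitioning (hash code of the key with the sign bit cleared, modulo the number of reduce tasks). The class name, the three keys and the reducer count are made up for illustration, and plain Java Strings stand in for the real Writable key types, whose hash codes may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class HashPartitionSketch {

    // Assumed partitioning rule: clear the sign bit of the key's hash code,
    // then take the remainder by the number of reduce tasks.
    public static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns the indexes of the reducers that are assigned no key at all.
    public static List<Integer> idleReducers(String[] distinctKeys, int numReduceTasks) {
        boolean[] used = new boolean[numReduceTasks];
        for (String key : distinctKeys) {
            used[partitionFor(key, numReduceTasks)] = true;
        }
        List<Integer> idle = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            if (!used[r]) {
                idle.add(r);
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c"}; // hypothetical: 3 distinct map output keys
        int reducers = 8;                // hypothetical: more reducers than keys
        for (String key : keys) {
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
        System.out.println("idle reducers: " + idleReducers(keys, reducers));
    }
}
```

If that assumption holds, at least five of the eight reducers in this sketch can never receive a key, which is the waste I am asking about.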

Also, I have one more question. Suppose I have 5 values for a given key on
one region, and 2 other values for the same key on 2 different region
servers. Does Hadoop MapReduce take care of moving those 2 values to the
region holding the 5 values, rather than moving the 5 values to another
machine, in order to minimize the data flow? Is that what happens internally?

On Fri, Aug 21, 2009 at 9:03 AM, Andrew Purtell <[email protected]> wrote:

> The behavior of TableInputFormat is to schedule one mapper for every table
> region.
>
> In addition to what others have said already, if your reducer is doing
> little more than storing data back into HBase (via TableOutputFormat), then
> you can consider writing results back to HBase directly from the mapper to
> avoid incurring the overhead of sort/shuffle/merge which happens within the
> Hadoop job framework as map outputs are input into reducers. For that type
> of use case -- using the Hadoop mapreduce subsystem as essentially a grid
> scheduler -- something like job.setNumReduceTasks(0) will do the trick.
>
> Best regards,
>
>   - Andy
>
>
>
>
> ________________________________
> From: john smith <[email protected]>
> To: [email protected]
> Sent: Friday, August 21, 2009 12:42:36 AM
> Subject: Doubt in HBase
>
> Hi all ,
>
> I have one small doubt. Kindly answer it even if it sounds silly.
>
> I am using MapReduce with HBase in distributed mode. I have a table which
> spans 5 region servers. I am using TableInputFormat to read the data
> from the table in the map. When I run the program, how many map tasks are
> created by default? Is it one per region server, or more?
>
> Also, after the map tasks are over, the reduce task takes a bit more time.
> Is it due to moving the map output across the region servers, i.e. moving
> the values of the same key to a particular reducer before it can start? Is
> there any way I can optimize the code (e.g. by storing the data for the
> same reducer nearby)?
>
> Thanks :)
>
>
>
>
