Re: Doubt in HBase

Amandeep Khurana Thu, 20 Aug 2009 23:41:07 -0700

All values for a particular key go to a single reducer. So, I think the
extra reducers will just be wasted: they won't get any input records.
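A small illustration of the point above, assuming Hadoop's default HashPartitioner. Each map output record goes to partition (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so the reducer a key lands on depends only on the key and the number of reduce tasks, not on which mapper or region server emitted it. The sketch is plain Java with no Hadoop dependency. The class and method names are hypothetical, and String.hashCode() stands in for the real key type's hash. Hadoop's Text hashes its bytes differently, so real partition numbers will differ.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-alone model of Hadoop's default HashPartitioner.
 * String.hashCode() is used as a stand-in for the real key type's hashCode().
 */
public class PartitionSketch {

    /** Same formula as HashPartitioner: non-negative hash modulo reducer count. */
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many map output records each reduce task would receive. */
    static int[] recordsPerReducer(String[] mapOutputKeys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : mapOutputKeys) {
            counts[getPartition(key, numReduceTasks)]++;
        }
        return counts;
    }

    /** Number of reduce tasks that receive no input records at all. */
    static int idleReducers(int[] counts) {
        int idle = 0;
        for (int c : counts) {
            if (c == 0) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        // 7 records but only 2 distinct keys, spread over 5 reduce tasks.
        String[] keys = {"rowA", "rowA", "rowA", "rowA", "rowA", "rowB", "rowB"};
        int[] counts = recordsPerReducer(keys, 5);
        System.out.println("records per reducer: " + Arrays.toString(counts));
        System.out.println("idle reducers: " + idleReducers(counts));
    }
}
```

With 2 distinct keys and 5 reduce tasks, at most 2 reducers get any records. The others start, find no input, and finish. All five rowA records end up on the same reducer no matter which mapper emitted them. In this model, the destination is fixed by the partition function rather than by where most of a key's values already sit.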


On 8/20/09, john smith <[email protected]> wrote:
> Thanks for all your replies, guys. As Bharath said, what happens when the
> number of reducers is greater than the number of distinct map output keys?
>
> On Fri, Aug 21, 2009 at 9:39 AM, bharath vissapragada <
> [email protected]> wrote:
>
>> Amandeep, Gray and Purtell, thanks for your replies. I have found them
>> very useful.
>>
>> You said to increase the number of reduce tasks. Suppose the number of
>> reduce tasks is greater than the number of distinct map output keys; will
>> some of the reduce tasks go to waste? Is that the case?
>>
>> Also, I have one more doubt. I have 5 values for a given key on one region
>> server and the other 2 values on 2 different region servers. Does Hadoop
>> MapReduce take care of moving those 2 values to the region server holding
>> the 5 values, instead of moving the 5 values to another machine, to
>> minimize the data flow? Is this what happens internally?
>>
>> On Fri, Aug 21, 2009 at 9:03 AM, Andrew Purtell <[email protected]>
>> wrote:
>>
>> > The behavior of TableInputFormat is to schedule one mapper for every table
>> > region.
>> >
>> > In addition to what others have said already, if your reducer is doing
>> > little more than storing data back into HBase (via TableOutputFormat),
>> > then you can consider writing results back to HBase directly from the
>> > mapper to avoid incurring the overhead of the sort/shuffle/merge that
>> > happens within the Hadoop job framework as map outputs are fed into
>> > reducers. For that type of use case (using the Hadoop MapReduce subsystem
>> > as essentially a grid scheduler), something like job.setNumReduceTasks(0)
>> > will do the trick.
>> >
>> > Best regards,
>> >
>> >   - Andy
>> >
>> >
>> >
>> >
>> > ________________________________
>> > From: john smith <[email protected]>
>> > To: [email protected]
>> > Sent: Friday, August 21, 2009 12:42:36 AM
>> > Subject: Doubt in HBase
>> >
>> > Hi all,
>> >
>> > I have one small doubt. Kindly answer it even if it sounds silly.
>> >
>> > I am using MapReduce with HBase in distributed mode. I have a table that
>> > spans 5 region servers. I am using TableInputFormat to read the data
>> > from the table in the map phase. When I run the program, how many map
>> > tasks are created by default? Is it one per region server or more?
>> >
>> > Also, after the map tasks are over, the reduce phase takes a bit more
>> > time. Is it due to moving the map output across the region servers, i.e.,
>> > moving the values for the same key to a particular reducer before it can
>> > start? Is there any way I can optimize the code (e.g., by storing data
>> > for the same reducer nearby)?
>> >
>> > Thanks :)
>> >
>> >
>> >
>> >
>>
>


-- 


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
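Andrew's point in the quoted thread, that TableInputFormat schedules one mapper per table region, means the number of map tasks follows the region count, not the region-server count. The sketch models that rule. Region, Split and getSplits are hypothetical stand-ins, not HBase classes. The preferredHost field mirrors the idea that each split also records where its region is served, so the scheduler can try to run the map task on that machine.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of "one map task per table region".
 * Region and Split are illustrative stand-ins, not HBase classes.
 */
public class RegionSplitSketch {

    /** A table region: a row-key range served by one region server. */
    static final class Region {
        final String startKey; // inclusive; "" means start of table
        final String endKey;   // exclusive; "" means end of table
        final String server;   // region server currently hosting this region

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }
    }

    /** One input split becomes one map task; it carries a locality hint. */
    static final class Split {
        final String startKey;
        final String endKey;
        final String preferredHost;

        Split(String startKey, String endKey, String preferredHost) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.preferredHost = preferredHost;
        }
    }

    /** One split per region, regardless of how many servers host the regions. */
    static List<Split> getSplits(List<Region> regions) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            splits.add(new Split(r.startKey, r.endKey, r.server));
        }
        return splits;
    }

    /** Number of distinct region servers hosting the given regions. */
    static int distinctServers(List<Region> regions) {
        Set<String> servers = new HashSet<>();
        for (Region r : regions) {
            servers.add(r.server);
        }
        return servers.size();
    }

    public static void main(String[] args) {
        // 7 regions spread over 5 region servers (rs1 and rs2 host two each).
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("", "b", "rs1"));
        regions.add(new Region("b", "d", "rs2"));
        regions.add(new Region("d", "f", "rs3"));
        regions.add(new Region("f", "h", "rs4"));
        regions.add(new Region("h", "k", "rs5"));
        regions.add(new Region("k", "n", "rs1"));
        regions.add(new Region("n", "", "rs2"));

        List<Split> splits = getSplits(regions);
        System.out.println(distinctServers(regions) + " region servers, "
                + splits.size() + " map tasks");
        for (Split s : splits) {
            System.out.println("[" + s.startKey + ", " + s.endKey + ") -> " + s.preferredHost);
        }
    }
}
```

For a table spread over 5 region servers but split into 7 regions, the first line printed is "5 region servers, 7 map tasks".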
