Re: Basic question on how reducer works

Harsh J Mon, 09 Jul 2012 10:58:39 -0700

Manoj,

Think of it this way and it should no longer be confusing: one reducer == one partition.

For (1): Partitioners do not 'call' a reducer; they just write each record
with the proper partition ID. The reducer whose ID matches that partition
ID picks the data up for itself later. We have already explained this
earlier.
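To make (1) concrete, here is a minimal Python sketch of the map-side step. It mirrors the arithmetic of Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, where the number of partitions is simply the number of reduce tasks configured for the job. Assumptions: keys are plain strings hashed like Java's `String.hashCode()`. Real Hadoop key types such as `Text` define their own `hashCode()`, so actual partition numbers will differ. The function names are hypothetical.

```python
def java_string_hash(s: str) -> int:
    """Model of Java's String.hashCode(): h = 31*h + c over the characters,
    wrapped to a signed 32-bit int (matches Java for BMP characters)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h & 0x80000000 else h


def get_partition(key: str, num_reduce_tasks: int) -> int:
    """Same arithmetic as Hadoop's default HashPartitioner. Masking with
    0x7FFFFFFF (Integer.MAX_VALUE) clears the sign bit, so the result is
    never negative."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


def partition_map_output(pairs, num_reduce_tasks):
    """Hypothetical map-side step: tag every (key, value) with its partition ID.
    No reducer is 'called'; reducer N later fetches everything tagged N."""
    buckets = {p: [] for p in range(num_reduce_tasks)}
    for key, value in pairs:
        buckets[get_partition(key, num_reduce_tasks)].append((key, value))
    return buckets


# Example: 4 reduce tasks configured for the job -> partition IDs 0..3.
buckets = partition_map_output([("hello", 1), ("world", 1), ("hello", 1)], 4)
```

Because the partition depends only on the key and the reducer count, every mapper sends all records for a given key to the same partition, which is what lets reducer N simply pick up partition N.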

For (2): In what scenario would you _not_ want multiple reducers, each
handling its own partition exclusively, when it is possible to scale
that way?
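On (2): because the partitioner sends every record for a given key to exactly one partition, each reducer can aggregate its partition without coordinating with the others, and each one writes its own output file. That is why the output appears "separated", yet taken together it is the same result a single reducer would produce. The Python simulation below illustrates this with word count. It uses `zlib.crc32` as a stand-in hash (not what Hadoop uses), and the `part-r-NNNNN` names only imitate the new-API output file naming; `partition_of` and `run_word_count` are hypothetical helpers.

```python
import zlib
from collections import defaultdict


def partition_of(key: str, num_reducers: int) -> int:
    # Stand-in hash (zlib.crc32), NOT Hadoop's: any deterministic function
    # works as long as every mapper computes the same partition for a key.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_word_count(words, num_reducers):
    # Map + partition: each (word, 1) pair goes to exactly one partition.
    partitions = defaultdict(list)
    for w in words:
        partitions[partition_of(w, num_reducers)].append((w, 1))
    # Reduce: each reducer sees only its own partition, so reducers could run
    # in parallel with no coordination; each produces its own output "file".
    outputs = {}
    for r in range(num_reducers):
        counts = defaultdict(int)
        for w, one in partitions[r]:
            counts[w] += one
        outputs[f"part-r-{r:05d}"] = sorted(counts.items())
    return outputs


words = "the quick fox jumps over the lazy fox the end".split()
single = run_word_count(words, 1)
multi = run_word_count(words, 3)
# Concatenating all reducers' files gives the same counts as one reducer.
merged = sorted(kv for lines in multi.values() for kv in lines)
```

Multiple reducers are worth it whenever the reduce work is large enough that splitting it across machines shortens the job; the per-reducer files can be read together as one logical output.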

On Mon, Jul 9, 2012 at 11:22 PM, Manoj Babu <manoj...@gmail.com> wrote:
> Hi,
>
> It would be more helpful if you could give more details on the doubts below.
>
> 1. How does the partitioner know which reducer needs to be called?
> 2. When we use more than one reducer, the output gets split into separate
> files (one per reducer). In what scenarios do we actually need multiple reducers?
>
> Cheers!
> Manoj.
>
>
>
> On Mon, Jul 9, 2012 at 6:54 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>>
>> Robert,
>>
>> On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
>>
>> > Hi,
>> >
>> > I have some questions related to basic functionality in Hadoop.
>> >
>> > 1. When a Mapper processes the intermediate output data, how does it know
>> > how many partitions to create (i.e., how many reducers there will be) and
>> > how much data goes into each partition for each reducer?
>> >
>> > 2. When the JobTracker assigns a task to a reducer, does it also specify the
>> > locations of the intermediate output data that the reducer should retrieve?
>> > And how does a reducer know which portion of the intermediate output at
>> > each remote location it has to retrieve?
>>
>>
>> To add to Harsh's comment: essentially, the TT (TaskTracker) *knows* where
>> the output of a given map-id/reduce-id pair is located via an
>> output-file/index-file combination.
>>
>> Arun
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
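Arun's point about the output-file/index-file pair can be sketched as follows. This is a simplified Python model, not Hadoop's actual on-disk format: a map task writes one output file with records grouped by partition, plus an index recording each partition's start offset and length. When a reducer asks for a given map-id/reduce-id pair, the node holding that map output only has to look up the reducer's index entry and return that byte range. The names `IndexEntry`, `write_map_output` and `serve_segment` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    start_offset: int  # byte offset of this partition's segment in the output file
    length: int        # number of bytes in the segment


def write_map_output(records_by_partition, num_partitions):
    """Concatenate each partition's serialized records in partition order;
    return the file contents plus one index entry per partition."""
    data = bytearray()
    index = []
    for p in range(num_partitions):
        start = len(data)
        for key, value in records_by_partition.get(p, []):
            data += f"{key}\t{value}\n".encode("utf-8")
        index.append(IndexEntry(start, len(data) - start))
    return bytes(data), index


def serve_segment(data, index, reduce_id):
    """What the node holding a map output does for a (map-id, reduce-id)
    fetch: look up the reducer's index entry and return only that range."""
    entry = index[reduce_id]
    return data[entry.start_offset:entry.start_offset + entry.length]


# One map task's output for a job with 3 reducers; partition 2 happens to be empty.
records = {0: [("apple", 1)], 1: [("banana", 1), ("banana", 1)]}
data, index = write_map_output(records, 3)
segment_for_reducer_1 = serve_segment(data, index, 1)
```

This is also why a reducer does not need to be told which portion to fetch: its own reduce-id is the lookup key into each map's index.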



-- 
Harsh J
