pig-user  

Re: Setting reducers

Bae, Jae Hyeon
Tue, 16 Mar 2010 06:16:21 -0700

You can refer to the Hadoop MapReduce documentation.

According to it, a reasonable number of reducers is 0.95 * <maximum
number of reducers per node> * <number of nodes>.
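
As a rough sketch of the arithmetic, here is the formula in Python. The
function name, parameter names and cluster sizes are made up for
illustration; only the 0.95 factor comes from the formula above.

```python
import math

def suggested_reducers(nodes, max_reducers_per_node, factor=0.95):
    """factor * <max reducers per node> * <number of nodes>, rounded down.

    All names here are hypothetical; plug in the values for your cluster.
    """
    total_capacity = nodes * max_reducers_per_node
    # Round down so the job never asks for more than the cluster can run
    # at once, but always use at least one reducer.
    return max(1, math.floor(factor * total_capacity))

# Example with assumed numbers: 10 nodes, 2 reducers per node.
print(suggested_reducers(10, 2))  # 19
```

The result is the value you would pass to PARALLEL in the Pig script,
for example PARALLEL 19 on a cluster of that assumed size.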

Reducers actually start running while the mappers are still in
progress. They copy the mappers' output and merge it in the background.
Once all mappers have finished, the final merge sort runs and the
actual reduce() calls can begin. So the ideal case is that all of the
reduce work (copying the mappers' output, merging and sorting it, and
executing reduce()) finishes in a single round.

There is no need to run more reducers than the maximum capacity of the
Hadoop cluster. In my opinion, the 0.95 factor is there to keep a
little capacity in reserve in case a machine fails.
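
To make the "single round" idea concrete, here is a small Python sketch
that counts how many rounds of reducers a job would need. It assumes
every node can run the same fixed number of reducers at once; the
function and parameter names are hypothetical, and real scheduling is
less tidy than this.

```python
def reduce_rounds(num_reducers, nodes, max_reducers_per_node):
    """Rounds needed to run num_reducers on the cluster.

    Assumption: the cluster runs nodes * max_reducers_per_node reducers
    at a time, and a new round starts only when capacity frees up.
    """
    capacity = nodes * max_reducers_per_node
    # Ceiling division using integers only.
    return (num_reducers + capacity - 1) // capacity

# Assumed cluster: 10 nodes, 2 reducers per node (capacity 20).
print(reduce_rounds(19, 10, 2))  # 1: everything fits in a single round
print(reduce_rounds(30, 10, 2))  # 2: the extra reducers have to wait
```

With 19 reducers there is also one spare slot, so under these
assumptions losing a single reducer slot still leaves room for the job
to finish in one round.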

2010/3/16 Rob Stewart <robstewar...@googlemail.com>:
> Hi, quick question:
>
> What is the origin of the formula for specifying the number of reducers for
> a Pig job (using PARALLEL).
>
> I have it as:
> <number of nodes> * <maximum number of reducers per node> * 0.9
>
> Is there a theory or model behind this formula?
>
> Rob Stewart
>