I think we have the same goal. Could you please read MAPREDUCE-1226 for some
ideas, or give your comments?


2009/11/22 Jeff Zhang <[email protected]>

> Hi all,
>
>
>
> During my work, I often run the same MapReduce jobs on data sets of
> different sizes. The number of mapper tasks changes automatically according
> to the input data set, but I have to set a different reducer number for each
> data set size.
>
> But I do not want to be bothered with that; in my opinion it is not
> convenient for users. It also harms the system's automation, because in an
> automated system we cannot predict the size of the input data set. So I
> think
> Hadoop should have a more intelligent mechanism to control the reducer
> number according to the input data.
>
> Here I suggest adding a new interface named ReduceNumManager, which has a
> method getReduceNum(InputFormat inputFormat). The code snippet is as
> follows (the interface needs to be refined):
>
>
>
> public interface ReduceNumManager {
>
>     int getReduceNum(InputFormat inputFormat);
>
> }
>
>
>
> Users can set this class in JobConf via JobConf.setReduceNumManager, and
> the JobClient uses this class to determine the reduce number.
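>
> As a rough sketch of how this could be wired (the "mapred.reduce.num.manager"
> key is just a placeholder, and setReduceNumManager is the proposed new
> method, not an existing Hadoop API), JobClient might resolve the reduce
> number before submission like this:
>
>     // Hypothetical: look up the user-supplied ReduceNumManager class and,
>     // if one is set, let it override the configured reduce task count.
>     Class<? extends ReduceNumManager> cls =
>         conf.getClass("mapred.reduce.num.manager", null, ReduceNumManager.class);
>     if (cls != null) {
>       ReduceNumManager manager = ReflectionUtils.newInstance(cls, conf);
>       conf.setNumReduceTasks(manager.getReduceNum(conf.getInputFormat()));
>     }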
>
> E.g. if the InputFormat is a FileInputFormat, then we can have a
> FileReduceNumManager which implements this interface and computes
> the reducer number according to the size of the input files.
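>
> A minimal sketch of such an implementation (the one-reducer-per-GB ratio is
> just an assumption for illustration; it should probably be configurable):
>
>     import java.io.IOException;
>
>     import org.apache.hadoop.conf.Configured;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.mapred.FileInputFormat;
>     import org.apache.hadoop.mapred.InputFormat;
>     import org.apache.hadoop.mapred.JobConf;
>
>     public class FileReduceNumManager extends Configured
>         implements ReduceNumManager {
>
>       // Assumption for illustration: aim for roughly 1 GB of input
>       // per reducer.
>       private static final long BYTES_PER_REDUCER = 1024L * 1024 * 1024;
>
>       public int getReduceNum(InputFormat inputFormat) {
>         JobConf job = new JobConf(getConf());
>         try {
>           // Sum the sizes of all input paths configured for the job.
>           long totalBytes = 0;
>           for (Path p : FileInputFormat.getInputPaths(job)) {
>             FileSystem fs = p.getFileSystem(job);
>             totalBytes += fs.getContentSummary(p).getLength();
>           }
>           return (int) Math.max(1L, totalBytes / BYTES_PER_REDUCER);
>         } catch (IOException e) {
>           return 1;  // fall back to one reducer if sizes cannot be read
>         }
>       }
>     }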
>
>
>
> I think this work will benefit users, and Pig and Hive (not sure) may also
> benefit from it, because it is not convenient for users to set a different
> reduce number each time they run the same script on a different size of
> data set.
>
> If we provide such a mechanism, they only need to provide their customized
> implementation.
>
>
>
> This is my initial idea; I am looking forward to hearing feedback from the
> experts.
>
>
>
> Thank you
>
>
>
> Jeff Zhang
>



-- 
Thank you!

张钊宁
zzningxp
