I think we have the same goal. Could you please read MAPREDUCE-1226 for some ideas, or give your comments?
2009/11/22 Jeff Zhang <[email protected]>

> Hi all,
>
> During my work, I often run the same MapReduce jobs on data sets of
> different sizes. The number of mapper tasks changes automatically
> according to the input data set, but I have to set a different reducer
> number for each data set size.
>
> I do not want to be bothered with that; it is not convenient for users,
> in my opinion. It also harms the system's automation, because in an
> automated system we cannot predict the size of the input data set. So I
> think Hadoop should have a more intelligent mechanism to control the
> reducer number according to the input data.
>
> Here I suggest adding a new interface named ReduceNumManager, with a
> method getReduceNum(InputFormat inputFormat). The code snippet is as
> follows (the interface needs to be refined):
>
>     public interface ReduceNumManager {
>
>         int getReduceNum(InputFormat inputFormat);
>
>     }
>
> Users can set this class in the JobConf via JobConf.setReduceNumManager,
> and the JobClient uses this class to determine the reduce number.
>
> For example, if the InputFormat is a FileInputFormat, we can have a
> FileReduceNumManager that implements this interface and computes the
> reducer number according to the size of the input files.
>
> I think this work will benefit users, and Pig and Hive (not sure) may
> benefit as well, because it is not convenient to set a different reduce
> number each time the same script is run on a different size of data set.
> If we provide such a mechanism, they only need to supply their own
> customized implementation.
>
> This is my initial idea; I look forward to hearing the experts' feedback.
>
> Thank you
>
> Jeff Zhang

--
Thank you!
张钊宁 zzningxp
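To make the proposal concrete, here is a minimal sketch of what such a FileReduceNumManager could look like against the old org.apache.hadoop.mapred API. Everything below is illustrative: ReduceNumManager is only the proposed interface, not an existing Hadoop API; the signature is assumed to also take a JobConf, since getReduceNum(InputFormat) alone gives no access to the input paths; and the one-reducer-per-gigabyte heuristic and the cap of 100 are arbitrary example values.

    import java.io.IOException;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.InputFormat;
    import org.apache.hadoop.mapred.JobConf;

    // Proposed interface, extended with a JobConf parameter (an assumption;
    // the original proposal passes only the InputFormat).
    interface ReduceNumManager {
        int getReduceNum(InputFormat inputFormat, JobConf conf) throws IOException;
    }

    // Hypothetical implementation that sizes the reduce phase from the
    // total bytes of the job's input files.
    public class FileReduceNumManager implements ReduceNumManager {

        private static final long BYTES_PER_REDUCER = 1L << 30; // 1 GB per reducer (example heuristic)
        private static final int MAX_REDUCERS = 100;            // safety cap (example value)

        public int getReduceNum(InputFormat inputFormat, JobConf conf) throws IOException {
            long totalBytes = 0;
            for (Path pattern : FileInputFormat.getInputPaths(conf)) {
                FileSystem fs = pattern.getFileSystem(conf);
                FileStatus[] matches = fs.globStatus(pattern);
                if (matches == null) {
                    continue; // no matching input for this path; skip it
                }
                for (FileStatus status : matches) {
                    // getContentSummary sums all file bytes under a directory
                    totalBytes += fs.getContentSummary(status.getPath()).getLength();
                }
            }
            // At least one reducer; scale linearly with input size up to the cap.
            return (int) Math.min(MAX_REDUCERS, Math.max(1, totalBytes / BYTES_PER_REDUCER));
        }
    }

Until something like the proposed JobConf.setReduceNumManager hook exists, a driver could call this directly, e.g. conf.setNumReduceTasks(new FileReduceNumManager().getReduceNum(conf.getInputFormat(), conf)), so the same script scales its reduce phase with the input.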
