On Nov 22, 2009, at 4:48 PM, Jeff Zhang wrote:

My concern is that calling conf.setNumReduceTasks on the configuration
amounts to hard-coding the value. It is not flexible, so my idea is to add an
interface that changes the number of reducers dynamically according to the
size of the input data set.

You misunderstand. I meant doing something like:

public class MyInputFormat ....

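  public InputSplit[] getSplits(JobConf conf, int numSplits) throws IOException {
     InputSplit[] result = ...;
     // compute total size of input
     long size = 0;
     for (InputSplit split : result) {
        size += split.getLength();
     }
     // at least 6 reducers, otherwise one per 10 GB of input
     conf.setNumReduceTasks((int) Math.max(6, size / (10L << 30)));
     return result;
  }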
}

I haven't checked the code to make sure it will work, but I believe it will.
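
The sizing rule itself can be tried without Hadoop. Below is a minimal sketch in plain Java; the class name ReducerCount, the method reducersFor, and both constants are made up for illustration and are not part of any Hadoop API. It assumes the policy above: one reducer per 10 GB of input, never fewer than 6.

```java
// Hypothetical helper, not a Hadoop class: isolates the sizing arithmetic.
class ReducerCount {
    // Assumed policy: one reducer per 10 GB of input.
    static final long BYTES_PER_REDUCER = 10L * 1024 * 1024 * 1024;
    // Assumed floor on the number of reducers.
    static final int MIN_REDUCERS = 6;

    static int reducersFor(long totalInputBytes) {
        // Integer division rounds down, so 65 GB still maps to 6.
        long bySize = totalInputBytes / BYTES_PER_REDUCER;
        // Apply the floor, then clamp to the int range before narrowing.
        long clamped = Math.min(Integer.MAX_VALUE, Math.max(MIN_REDUCERS, bySize));
        return (int) clamped;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        System.out.println(reducersFor(5 * gb));
        System.out.println(reducersFor(250 * gb));
    }
}
```

Inside getSplits, the result would be passed to conf.setNumReduceTasks in place of the inline expression.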

-- Owen
