On Nov 22, 2009, at 4:48 PM, Jeff Zhang wrote:
> My concern is that calling conf.setNumReduceTasks on the configuration
> amounts to hard-coding the value, which is not flexible. My idea is to add
> an interface that changes the number of reducers dynamically according to
> the size of the input data set.
You misunderstand. I meant doing something like:
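  public class MyInputFormat ... {
    public InputSplit[] getSplits(JobConf conf, int numSplits) throws IOException {
      InputSplit[] result = ...;
      // compute total size of input
      long size = 0;
      for (InputSplit split : result) {
        size += split.getLength();
      }
      // at least 6 reducers, otherwise one per 10 GB of input
      conf.setNumReduceTasks((int) Math.max(6, size / (10L << 30)));
      return result;
    }
  }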
I haven't checked the code to make sure it will work, but I believe it
will.
-- Owen
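
A minimal, Hadoop-free sketch of the reducer-count arithmetic suggested above, for anyone who wants to check the numbers before wiring it into an InputFormat. The class name ReducerCount, the method forInputSize, and both constants are hypothetical names chosen for illustration; the floor of 6 reducers and the 10 GB per reducer come from the example in this thread, with 10 GB read as 10 GiB. The result would then be passed to conf.setNumReduceTasks.

```java
final class ReducerCount {
    // Assumed values, taken from the example in this thread.
    static final int MIN_REDUCERS = 6;
    static final long BYTES_PER_REDUCER = 10L * 1024 * 1024 * 1024; // 10 GiB

    // Returns max(MIN_REDUCERS, totalInputBytes / BYTES_PER_REDUCER),
    // using integer division and clamping to the int range.
    static int forInputSize(long totalInputBytes) {
        if (totalInputBytes < 0) {
            throw new IllegalArgumentException("input size must be non-negative");
        }
        long bySize = totalInputBytes / BYTES_PER_REDUCER;
        long clamped = Math.min(bySize, (long) Integer.MAX_VALUE);
        return (int) Math.max((long) MIN_REDUCERS, clamped);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        System.out.println(forInputSize(25 * gib));   // small input: floor of 6 applies
        System.out.println(forInputSize(100 * gib));  // 100 GiB / 10 GiB = 10
    }
}
```

Keeping the arithmetic in a separate static method makes it easy to test without a cluster, and the floor keeps small jobs from ending up with too few reducers.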