Also keep in mind that one split is not necessarily one file; that depends on the InputFormat. For example, MultiFileInputFormat (and the newer CombineFileInputFormat) packs multiple files into a single split.
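
To make the distinction concrete, here is a minimal, Hadoop-free sketch of how an InputFormat might group several small files into one split. It is only an illustration under stated assumptions: the class name `SplitPackingSketch`, the methods `onePerFile` and `packed`, the file names, and the greedy size-based rule are all hypothetical, and this is not how any particular Hadoop InputFormat is actually implemented. Each inner list stands for one split, and so for one map task.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this does not use or reproduce Hadoop code.
class SplitPackingSketch {

    // Baseline: every file becomes its own split (one map task per file).
    static List<List<String>> onePerFile(Map<String, Long> fileSizes) {
        List<List<String>> splits = new ArrayList<>();
        for (String name : fileSizes.keySet()) {
            List<String> split = new ArrayList<>();
            split.add(name);
            splits.add(split);
        }
        return splits;
    }

    // Greedy packing: keep adding files to the current split until the next
    // file would push it past maxSplitBytes, then start a new split.
    static List<List<String>> packed(Map<String, Long> fileSizes, long maxSplitBytes) {
        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (Map.Entry<String, Long> entry : fileSizes.entrySet()) {
            if (!current.isEmpty() && currentBytes + entry.getValue() > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(entry.getKey());
            currentBytes += entry.getValue();
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up file names and sizes, in input order.
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("a.log", 10L);
        files.put("b.log", 20L);
        files.put("c.log", 40L);
        files.put("d.log", 5L);

        System.out.println(onePerFile(files)); // four splits, one file each
        System.out.println(packed(files, 64L)); // [[a.log, b.log], [c.log, d.log]]
    }
}
```

Either way each split still goes to exactly one map task; the only thing that changes is how many files sit behind that split.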

On Thu, Sep 23, 2010 at 3:16 PM, Greg Roelofs <[email protected]> wrote:

> > Can a map task work on more than one input split?
>
> As far as I can tell from reading the code, no (at least, not yet). Code
> such as createCache() in JobInProgress implicitly assumes a one-to-one
> mapping between maps[] and splits[].
>
> MR-1220 (small-jobs "combo task" optimization) will change that in some
> sense, but fundamentally, the correspondence between maps and splits is
> pretty well baked in, I believe. (In fact, I'm pretty sure splits are
> created based on some goal for the number of maps--i.e., maps and splits
> are one-to-one almost by definition.)
>
> I might be wrong about all this, of course. :-)
>
> Greg
