Andrzej Bialecki wrote:
(FYI: if you wonder how it was working before, the trick was to generate just 1 split for the fetch job, which then led to just one task being created for any input fetchlist.)

I don't think that's right. The generator calls setNumReduceTasks() with the desired number of fetch tasks, to control how many host-disjoint fetchlists are generated. Then the fetcher does not permit its input files to be split, so that fetch tasks remain host-disjoint. So lots of splits can be generated, by default as many as mapred.map.tasks, permitting lots of parallel fetching.
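To illustrate the host-disjoint idea: the generator effectively partitions URLs by host, so all URLs for one host land in the same fetchlist, and the number of reduce tasks determines the number of fetchlists. This is a minimal standalone sketch of that partitioning scheme, not the actual Nutch code; the function name and the byte-sum hash are illustrative only.

```python
from urllib.parse import urlparse

def partition_by_host(urls, num_reduce_tasks):
    """Assign each URL to a partition by hashing its host, so every
    partition (fetchlist) is host-disjoint: no host appears in more
    than one fetchlist. num_reduce_tasks plays the role of the value
    passed to setNumReduceTasks() in the generator."""
    partitions = [[] for _ in range(num_reduce_tasks)]
    for url in urls:
        host = urlparse(url).netloc
        # Stable hash for demo purposes (Python's built-in str hash is
        # randomized per process); any deterministic host hash works.
        idx = sum(host.encode("utf-8")) % num_reduce_tasks
        partitions[idx].append(url)
    return partitions
```

Because each fetchlist file is host-disjoint when written, the fetcher only has to refuse to split those files for its map tasks to stay host-disjoint too, however many there are.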

This should still work. If it does not, I'd be interested to hear more details.

Doug
