Andrzej Bialecki wrote:
(FYI: if you wonder how it was working before, the trick was to generate just 1 split for the fetch job, which then led to just one task being created for any input fetchlist.)

I don't think that's right. The generator calls setNumReduceTasks() with the desired number of fetch tasks, to control how many host-disjoint fetchlists are generated. Then the fetcher does not permit its input files to be split, so that fetch tasks remain host-disjoint. So lots of splits can be generated, by default as many as mapred.map.tasks, permitting lots of parallel fetching.
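To illustrate the host-disjoint idea: the generator effectively partitions URLs by host, so all URLs for one host land in the same fetchlist, and the number of reduce tasks determines the number of fetchlists. This is a minimal standalone sketch of that partitioning scheme, not the actual Nutch code; the function name and the byte-sum hash are illustrative only.

```python
from urllib.parse import urlparse

def partition_by_host(urls, num_reduce_tasks):
    """Assign each URL to a partition by hashing its host, so every
    partition (fetchlist) is host-disjoint: no host appears in more
    than one fetchlist. num_reduce_tasks plays the role of the value
    passed to setNumReduceTasks() in the generator."""
    partitions = [[] for _ in range(num_reduce_tasks)]
    for url in urls:
        host = urlparse(url).netloc
        # Stable hash for demo purposes (Python's built-in str hash is
        # randomized per process); any deterministic host hash works.
        idx = sum(host.encode("utf-8")) % num_reduce_tasks
        partitions[idx].append(url)
    return partitions
```

Because each fetchlist file is host-disjoint when written, the fetcher only has to refuse to split those files for its map tasks to stay host-disjoint too, however many there are.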

This should still work. If it does not, I'd be interested to hear more details.

Doug
