Gal Nitzan wrote:
I noticed that all tasktrackers are participating in the fetch.

I have only one site in the injected seed file.

I have 5 tasktrackers, and all except one access the same site.

I just fixed a bug related to this.  Please try updating.

The problem was that MapReduce recently started supporting speculative execution: if some tasks appear to be executing slowly and there are idle nodes, those tasks are automatically run in parallel on another node, and the results of whichever attempt finishes first are used. That is not appropriate for fetching, since a duplicate fetch task would download the same pages from the same sites a second time. So I just added a mechanism to Hadoop for disabling it, and then disabled it in the Fetcher.
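
For anyone who wants to do the same in their own jobs, here is a minimal sketch, assuming the org.apache.hadoop.mapred.JobConf API; the method and the underlying property name have varied across Hadoop releases, so treat both as assumptions rather than the exact mechanism described above:

  import org.apache.hadoop.mapred.JobConf;

  public class FetcherSetup {
    public static void disableSpeculation(JobConf job) {
      // A speculative duplicate of a fetch task would download the same
      // pages a second time, so switch speculation off for this job only.
      job.setSpeculativeExecution(false);

      // Property-based equivalent (the property name is an assumption;
      // it has changed across Hadoop releases):
      // job.setBoolean("mapred.speculative.execution", false);
    }
  }

Other jobs keep the benefit of speculative execution; only the job configured this way opts out.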

Note also that the slaves file is now located in the conf/ directory, as is a new file named hadoop-env.sh. The latter contains all relevant environment variables, so we no longer have to rely on ssh's environment-passing feature.
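
As an illustration, here is a sketch of the kind of settings conf/hadoop-env.sh holds; aside from JAVA_HOME, the variable names below are assumptions drawn from later releases, not a listing of the actual new file:

  # conf/hadoop-env.sh -- sourced by the start-up scripts on each node, so
  # every daemon sees the same environment without ssh having to pass it.

  # Where Java is installed; typically the one setting that must be present.
  export JAVA_HOME=/usr/lib/jvm/java

  # Optional tuning knobs (names assumed from later Hadoop releases):
  # export HADOOP_HEAPSIZE=1000          # daemon heap size, in MB
  # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs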

Doug
