Gal Nitzan wrote:
I noticed all tasktrackers are participating in the fetch.
I have only one site in the injected seed file.
I have 5 tasktrackers; all except one access the same site.
I just fixed a bug related to this. Please try updating.
The problem was that MapReduce recently started supporting speculative
execution: if some tasks appear to be executing slowly and there are
idle nodes, those tasks are automatically run in parallel on another
node, and the results of whichever copy finishes first are used. That
is not appropriate for fetching, since duplicate fetch tasks would hit
the same sites more than once. So I just added a mechanism to Hadoop
to disable it per-job and then disabled it in the Fetcher.
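For the curious, the per-job switch looks roughly like the sketch
below. This is from memory, not the actual Fetcher code; the property
name mapred.speculative.execution is what Hadoop uses for the switch
(later builds also expose a JobConf.setSpeculativeExecution()
convenience method):

  import org.apache.hadoop.mapred.JobConf;

  public class NoSpeculation {
    public static void main(String[] args) {
      JobConf job = new JobConf();
      job.setJobName("fetch");
      // Tell the jobtracker never to launch a second, speculative
      // attempt of any task in this job, even if one looks slow.
      job.set("mapred.speculative.execution", "false");
      System.out.println("speculative execution = "
          + job.get("mapred.speculative.execution"));
    }
  }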
Note also that the slaves file is now located in the conf/ directory,
as is a new file named hadoop-env.sh. The latter contains all relevant
environment variables, so that we no longer have to rely on ssh's
environment-passing feature.
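To give an idea of what goes in hadoop-env.sh (the values below are
illustrative examples, not defaults to copy):

  # hadoop-env.sh -- sourced on each node by the start-up scripts.
  # The JVM to use; this one is required.
  export JAVA_HOME=/usr/lib/jvm/java
  # Optional: maximum heap, in MB, for the Hadoop daemons.
  export HADOOP_HEAPSIZE=1000
  # Optional: where the daemons write their log files.
  export HADOOP_LOG_DIR=${HADOOP_HOME}/logs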
Doug