Hi all,
Currently I'm able to run map-reduce jobs from the box where the NameNode
and JobTracker are running, but I'd like to run my jobs from a separate
box that has access to HDFS. I have updated the fs.default.name and
mapred.job.tracker parameters in the local Hadoop directory to point to
the cluster's master. Now Hadoop returns the following error:
[EMAIL PROTECTED]:/usr/local/hadoop-0.16.0$ bin/hadoop jar
hadoop-0.16.0-examples.jar wordcount /user/username/gutenberg
/user/username/gutenberg-output
08/03/31 10:21:46 INFO mapred.FileInputFormat: Total input paths to
process : 3
org.apache.hadoop.ipc.RemoteException: java.io.IOException:
/mnt/hadoop/mapred/system/job_200803210640_0852/job.xml: No such file or
directory
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:159)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:133)
at
org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1083)
...
Here the account 'username' has passwordless access to the master box.
The cluster runs on EC2.
As a workaround I can run jobs via ssh, i.e.
ssh master /usr/local/hadoop-0.16.0/bin/hadoop jar
/home/username/jobs/hadoop-0.16.0-examples.jar wordcount
/user/username/gutenberg /user/username/gutenberg-output
But then the jar file has to be copied to the NameNode box before each
run.
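For reference, the ssh workaround could be wrapped roughly like this. It
is only a sketch: the function name run_remote_job and the DRY_RUN switch
are made up for illustration, and the host name and directories are
assumed from the paths in this mail.

```shell
#!/bin/sh
# Sketch of the ssh workaround: copy the job jar to the master box,
# then launch the job there. All values below are assumptions taken
# from the paths in this mail; adjust them to your cluster.
MASTER=master
HADOOP_DIR=/usr/local/hadoop-0.16.0
REMOTE_JAR_DIR=/home/username/jobs

# run_remote_job (hypothetical helper): first argument is the local jar,
# the remaining arguments are passed to "hadoop jar" unchanged.
# With DRY_RUN set to a non-empty value the commands are only printed.
run_remote_job() {
    jar=$1
    shift
    remote_jar="$REMOTE_JAR_DIR/$(basename "$jar")"
    ${DRY_RUN:+echo} scp "$jar" "$MASTER:$remote_jar" || return 1
    ${DRY_RUN:+echo} ssh "$MASTER" "$HADOOP_DIR/bin/hadoop" jar "$remote_jar" "$@"
}

# Example call (not executed here):
#   run_remote_job hadoop-0.16.0-examples.jar wordcount \
#       /user/username/gutenberg /user/username/gutenberg-output
```

This still submits the job from the master itself; it only automates the
copy step and does not address the job.xml error above.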
Thanks in advance.
--
Andrey Pankov