Hi,

We have a cluster of 10 machines: one master (hostname: megh03) and nine slaves (hostnames: meghXX). The cluster is set up. Whenever I run a job, I get an error on one machine, megh08. The error is pasted here:

[meghad...@prashant hadoop-0.18.3]$ bin/hadoop jar hadoop-0.18.3-examples.jar wordcount conf out6
10/03/26 22:40:14 INFO mapred.FileInputFormat: Total input paths to process : 11
10/03/26 22:40:14 INFO mapred.FileInputFormat: Total input paths to process : 11
10/03/26 22:40:15 INFO mapred.JobClient: Running job: job_201003262242_0004
10/03/26 22:40:16 INFO mapred.JobClient: map 0% reduce 0%
10/03/26 22:40:19 INFO mapred.JobClient: map 8% reduce 0%
10/03/26 22:40:20 INFO mapred.JobClient: map 25% reduce 0%
10/03/26 22:40:21 INFO mapred.JobClient: map 91% reduce 0%
10/03/26 22:40:26 INFO mapred.JobClient: map 91% reduce 2%
10/03/26 22:40:28 INFO mapred.JobClient: Task Id : attempt_201003262242_0004_m_000006_0, Status : FAILED
*Error initializing attempt_201003262242_0004_m_000006_0:
java.net.ConnectException: Call to megh03/10.2.4.139:9000 failed on connection exception: java.net.ConnectException: Connection refused*
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:743)
    at org.apache.hadoop.ipc.Client.call(Client.java:719)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy5.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
    at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)
    at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:172)
    at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339)
    at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:638)
    at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1297)
    at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:937)
    at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1334)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2343)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:301)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:178)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:820)
    at org.apache.hadoop.ipc.Client.call(Client.java:705)
    ... 16 more

10/03/26 22:40:28 WARN mapred.JobClient: *Error reading task outputhttp://megh08:50060/tasklog?plaintext=true&taskid=attempt_201003262242_0004_m_000006_0&filter=stdout *
10/03/26 22:40:28 WARN mapred.JobClient: *Error reading task outputhttp://megh08:50060/tasklog?plaintext=true&taskid=attempt_201003262242_0004_m_000006_0&filter=stderr *
10/03/26 22:40:31 INFO mapred.JobClient: map 100% reduce 2%
10/03/26 22:40:36 INFO mapred.JobClient: Job complete: job_201003262242_0004
10/03/26 22:40:36 INFO mapred.JobClient: Counters: 17
10/03/26 22:40:36 INFO mapred.JobClient: File Systems
10/03/26 22:40:36 INFO mapred.JobClient: HDFS bytes read=48534
10/03/26 22:40:36 INFO mapred.JobClient: HDFS bytes written=26261
10/03/26 22:40:36 INFO mapred.JobClient: Local bytes read=32541
10/03/26 22:40:36 INFO mapred.JobClient: Local bytes written=70377
10/03/26 22:40:36 INFO mapred.JobClient: Job Counters
10/03/26 22:40:36 INFO mapred.JobClient: Launched reduce tasks=1
10/03/26 22:40:36 INFO mapred.JobClient: Rack-local map tasks=1
10/03/26 22:40:36 INFO mapred.JobClient: Launched map tasks=13
10/03/26 22:40:36 INFO mapred.JobClient: Data-local map tasks=11
10/03/26 22:40:36 INFO mapred.JobClient: Map-Reduce Framework
10/03/26 22:40:36 INFO mapred.JobClient: Reduce input groups=1521
10/03/26 22:40:36 INFO mapred.JobClient: Combine output records=3374
10/03/26 22:40:36 INFO mapred.JobClient: Map input records=1580
10/03/26 22:40:36 INFO mapred.JobClient: Reduce output records=1521
10/03/26 22:40:36 INFO mapred.JobClient: Map output bytes=63905
10/03/26 22:40:36 INFO mapred.JobClient: Map input bytes=47913
10/03/26 22:40:36 INFO mapred.JobClient: Combine input records=6498
10/03/26 22:40:36 INFO mapred.JobClient: Map output records=4645
10/03/26 22:40:36 INFO mapred.JobClient: Reduce input records=1521

Can anybody tell me what may be the problem here?

--
Thanks and Regards,
Prashant Ullegaddi,
Search and Information Extraction Lab,
IIIT-Hyderabad, India.