To answer my own question: I added explicit entries for every host in the cluster to each node's /etc/hosts file. Apparently our DNS was mucking up the name resolution, even though I had specified IP addresses in the config files. (The mystery 208.67.216.132 address appears to belong to OpenDNS, which answers failed lookups with a redirect to its own servers instead of NXDOMAIN; that would explain how an IP we never configured showed up in the stack trace.)
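For anyone hitting the same thing, the entries look roughly like this in each slave's /etc/hosts (sladd is our master's hostname from the error below, and 10.0.6.110 is the address from my hadoop-site.xml; the slave line is made up purely for illustration):

    10.0.6.110    sladd     # master / namenode
    10.0.6.111    slave1    # example only, substitute your own slaves' names and addresses

You can sanity-check the fix by running "getent hosts sladd" on each node and confirming it prints 10.0.6.110 rather than whatever the resolver was inventing.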
Lesson learned: ensure there's a solid entry for your master node in each slave node's /etc/hosts file. I hope this saves someone else next time it happens.

On Fri, Apr 10, 2009 at 2:28 PM, Seth Ladd <[email protected]> wrote:
> Hello,
>
> I am using Hadoop 0.18.3 and Pig 0.2.0 across a small cluster of 4
> machines. I am using start-all.sh to boot up the cluster. The HDFS
> system appears to be working really well. I am able to copy files in
> and out of the filesystem.
>
> However, when I try to submit a Pig job, I am greeted by this exception:
>
> 2009-04-10 14:17:27,145 INFO org.apache.hadoop.mapred.TaskTracker:
> Starting tracker tracker_sdi-kenglish:localhost/127.0.0.1:37109
> 2009-04-10 14:17:27,249 INFO org.apache.hadoop.mapred.TaskTracker:
> Starting thread: Map-events fetcher for all reduce tasks on
> tracker_sdi-kenglish:localhost/127.0.0.1:37109
> 2009-04-10 14:18:07,361 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction: attempt_200904101417_0001_m_000002_0
> 2009-04-10 14:18:10,353 WARN org.apache.hadoop.mapred.TaskTracker:
> Error initializing attempt_200904101417_0001_m_000002_0:
> java.io.IOException: Call to sladd/208.67.216.132:8020 failed on local
> exception: java.io.EOFException
>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:751)
>         at org.apache.hadoop.ipc.Client.call(Client.java:719)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>         at org.apache.hadoop.dfs.$Proxy5.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
>         at org.apache.hadoop.dfs.DFSClient.createRPCNamenode(DFSClient.java:103)
>         at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:172)
>         at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:67)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339)
>         at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>         at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:638)
>         at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1297)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:937)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1334)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2343)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:115)
>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:509)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:442)
>
> What's VERY odd about this, is I don't have a 208.67.216.132 node in
> my system at all. Where is hadoop getting this IP from?
>
> My hadoop-site.xml is below:
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://10.0.6.110/</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>10.0.6.110:8012</value>
>     <final>true</final>
>   </property>
>   <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/opt/cluster/hadoop-tmp</value>
>     <final>true</final>
>   </property>
> </configuration>
>
> I've confirmed SSH works just fine, ping works, etc. HDFS works as well.
>
> Does the above exception have any clues as to why I can't run a Pig
> MapReduce job?
>
> Your help or tips are much appreciated,
> Seth
>
