I believe this is what happened. I'm not sure of the details, but the networking stack on the master node was misconfigured somehow. All the machines serve double duty as development boxes, and they're on two different networks. The master node could contact the cluster network but not the open net. Once we fixed that, things seemed all right, even though all the cluster machines could already contact the master node on the private gig-e network before the fix.
So, this is a pain in the ass. Is there a way to get Hadoop to bind hostnames to the IPs in my slaves file? Or to just use the IPs in slaves outright? And is there some way to know for sure that this is the problem? Is this related to HADOOP-1374? Could that bug be this hostname thing?

-Colin

On Mon, Mar 31, 2008 at 8:58 PM, Mafish Liu <[EMAIL PROTECTED]> wrote:
> Hi:
> I ran into a similar problem. In the end, I found that it was caused
> by hostname resolution, because Hadoop uses hostnames to access other
> nodes.
> To check for this, open your jobtracker log file (it often resides in
> $HADOOP_HOME/logs/hadoop-xxxx-jobtracker-xxxx.log) and see if there is
> an error like:
> "FATAL org.apache.hadoop.mapred.JobTracker: java.net.UnknownHostException:
> Invalid hostname for server: local"
> If there is, adding IP-hostname pairs to the /etc/hosts file on all of
> your nodes may fix this problem.
>
> Good luck and best regards.
>
> Mafish
>
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
>
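One rough way to test the hostname theory from the quoted message is to check, on each node, whether every name in the slaves file actually resolves. The sketch below is only an illustration, not anything Hadoop ships: the function name unresolved_hosts is made up, and it assumes the slaves file lists one hostname per line.

```python
import socket


def unresolved_hosts(hosts, resolve=socket.gethostbyname):
    """Return the entries in `hosts` that the resolver cannot map to an IP.

    `resolve` defaults to the system resolver, which consults /etc/hosts
    and DNS as configured on the machine where this runs.
    """
    failed = []
    for host in hosts:
        host = host.strip()
        if not host:
            continue  # skip blank lines
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_hosts.py /path/to/conf/slaves
    with open(sys.argv[1]) as f:
        for name in unresolved_hosts(f):
            print("cannot resolve:", name)
```

If this prints any names on a given node, that node would need matching entries in /etc/hosts (or DNS) before it could reach those machines by hostname. An empty result does not prove the problem lies elsewhere, since a name can resolve to the wrong interface on a box that sits on two networks.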
