Re: reduce task hanging or just slow?

Colin Freas Mon, 31 Mar 2008 18:13:49 -0700

I believe that this is exactly what happened.

I'm not sure of the root cause, but the networking stack on the master
node was all screwed up somehow.  All the machines serve double duty as
development boxes, and they're on two different networks.  The master node
could reach the cluster network but not the open net.  Once we fixed that,
things seemed all right, even though all the cluster machines could already
contact the master node on the private gig-e network before the fix.

So, this is a pain in the ass.  Is there a way to get it to bind hostnames
to the IPs in my slaves file?  Or just use the IPs in slaves outright?  And
is there some way to know for sure that this is the problem?  Is this
related to HADOOP-1374?  Could that bug be this hostname thing?
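
On the "know for sure" question, a rough sketch of a check that could be run
on each node: read the slaves file and see whether every entry resolves from
that machine.  The function names here (read_slaves, check_hosts) are made up
for illustration and are not part of Hadoop, and the assumption that the
slaves file holds one host per line with optional "#" comments should be
verified against your own file.  It only tests name resolution, not whether
Hadoop itself can connect.

```python
import socket


def read_slaves(text):
    """Parse slaves-file content: one host per line (assumed format).

    Blank lines and anything after a '#' are ignored.
    """
    hosts = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            hosts.append(entry)
    return hosts


def check_hosts(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to the IP it resolves to, or None if lookup fails.

    `resolve` is injectable so the check can be exercised without a network.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except OSError:  # socket.gaierror and socket.herror are subclasses
            results[name] = None
    return results


# Example use (the path is whatever your slaves file is):
#   with open("/path/to/conf/slaves") as f:
#       for host, ip in check_hosts(read_slaves(f.read())).items():
#           print(host, ip if ip else "DOES NOT RESOLVE")
```

Any host that comes back as None on some node is a candidate for an
/etc/hosts entry on that node.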

-Colin

On Mon, Mar 31, 2008 at 8:58 PM, Mafish Liu <[EMAIL PROTECTED]> wrote:

> Hi:
>    I ran into a similar problem.  In the end, I found that it was caused
> by hostname resolution, because Hadoop uses hostnames to access other
> nodes.
>    To check for this, open your jobtracker log file (it usually resides in
> $HADOOP_HOME/logs/hadoop-xxxx-jobtracker-xxxx.log) and look for an error
> like:
> "FATAL org.apache.hadoop.mapred.JobTracker: java.net.UnknownHostException:
> Invalid hostname for server: local"
>    If it is there, adding IP-hostname pairs to the /etc/hosts file on all
> of your nodes may fix the problem.
>
> Good luck and best regards.
>
> Mafish
>
> --
> [EMAIL PROTECTED]
> Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
>
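
The log check suggested in the quoted message can be sketched as a small
script.  This is only an illustration: find_unknown_host_errors is a
hypothetical helper name, and it simply looks for lines containing both
"FATAL" and "java.net.UnknownHostException", as in the error quoted above.
The log path is left to the caller because the exact file name varies per
install.

```python
def find_unknown_host_errors(lines):
    """Return log lines that report a FATAL UnknownHostException.

    `lines` can be any iterable of strings, such as an open file object.
    """
    marker = "java.net.UnknownHostException"
    return [
        line.rstrip("\n")
        for line in lines
        if "FATAL" in line and marker in line
    ]


# Example use (substitute your own jobtracker log path):
#   with open(log_path) as f:
#       for hit in find_unknown_host_errors(f):
#           print(hit)
```

An empty result does not rule out a resolution problem; it only means this
particular error was not logged in that file.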
