Turns out, it does cause problems later on. I think the problem is that the slaves have, in their hosts files:
127.0.0.1 localhost.localdomain localhost
127.0.0.1 machinename.cse.sc.edu machinename

The reduce phase fails because the reducer cannot get data from the mappers: it tries to open a connection to "http://localhost:...." This is kind of annoying, as all the hostnames resolve properly using DNS. I think it qualifies as a Hadoop bug, or maybe not.

Jose

On Wed, Jul 23, 2008 at 10:19 AM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> That's good. :)
>
>> Will this cause bigger problems later on? Or should I just ignore it?
>
> I'm not sure, but I guess there is no problem.
> Does anyone have some experience with that?
>
> Regards, Edward J. Yoon
>
> On Wed, Jul 23, 2008 at 11:05 PM, Jose Vidal <[EMAIL PROTECTED]> wrote:
>> Thanks! That worked. I was able to run DFS and put some files in it.
>>
>> However, when I go to my namenode at http://namenode:50070 I see that
>> all the datanodes have a name of "localhost".
>>
>> Will this cause bigger problems later on? Or should I just ignore it?
>>
>> Jose
>>
>> On Tue, Jul 22, 2008 at 6:48 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
>>>> So, do I need to change the hosts file on all the slaves, or just the
>>>> namenode?
>>>
>>> Just the namenode.
>>>
>>> Thanks, Edward
>>>
>>> On Wed, Jul 23, 2008 at 7:45 AM, Jose Vidal <[EMAIL PROTECTED]> wrote:
>>>> Yes, the hosts file just has:
>>>>
>>>> 127.0.0.1 localhost hermes.cse.sc.edu hermes
>>>>
>>>> So, do I need to change the hosts file on all the slaves, or just the
>>>> namenode?
>>>>
>>>> I'm not root on these machines, so changing them requires gentle
>>>> handling of our sysadmin....
>>>>
>>>> Jose
>>>>
>>>> On Tue, Jul 22, 2008 at 5:37 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
>>>>> If you have a static address for the machine, make sure that your
>>>>> hosts file points to the static address for the namenode host name,
>>>>> as opposed to the 127.0.0.1 address. It should look something
>>>>> like this, with the values replaced by your own:
>>>>>
>>>>> 127.0.0.1 localhost.localdomain localhost
>>>>> 192.x.x.x yourhost.yourdomain.com yourhost
>>>>>
>>>>> - Edward
>>>>>
>>>>> On Wed, Jul 23, 2008 at 6:03 AM, Jose Vidal <[EMAIL PROTECTED]> wrote:
>>>>>> I'm trying to install Hadoop on our Linux machines, but after
>>>>>> start-all.sh none of the slaves can connect:
>>>>>>
>>>>>> 2008-07-22 16:35:27,534 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
>>>>>> /************************************************************
>>>>>> STARTUP_MSG: Starting DataNode
>>>>>> STARTUP_MSG:   host = thetis/127.0.0.1
>>>>>> STARTUP_MSG:   args = []
>>>>>> STARTUP_MSG:   version = 0.16.4
>>>>>> STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
>>>>>> ************************************************************/
>>>>>> 2008-07-22 16:35:27,643 WARN org.apache.hadoop.dfs.DataNode: Invalid directory in dfs.data.dir: directory is not writable: /work
>>>>>> 2008-07-22 16:35:27,699 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 1 time(s).
>>>>>> 2008-07-22 16:35:28,700 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 2 time(s).
>>>>>> 2008-07-22 16:35:29,700 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 3 time(s).
>>>>>> 2008-07-22 16:35:30,701 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 4 time(s).
>>>>>> 2008-07-22 16:35:31,702 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 5 time(s).
>>>>>> 2008-07-22 16:35:32,702 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 6 time(s).
>>>>>>
>>>>>> The same happens for the tasktrackers (port 9001).
>>>>>>
>>>>>> I think the problem has something to do with name resolution. Check these out:
>>>>>>
>>>>>> [EMAIL PROTECTED]:~/hadoop-0.16.4> telnet hermes.cse.sc.edu 9000
>>>>>> Trying 127.0.0.1...
>>>>>> Connected to hermes.cse.sc.edu (127.0.0.1).
>>>>>> Escape character is '^]'.
>>>>>> bye
>>>>>> Connection closed by foreign host.
>>>>>>
>>>>>> [EMAIL PROTECTED]:~/hadoop-0.16.4> host hermes.cse.sc.edu
>>>>>> hermes.cse.sc.edu has address 129.252.130.148
>>>>>>
>>>>>> [EMAIL PROTECTED]:~/hadoop-0.16.4> telnet 129.252.130.148 9000
>>>>>> Trying 129.252.130.148...
>>>>>> telnet: connect to address 129.252.130.148: Connection refused
>>>>>> telnet: Unable to connect to remote host: Connection refused
>>>>>>
>>>>>> So, the first one connects but not the second, even though they both go to
>>>>>> the same machine:port. My guess is that the Hadoop server is closing
>>>>>> the connection, but why?
>>>>>>
>>>>>> Thanks,
>>>>>> Jose
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Edward J. Yoon
>>>>> http://blog.udanax.org

--
Jose M. Vidal <[EMAIL PROTECTED]> http://jmvidal.cse.sc.edu
University of South Carolina http://www.multiagent.com
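For reference, an /etc/hosts layout along the lines Edward describes, applied on the namenode and on each slave, would presumably look something like this (the hermes address is taken from the log above; the thetis address is a placeholder for that slave's own static address):

127.0.0.1        localhost.localdomain localhost
129.252.130.148  hermes.cse.sc.edu hermes
129.252.x.x      thetis.cse.sc.edu thetis     # on thetis only: its own static address

With the loopback line no longer carrying a machine's real name, the datanodes should register under their DNS names instead of "localhost" on the namenode's status page, and the reducers should fetch map output from the mappers' real hostnames rather than from http://localhost:....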

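Separately, the startup log above warns "Invalid directory in dfs.data.dir: directory is not writable: /work", which is a different problem from the name resolution one. A minimal hadoop-site.xml for this cluster might look roughly like the sketch below; the /work/hadoop path is an assumption. The reason the hostname in fs.default.name and mapred.job.tracker matters is that the namenode and jobtracker bind to whatever address that name resolves to: while the hosts file maps hermes.cse.sc.edu to 127.0.0.1 they listen only on loopback, which is presumably why the telnet to 129.252.130.148:9000 was refused while the one to hermes.cse.sc.edu (resolving to 127.0.0.1) connected.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hermes.cse.sc.edu:9000</value>    <!-- namenode address/port from the log -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hermes.cse.sc.edu:9001</value>    <!-- jobtracker; the "port 9001" mentioned above -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/work/hadoop/dfs/data</value>     <!-- assumed path; must exist and be writable on each slave -->
  </property>
</configuration>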