Turns out, it does cause problems later on.

I think the problem is that the slaves have, in their hosts files:

127.0.0.1 localhost.localdomain localhost
127.0.0.1 machinename.cse.sc.edu machinename

The reduce phase fails because the reducers cannot get data from the
mappers: they try to open connections to "http://localhost:....".
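
For what it's worth, here is a quick way to see what the JVM reports on a
slave (just a sketch of the lookup, not necessarily Hadoop's exact code path):

import java.net.InetAddress;

public class WhoAmI {
    public static void main(String[] args) throws Exception {
        // With "127.0.0.1 machinename.cse.sc.edu machinename" in /etc/hosts,
        // the local hostname resolves to the loopback address, so the reverse
        // lookup reports localhost instead of the machine's real name.
        InetAddress me = InetAddress.getLocalHost();
        System.out.println("hostname  = " + me.getHostName());
        System.out.println("canonical = " + me.getCanonicalHostName());
        System.out.println("address   = " + me.getHostAddress());
    }
}

On a slave with the hosts file above, the canonical name and address should
come back as localhost.localdomain/127.0.0.1, which would match the
"localhost" datanode names and the localhost URLs the reducers try to fetch from.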

This is kind of annoying, as all the hostnames resolve properly using
DNS. I think this qualifies as a Hadoop bug, but maybe not.
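
If I can get our sysadmin to help, I think the fix on each slave is to map
the machine's own name to its static address instead of to 127.0.0.1,
something like this (the 129.252.x.x address below is a placeholder, not a
slave's real one):

127.0.0.1       localhost.localdomain localhost
129.252.x.x     machinename.cse.sc.edu machinename

With that in place the reverse lookup should return the real name, and the
datanodes and tasktrackers should register as machinename.cse.sc.edu instead
of localhost.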

Jose

On Wed, Jul 23, 2008 at 10:19 AM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> That's good. :)
>
>> Will this cause bigger problems later on? Or should I just ignore it?
>
> I'm not sure, but I guess there is no problem.
> Does anyone have some experience with that?
>
> Regards, Edward J. Yoon
>
> On Wed, Jul 23, 2008 at 11:05 PM, Jose Vidal <[EMAIL PROTECTED]> wrote:
>> Thanks! That worked. I was able to run DFS and put some files in it.
>>
>> However, when I go to my namenode at http://namenode:50070 I see that
>> all the datanodes have a name of "localhost".
>>
>> Will this cause bigger problems later on? Or should I just ignore it?
>>
>> Jose
>>
>> On Tue, Jul 22, 2008 at 6:48 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
>>>> So, do I need to change the hosts file on all the slaves, or just the
>>>> namenode?
>>>
>>> Just the namenode.
>>>
>>> Thanks, Edward
>>>
>>> On Wed, Jul 23, 2008 at 7:45 AM, Jose Vidal <[EMAIL PROTECTED]> wrote:
>>>> Yes, the hosts file just has:
>>>>
>>>> 127.0.0.1 localhost hermes.cse.sc.edu hermes
>>>>
>>>> So, do I need to change the hosts file on all the slaves, or just the
>>>> namenode?
>>>>
>>>> I'm not root on these machines, so changing them requires gentle
>>>> handling of our sysadmin....
>>>>
>>>> Jose
>>>>
>>>> On Tue, Jul 22, 2008 at 5:37 PM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
>>>>> If you have a static address for the machine, make sure that your
>>>>> hosts file is pointing to the static address for the namenode host
>>>>> name, as opposed to the 127.0.0.1 address. It should look something
>>>>> like this, with the values replaced by your own:
>>>>>
>>>>> 127.0.0.1               localhost.localdomain localhost
>>>>> 192.x.x.x               yourhost.yourdomain.com yourhost
>>>>>
>>>>> - Edward
>>>>>
>>>>> On Wed, Jul 23, 2008 at 6:03 AM, Jose Vidal <[EMAIL PROTECTED]> wrote:
>>>>>> I'm trying to install Hadoop on our Linux machines, but after
>>>>>> start-all.sh none of the slaves can connect:
>>>>>>
>>>>>> 2008-07-22 16:35:27,534 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
>>>>>> /************************************************************
>>>>>> STARTUP_MSG: Starting DataNode
>>>>>> STARTUP_MSG:   host = thetis/127.0.0.1
>>>>>> STARTUP_MSG:   args = []
>>>>>> STARTUP_MSG:   version = 0.16.4
>>>>>> STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May  2 00:18:12 UTC 2008
>>>>>> ************************************************************/
>>>>>> 2008-07-22 16:35:27,643 WARN org.apache.hadoop.dfs.DataNode: Invalid directory in dfs.data.dir: directory is not writable: /work
>>>>>> 2008-07-22 16:35:27,699 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 1 time(s).
>>>>>> 2008-07-22 16:35:28,700 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 2 time(s).
>>>>>> 2008-07-22 16:35:29,700 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 3 time(s).
>>>>>> 2008-07-22 16:35:30,701 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 4 time(s).
>>>>>> 2008-07-22 16:35:31,702 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 5 time(s).
>>>>>> 2008-07-22 16:35:32,702 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hermes.cse.sc.edu/129.252.130.148:9000. Already tried 6 time(s).
>>>>>>
>>>>>> Same for the tasktrackers (port 9001).
>>>>>>
>>>>>> I think the problem has something to do with name resolution. Check 
>>>>>> these out:
>>>>>>
>>>>>> [EMAIL PROTECTED]:~/hadoop-0.16.4> telnet hermes.cse.sc.edu 9000
>>>>>> Trying 127.0.0.1...
>>>>>> Connected to hermes.cse.sc.edu (127.0.0.1).
>>>>>> Escape character is '^]'.
>>>>>> bye
>>>>>> Connection closed by foreign host.
>>>>>>
>>>>>> [EMAIL PROTECTED]:~/hadoop-0.16.4> host hermes.cse.sc.edu
>>>>>> hermes.cse.sc.edu has address 129.252.130.148
>>>>>>
>>>>>> [EMAIL PROTECTED]:~/hadoop-0.16.4> telnet 129.252.130.148 9000
>>>>>> Trying 129.252.130.148...
>>>>>> telnet: connect to address 129.252.130.148: Connection refused
>>>>>> telnet: Unable to connect to remote host: Connection refused
>>>>>>
>>>>>> So, the first one connects but the second one does not, even though they
>>>>>> both go to the same machine:port. My guess is that the Hadoop server is
>>>>>> closing the connection, but why?
>>>>>>
>>>>>> Thanks,
>>>>>> Jose
>>>>>>
>>>>>> --
>>>>>> Jose M. Vidal <[EMAIL PROTECTED]> http://jmvidal.cse.sc.edu
>>>>>> University of South Carolina http://www.multiagent.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Edward J. Yoon,
>>>>> http://blog.udanax.org
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jose M. Vidal <[EMAIL PROTECTED]> http://jmvidal.cse.sc.edu
>>>> University of South Carolina http://www.multiagent.com
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards, Edward J. Yoon
>>> [EMAIL PROTECTED]
>>> http://blog.udanax.org
>>>
>>
>>
>>
>> --
>> Jose M. Vidal <[EMAIL PROTECTED]> http://jmvidal.cse.sc.edu
>> University of South Carolina http://www.multiagent.com
>>
>
>
>
> --
> Best regards, Edward J. Yoon
> [EMAIL PROTECTED]
> http://blog.udanax.org
>



-- 
Jose M. Vidal <[EMAIL PROTECTED]> http://jmvidal.cse.sc.edu
University of South Carolina http://www.multiagent.com
