Thanks Scott. What I tried to do here is squeeze the cluster into our current 
system, which has a fixed network configuration, so I do not have the 
flexibility to change settings to accommodate the Hadoop cluster. I checked 
name resolution (hosts file, dig/host, resolv.conf, netstat -r, etc.) and 
didn't see any obvious violation of what a Hadoop cluster requires. I also 
tried tools such as tcpdump and telnet. No quick solution found yet, so I 
simply removed the problematic master box to make my world easier and will 
return to this problem later :)

Thanks,

Michael

--- On Tue, 3/9/10, Scott Carey <[email protected]> wrote:

From: Scott Carey <[email protected]>
Subject: Re: where does jobtracker get the IP and port of namenode?
To: "[email protected]" <[email protected]>
Date: Tuesday, March 9, 2010, 3:22 PM


On Mar 8, 2010, at 11:38 PM, jiang licht wrote:

> I guess my confusion is this:
> 
> I point "fs.default.name" to hdfs://A:50001 in core-site.xml (A is an IP 
> address). I assume that when the tasktracker starts, it should use A:50001 
> to contact the namenode. But the tasktracker log shows that it actually uses 
> B, which is the IP address of another network interface on the namenode box. 
> Because the tasktracker box cannot reach address B, the tasktracker simply 
> retries the connection and finally fails to start. I read some source code 
> in org.apache.hadoop.hdfs.DistributedFileSystem.initialize and it seems to 
> me that the namenode address is passed in from what is specified in 
> "fs.default.name". Is it correct that the namenode address used here by the 
> tasktracker comes from "fs.default.name" in core-site.xml, or is there 
> another step in which this value is changed? Could someone explain how the 
> tasktracker resolves the namenode and contacts it? Thanks!
> 
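For illustration only, here is a minimal Python sketch of reading the namenode 
host and port out of a core-site.xml snippet. It is not Hadoop's own code 
path. The function name namenode_endpoint and the sample address 192.0.2.10 
(standing in for A) are made up for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical core-site.xml content; 192.0.2.10 stands in for IP address A.
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.0.2.10:50001</value>
  </property>
</configuration>
"""


def namenode_endpoint(xml_text, key="fs.default.name"):
    """Return (host, port) taken from the given property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            # Split the hdfs:// URI into its host and port parts.
            uri = urlsplit(prop.findtext("value").strip())
            return uri.hostname, uri.port
    return None


print(namenode_endpoint(CORE_SITE))
```

In this sketch, a misspelled key such as 'dfs.default.name' makes the lookup 
return None, because nothing is stored under the expected name.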

Hadoop is rather annoyingly strict about how DNS and reverse DNS are aligned.  
I'm not sure if it applies to your specific problem, but even if it is 
configured to talk to A, if A is an IP address, in some places it will 
reverse-resolve that IP to a name and then forward-resolve that name.

So if IP A maps by reverse DNS (via DNS, a hosts file, or whatever) to name 
FOO, and FOO resolves to IP address B, then that is likely your problem. 
Datanodes/namenodes with multiple IP addresses often have problems like this. 
I wish that if you configured it to 'talk to IP address A' all it did was try 
to talk to IP address A, but that's not how it works.

I'm used to seeing this as a datanode network configuration problem, not a 
namenode problem.  But you mention that the server has more than one network 
interface, so it may be related.
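
The reverse-then-forward lookup described above can be checked by hand on the 
tasktracker box. A rough Python sketch follows. It assumes the standard socket 
resolver calls see roughly what the JVM would see on the same machine (an 
assumption, not something verified against Hadoop), and round_trip is a 
made-up helper name.

```python
import socket


def round_trip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip to a name, then forward-resolve that name.

    Returns (name, addresses, consistent). consistent is True only when the
    original ip is among the addresses the name resolves back to.
    The resolver functions are parameters so they can be swapped in tests.
    """
    # Reverse lookup: IP -> name. Raises socket.herror if no entry exists.
    name, _aliases, _addrs = reverse(ip)
    # Forward lookup: name -> list of IP addresses.
    _canonical, _aliases, addresses = forward(name)
    return name, addresses, ip in addresses
```

Calling round_trip with the configured namenode address (A) and getting 
consistent == False would match the "A maps to FOO, FOO maps to B" situation.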


> Thanks,
> 
> Michael
> 
> --- On Tue, 3/9/10, jiang licht <[email protected]> wrote:
> 
> From: jiang licht <[email protected]>
> Subject: Re: where does jobtracker get the IP and port of namenode?
> To: [email protected]
> Date: Tuesday, March 9, 2010, 12:20 AM
> 
> Sorry, that was a typo in my first post. I did use 'fs.default.name' in 
> core-site.xml.
> 
> BTW, the following is the log output from starting the tasktracker. It shows 
> that the tasktracker failed to connect to the namenode at A:50001.
> 
> /************************************************************
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = HOSTNAME/127.0.0.1
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.1+169.56
> STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3; compiled 
> by 'root' on Tue Feb  9 13:40:08 EST 2010
> ************************************************************/
> 2010-03-09 00:08:50,199 INFO org.mortbay.log: Logging to 
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
> org.mortbay.log.Slf4jLog
> 2010-03-09 00:08:50,341 INFO org.apache.hadoop.http.HttpServer: Port returned 
> by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening 
> the listener on 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: 
> listener.getLocalPort() returned 50060 
> webServer.getConnectors()[0].getLocalPort() returned 50060
> 2010-03-09 00:08:50,350 INFO org.apache.hadoop.http.HttpServer: Jetty bound 
> to port 50060
> 2010-03-09 00:08:50,350 INFO org.mortbay.log: jetty-6.1.14
> 2010-03-09 00:08:50,707 INFO org.mortbay.log: Started 
> [email protected]:50060
> 2010-03-09 00:08:50,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=TaskTracker, sessionId=
> 2010-03-09 00:08:50,749 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
> Initializing RPC Metrics with hostName=TaskTracker, port=52550
> 2010-03-09 00:08:50,799 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 52550: starting
> 2010-03-09 00:08:50,800 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 1 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 2 on 52550: starting
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker: 
> TaskTracker up at: HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,801 INFO org.apache.hadoop.mapred.TaskTracker: Starting 
> tracker tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,802 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 3 on 52550: starting
> 2010-03-09 00:08:50,854 INFO org.apache.hadoop.mapred.TaskTracker:  Using 
> MemoryCalculatorPlugin : 
> org.apache.hadoop.util.linuxmemorycalculatorplu...@27b4c1d7
> 2010-03-09 00:08:50,856 INFO org.apache.hadoop.mapred.TaskTracker: Starting 
> thread: Map-events fetcher for all reduce tasks on 
> tracker_HOSTNAME:HOSTNAME/127.0.0.1:52550
> 2010-03-09 00:08:50,858 WARN org.apache.hadoop.mapred.TaskTracker: 
> TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is 
> disabled.
> 2010-03-09 00:08:50,859 INFO org.apache.hadoop.mapred.IndexCache: IndexCache 
> created with max memory = 10485760
> 2010-03-09 00:09:11,970 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: /A:50001. Already tried 0 time(s).
> 2010-03-09 00:09:32,972 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: /A:50001. Already tried 1 time(s).
> ...
> 
> Thanks,
> 
> Michael
> 
> --- On Mon, 3/8/10, Arun C Murthy <[email protected]> wrote:
> 
> From: Arun C Murthy <[email protected]>
> Subject: Re: where does jobtracker get the IP and port of namenode?
> To: [email protected]
> Date: Monday, March 8, 2010, 10:26 PM
> 
>> Here's what is set in core-site.xml
>> 
>> dfs.default.name=>hdfs://B:50001
>> 
> 
> That should be 'fs.default.name' ...
> 
> Arun
> 




      
