John,

Yes, it looks like your slave nodes aren't able to properly resolve some hostnames. Hadoop requires a sane network setup to work properly. And yes, you should use hostnames for fs.default.name and the other configs to the extent possible.
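Since the failure here is pure name resolution, one quick sanity check (a hypothetical helper script, not part of Hadoop) is to loop over every hostname used in your configs and confirm each one resolves, on every node:

```shell
# Hypothetical pre-flight check, not part of Hadoop: confirm that every
# hostname referenced in the Hadoop configs resolves on this machine.
# Replace the list below with your actual hostnames, e.g. "master slave1 slave2".
for host in localhost; do
  if getent hosts "$host" > /dev/null; then
    echo "OK: $host resolves"
  else
    echo "FAIL: $host does not resolve" >&2
  fi
done
```

Run it on the master and on each slave, since resolution has to work in both directions (master to slaves and slaves to master).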
The easiest way is to keep a properly synchronized /etc/hosts file. For example, it may look like so, on all machines:

127.0.0.1    localhost.localdomain localhost
192.168.0.1  master.hadoop master
192.168.0.2  slave3.hadoop slave3
(and so on…)

This way the master can resolve the slaves, and the slaves can resolve the master. If you have the time, set up a DNS; it's the best thing to do.

Then, in core-site.xml you'll need:

fs.default.name = hdfs://master

And in mapred-site.xml:

mapred.job.tracker = master:8021

That should do it, so long as the slave hosts can freely access the master hosts (no blockage of ports via firewall and such).

On Tue, Sep 6, 2011 at 3:05 PM, john smith <[email protected]> wrote:
> Hey, my TT logs show this:
>
> 2011-09-06 13:22:41,860 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
> exception: java.net.UnknownHostException: unknown host: rip-pc.local
>         at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:853)
>         at org.apache.hadoop.ipc.Client.call(Client.java:723)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>         at $Proxy5.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>         at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>         at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>
> Maybe some error in the configs? I am using IPs in the conf files. Should I
> put entries in the /etc/hosts files?
>
> On Tue, Sep 6, 2011 at 3:00 PM, john smith <[email protected]> wrote:
>> Hi Harsh,
>>
>> My JT log: http://pastebin.com/rXAEeDkC
>>
>> I have some startup exceptions (which don't matter much, I guess), but the
>> tail indicates that it's locating the splits correctly, and then it hangs!
>>
>> Any idea?
>>
>> Thanks
>>
>> On Tue, Sep 6, 2011 at 1:30 PM, Harsh J <[email protected]> wrote:
>>> I'd check the tail of the JobTracker logs after a submit is done to see if
>>> an error/warn there is causing this. And then dig further on
>>> why/what/how.
>>>
>>> Hard to tell what your problem specifically is without logs :)
>>>
>>> On Tue, Sep 6, 2011 at 1:18 PM, john smith <[email protected]> wrote:
>>>> Hi Folks,
>>>>
>>>> I am working on a 3-node cluster (1 NN + 2 DNs). I loaded some test data
>>>> with replication factor 3 (around 400MB of data). However, when I run the
>>>> wordcount example, it hangs at map 0%.
>>>>
>>>> bin/hadoop jar hadoop-examples-0.20.3-SNAPSHOT.jar wordcount /test_data /out2
>>>> 11/09/06 13:07:28 INFO input.FileInputFormat: Total input paths to process : 2
>>>> 11/09/06 13:07:28 INFO mapred.JobClient: Running job: job_201109061248_0002
>>>> 11/09/06 13:07:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>
>>>> TTs and DNs are running fine on my slaves. I see them running when I run
>>>> the jps command.
>>>>
>>>> Can anyone help me out on this? Any idea why this would happen? I am
>>>> totally clueless, as nothing shows up in the logs either!
>>>>
>>>> Thanks,
>>>> jS
>>>
>>> --
>>> Harsh J

--
Harsh J
