Yep, it works. I just synced the /etc/hosts files without changing any other configs, and now it's working fine. Thanks for the help, Harsh. Sorry for spamming the list without checking my TT logs properly.
Also, one more doubt: any idea why it's scheduling only a single reduce? I have 2 datanodes and I expected it to run 2 reducers (data size of 500MB). Any hints?

On Tue, Sep 6, 2011 at 3:17 PM, Harsh J <[email protected]> wrote:
> John,
>
> Yes, it looks like your slave nodes aren't able to properly resolve some
> hostnames. Hadoop requires a sane network setup to work properly.
> Also, yes, you need to use a hostname for your fs.default.name and
> other configs to the extent possible.
>
> The easiest way is to keep a properly synchronized /etc/hosts file.
>
> For example, it may look like so, on all machines:
>
> 127.0.0.1   localhost.localdomain localhost
> 192.168.0.1 master.hadoop master
> 192.168.0.2 slave3.hadoop slave3
> (and so on…)
>
> (This way the master can resolve slaves, and slaves can resolve the
> master. If you have the time, set up a DNS; it's the best thing to do.)
>
> Then, in core-site.xml you'll need:
>
> fs.default.name = hdfs://master
>
> And in mapred-site.xml:
>
> mapred.job.tracker = master:8021
>
> That should do it, so long as the slave hosts can freely access the
> master hosts (no blockage of ports via firewall and such).
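[Editor's note: Harsh's synchronized-/etc/hosts advice can be sketched as a quick check. The hostnames and IPs below are the illustrative ones from his reply, not from a real cluster, and the file is a sample, not /etc/hosts itself.]

```shell
# Sketch: a hosts file like the one described above, written to a sample
# file (NOT /etc/hosts), then read back the way a resolver would.
cat > hosts.sample <<'EOF'
127.0.0.1   localhost.localdomain localhost
192.168.0.1 master.hadoop master
192.168.0.2 slave3.hadoop slave3
EOF

# Every node should map "master" to the same IP; print the recorded entry.
awk '$3 == "master" {print "master ->", $1}' hosts.sample
```

On a real node you would instead run `getent hosts master` on each machine and confirm they all return the same address.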
> > On Tue, Sep 6, 2011 at 3:05 PM, john smith <[email protected]> wrote:
> > Hey, my TT logs show this:
> >
> > 2011-09-06 13:22:41,860 ERROR org.apache.hadoop.mapred.TaskTracker: Caught
> > exception: java.net.UnknownHostException: unknown host: rip-pc.local
> >   at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
> >   at org.apache.hadoop.ipc.Client.getConnection(Client.java:853)
> >   at org.apache.hadoop.ipc.Client.call(Client.java:723)
> >   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >   at $Proxy5.getProtocolVersion(Unknown Source)
> >   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> >   at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
> >   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
> >   at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
> >   at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
> >   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
> >   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >
> > Maybe some error in the configs? I am using IPs in the conf files. Should
> > I put entries in the /etc/hosts files?
> >
> > On Tue, Sep 6, 2011 at 3:00 PM, john smith <[email protected]> wrote:
> >> Hi Harsh,
> >>
> >> My JT log: http://pastebin.com/rXAEeDkC
> >>
> >> I have some startup exceptions (which don't matter much, I guess) but
> >> the tail indicates that it's locating the splits correctly and then it
> >> hangs!
> >>
> >> Any idea?
> >>
> >> Thanks
> >>
> >> On Tue, Sep 6, 2011 at 1:30 PM, Harsh J <[email protected]> wrote:
> >>> I'd check the tail of the JobTracker logs after a submit is done to see
> >>> if an error/warn there is causing this. And then dig further on
> >>> why/what/how.
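[Editor's note: the UnknownHostException above means the TaskTracker's JVM cannot resolve the name in the error at all. A minimal resolution check is sketched below; it uses `localhost` as a stand-in so it runs anywhere, and you would substitute the failing name (`rip-pc.local` here) on the affected node.]

```shell
# Sketch of a hostname-resolution check. Replace "localhost" with the
# hostname from the exception when running on the affected node.
host=localhost
if getent hosts "$host" > /dev/null; then
  echo "$host resolves"
else
  echo "$host does NOT resolve -- add it to /etc/hosts or DNS"
fi
```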
> >>>
> >>> Hard to tell what your problem specifically is without logs :)
> >>>
> >>> On Tue, Sep 6, 2011 at 1:18 PM, john smith <[email protected]> wrote:
> >>> > Hi Folks,
> >>> >
> >>> > I am working on a 3-node cluster (1 NN + 2 DNs). I loaded some test
> >>> > data with replication factor 3 (around 400MB of data). However, when
> >>> > I run the wordcount example, it hangs at map 0%.
> >>> >
> >>> > bin/hadoop jar hadoop-examples-0.20.3-SNAPSHOT.jar wordcount /test_data /out2
> >>> > 11/09/06 13:07:28 INFO input.FileInputFormat: Total input paths to process : 2
> >>> > 11/09/06 13:07:28 INFO mapred.JobClient: Running job: job_201109061248_0002
> >>> > 11/09/06 13:07:29 INFO mapred.JobClient:  map 0% reduce 0%
> >>> >
> >>> > TTs and DNs are running fine on my slaves. I see them running when I
> >>> > run the jps command.
> >>> >
> >>> > Can anyone help me out on this? Any idea why this would happen? I am
> >>> > totally clueless, as nothing shows up in the logs either!
> >>> >
> >>> > Thanks,
> >>> > jS
> >>>
> >>> --
> >>> Harsh J
>
> --
> Harsh J
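[Editor's note: Harsh's "check the tail of the logs" advice can be sketched as a grep triage. The sample line below is copied from the TT log excerpt earlier in this thread; the file name is illustrative, not a real Hadoop log path.]

```shell
# Sketch: grep triage of a TaskTracker log. The sample content is the ERROR
# line quoted earlier in this thread, written to an illustrative file.
cat > tasktracker.log.sample <<'EOF'
2011-09-06 13:22:41,860 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: java.net.UnknownHostException: unknown host: rip-pc.local
EOF

# Count ERROR lines; anything non-zero is worth digging into.
grep -c 'ERROR' tasktracker.log.sample
```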
