What do your config files look like? Is your NN web UI available at http://localhost:50070/ ? It looks to me like your datanode is stuck trying to talk to the NN. Can you get a stack trace from the DN using jstack or kill -QUIT?
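For example (a sketch, not a recipe: 17519 here is just the DataNode PID from the jps listing you posted earlier, so substitute whatever jps prints on your box):

    jps                       # note the DataNode PID, e.g. 17519
    jstack 17519 > dn.stack   # dump every thread's stack to a file
    kill -QUIT 17519          # alternative: the JVM prints the dump into the DN's .out file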
-Todd

On Mon, Sep 14, 2009 at 8:22 AM, Vincenzo Gulisano <[email protected]> wrote:
> Hi,
> I've just repeated the experiment. This is what I get:
>
> NAMENODE
>
> 2009-09-14 17:08:25,304 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG: host = XXX/192.*.*.*
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.20.2-dev
> STARTUP_MSG: build = -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43 CEST 2009
> ************************************************************/
> 2009-09-14 17:08:25,442 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8020
> 2009-09-14 17:08:25,448 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: XXX/192.*.*.*:8020
> 2009-09-14 17:08:25,450 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2009-09-14 17:08:25,453 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2009-09-14 17:08:25,530 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=vincenzo,vincenzo
> 2009-09-14 17:08:25,530 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2009-09-14 17:08:25,530 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
> 2009-09-14 17:08:25,540 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2009-09-14 17:08:25,542 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
> 2009-09-14 17:08:25,581 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
> 2009-09-14 17:08:25,586 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
> 2009-09-14 17:08:25,586 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 98 loaded in 0 seconds.
> 2009-09-14 17:08:25,586 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-vincenzo/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
> 2009-09-14 17:08:25,590 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 98 saved in 0 seconds.
> 2009-09-14 17:08:25,602 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 117 msecs
> 2009-09-14 17:08:25,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
> 2009-09-14 17:08:25,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
> 2009-09-14 17:08:25,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
> 2009-09-14 17:08:25,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
> 2009-09-14 17:08:25,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 0 secs.
> 2009-09-14 17:08:25,604 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
> 2009-09-14 17:08:25,604 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
> 2009-09-14 17:08:25,802 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2009-09-14 17:08:25,893 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
> 2009-09-14 17:08:25,894 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
> 2009-09-14 17:08:25,894 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
> 2009-09-14 17:08:25,894 INFO org.mortbay.log: jetty-6.1.14
>
> DATANODE
>
> 2009-09-14 17:08:26,768 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG: host = XXX/192.*.*.*
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.20.2-dev
> STARTUP_MSG: build = -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43 CEST 2009
> ************************************************************/
>
> JOBTRACKER
>
> 2009-09-14 17:08:28,721 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG: host = XXX/192.*.*.*
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.20.2-dev
> STARTUP_MSG: build = -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43 CEST 2009
> ************************************************************/
> 2009-09-14 17:08:28,829 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> 2009-09-14 17:08:28,875 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=8021
> 2009-09-14 17:08:28,948 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2009-09-14 17:08:29,114 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
> 2009-09-14 17:08:29,116 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
> 2009-09-14 17:08:29,116 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
> 2009-09-14 17:08:29,116 INFO org.mortbay.log: jetty-6.1.14
>
> TASKTRACKER
>
> 2009-09-14 17:08:30,028 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG: host = XXX/192.*.*.*
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.20.2-dev
> STARTUP_MSG: build = -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43 CEST 2009
> ************************************************************/
> 2009-09-14 17:08:30,240 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2009-09-14 17:08:30,399 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
> 2009-09-14 17:08:30,407 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50060 webServer.getConnectors()[0].getLocalPort() returned 50060
> 2009-09-14 17:08:30,407 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060
> 2009-09-14 17:08:30,407 INFO org.mortbay.log: jetty-6.1.14
>
> SECONDARY NAMENODE
>
> 2009-09-14 17:08:27,666 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting SecondaryNameNode
> STARTUP_MSG: host = XXX/192.*.*.*
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.20.2-dev
> STARTUP_MSG: build = -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43 CEST 2009
> ************************************************************/
> 2009-09-14 17:08:27,738 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SecondaryNameNode, sessionId=null
>
> ______________________
>
> I've noticed now that the namenode says that 0 datanodes are available, but the configuration is the one suggested by the Hadoop tutorial, and all the conf files (including masters and slaves) of my single-node cluster setup are specified with the full host name.
> Do you have any idea why this happens?
> Again, thanks for your help
>
> 2009/9/14 Todd Lipcon <[email protected]>
>
> > That's not an error - that just means that the daemon thread is waiting for a connection (IO event)
> >
> > The logs in $HADOOP_HOME/log/ are entirely empty? Both the .log and .out files? I find that hard to believe :)
> >
> > -Todd
> >
> > On Mon, Sep 14, 2009 at 7:57 AM, Vincenzo Gulisano <[email protected]> wrote:
> >
> > > Hi Todd,
> > > thanks for your answer. I've already tried this solution. No error is reported.
> > > As the program remains in a "wait state", no error is detected.
> > > I've seen that the error "sun.nio.ch.EPollArrayWrapper.epollWait (native method)" shows up in other old Hadoop bugs, but I couldn't solve mine.
> > > Thanks again
> > >
> > > 2009/9/14 Todd Lipcon <[email protected]>
> > >
> > > > Hi Vincenzo,
> > > >
> > > > Look at the log output of your daemons. My guess is that you'll find something pretty clear there.
> > > >
> > > > -Todd
> > > >
> > > > On Mon, Sep 14, 2009 at 7:46 AM, Vincenzo Gulisano <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > > after a lot of unsuccessful attempts at running the Hadoop distributed file system on my machine, I've located one possible error.
> > > > > Maybe you have some ideas about what's going on.
> > > > >
> > > > > Experiment:
> > > > > What I'm doing is simply executing start-all.sh and hadoop dfsadmin -report
> > > > >
> > > > > After the setup I can check that everything is working using:
> > > > >
> > > > > jps
> > > > > ...
> > > > > 17421 NameNode
> > > > > 17519 DataNode
> > > > > 17611 SecondaryNameNode
> > > > > 17685 JobTracker
> > > > > 17778 TaskTracker
> > > > > 18425 Jps
> > > > > ...
> > > > >
> > > > > AND
> > > > >
> > > > > sudo netstat -plten | grep java
> > > > > ...
> > > > > tcp  0  0 127.0.0.1:54310  0.0.0.0:*  LISTEN  1062  346907  17421/java  (namenode)
> > > > > tcp  0  0 127.0.0.1:54311  0.0.0.0:*  LISTEN  1062  347480  17685/java  (job tracker)
> > > > > ...
> > > > >
> > > > > Two things happen when launching the application:
> > > > > 1) The program waits and nothing happens (99% of the time)
> > > > > 2) The program works but the report shows that the HDFS has some problems
> > > > >
> > > > > Taking a look at the debugger output:
> > > > >
> > > > > main:
> > > > >
> > > > > [1] java.lang.Object.wait (native method)
> > > > > [2] java.lang.Object.wait (Object.java:485)
> > > > > [3] org.apache.hadoop.ipc.Client.call (Client.java:725)
> > > > > [4] org.apache.hadoop.ipc.RPC$Invoker.invoke (RPC.java:220)
> > > > > [5] $Proxy0.getProtocolVersion (null)
> > > > > [6] org.apache.hadoop.ipc.RPC.getProxy (RPC.java:359)
> > > > > [7] org.apache.hadoop.hdfs.DFSClient.createRPCNamenode (DFSClient.java:105)
> > > > > [8] org.apache.hadoop.hdfs.DFSClient.<init> (DFSClient.java:208)
> > > > > [9] org.apache.hadoop.hdfs.DFSClient.<init> (DFSClient.java:169)
> > > > > [10] org.apache.hadoop.hdfs.DistributedFileSystem.initialize (DistributedFileSystem.java:82)
> > > > > [11] org.apache.hadoop.fs.FileSystem.createFileSystem (FileSystem.java:1,384)
> > > > > [12] org.apache.hadoop.fs.FileSystem.access$200 (FileSystem.java:66)
> > > > > [13] org.apache.hadoop.fs.FileSystem$Cache.get (FileSystem.java:1,399)
> > > > > [14] org.apache.hadoop.fs.FileSystem.get (FileSystem.java:199)
> > > > > [15] org.apache.hadoop.fs.FileSystem.get (FileSystem.java:96)
> > > > > [16] org.apache.hadoop.fs.FsShell.init (FsShell.java:85)
> > > > > [17] org.apache.hadoop.hdfs.tools.DFSAdmin.run (DFSAdmin.java:777)
> > > > > [18] org.apache.hadoop.util.ToolRunner.run (ToolRunner.java:65)
> > > > > [19] org.apache.hadoop.util.ToolRunner.run (ToolRunner.java:79)
> > > > > [20] org.apache.hadoop.hdfs.tools.DFSAdmin.main (DFSAdmin.java:858)
> > > > >
> > > > > IPC Client (47) connection to localhost/127.0.0.1:8020 from vincenzo:
> > > > > [1] sun.nio.ch.EPollArrayWrapper.epollWait (native method)
> > > > > [2] sun.nio.ch.EPollArrayWrapper.poll (EPollArrayWrapper.java:215)
> > > > > [3] sun.nio.ch.EPollSelectorImpl.doSelect (EPollSelectorImpl.java:65)
> > > > > [4] sun.nio.ch.SelectorImpl.lockAndDoSelect (SelectorImpl.java:69)
> > > > > [5] sun.nio.ch.SelectorImpl.select (SelectorImpl.java:80)
> > > > > [6] org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select (SocketIOWithTimeout.java:332)
> > > > > [7] org.apache.hadoop.net.SocketIOWithTimeout.doIO (SocketIOWithTimeout.java:157)
> > > > > [8] org.apache.hadoop.net.SocketInputStream.read (SocketInputStream.java:155)
> > > > > [9] org.apache.hadoop.net.SocketInputStream.read (SocketInputStream.java:128)
> > > > > [10] java.io.FilterInputStream.read (FilterInputStream.java:116)
> > > > > [11] org.apache.hadoop.ipc.Client$Connection$PingInputStream.read (Client.java:276)
> > > > > [12] java.io.BufferedInputStream.fill (BufferedInputStream.java:218)
> > > > > [13] java.io.BufferedInputStream.read (BufferedInputStream.java:237)
> > > > > [14] java.io.DataInputStream.readInt (DataInputStream.java:370)
> > > > > [15] org.apache.hadoop.ipc.Client$Connection.receiveResponse (Client.java:501)
> > > > > [16] org.apache.hadoop.ipc.Client$Connection.run (Client.java:446)
> > > > >
> > > > > Do you have any idea why this can happen?
> > > > >
> > > > > I've also tried to telnet to the host:port and it works. I've tried all possible addresses in the configuration (localhost / 127.0.0.1 / name / name.domain).
> > > > >
> > > > > Any help is appreciated,
> > > > > Thanks in advance
> > > > >
> > > > > Vincenzo
> > >
> > > --
> > > Vincenzo Massimiliano Gulisano
> > > PhD student - UPM - Distributed System Lab.
>
> --
> Vincenzo Massimiliano Gulisano
> PhD student - UPM - Distributed System Lab.
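For reference: the stack trace above shows the client dialing localhost:8020 while the earlier netstat output shows the NameNode RPC server bound to 127.0.0.1:54310. That may simply be output from two different runs, but a mismatch between fs.default.name and what clients dial is worth ruling out first. A minimal sketch of the single-node conf for 0.20 as the tutorial describes it (hdfs://localhost:8020 is an example value; whatever you pick must be identical in every daemon's and client's conf):

    conf/core-site.xml:
      <!-- inside the <configuration> element -->
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>  <!-- clients and the NN must agree on this -->
      </property>

    conf/hdfs-site.xml:
      <property>
        <name>dfs.replication</name>
        <value>1</value>  <!-- single node: one replica -->
      </property>

    conf/mapred-site.xml:
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
      </property>

A quick cross-check: sudo netstat -plten | grep 17421 (the NameNode PID from jps) should show the NN listening on exactly the host:port named in fs.default.name.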
