What do your config files look like? Is your NN web UI available at
http://localhost:50070/? It looks to me like your datanode is stuck trying
to talk to the NN. Can you get a stack trace from the DN using jstack or
kill -QUIT?
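
For example, something along these lines (a sketch; the pid lookup and
file names are illustrative, assuming jps/jstack from the JDK are on your
path):

  # find the DataNode pid
  DN_PID=$(jps | awk '/DataNode/ {print $1}')

  # dump its threads with jstack
  jstack $DN_PID > dn-stack.txt

  # or send SIGQUIT; the thread dump lands in the DN's .out file
  kill -QUIT $DN_PID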

-Todd

On Mon, Sep 14, 2009 at 8:22 AM, Vincenzo Gulisano <
[email protected]> wrote:

> Hi,
> I've just repeated the experiment. this is what I get:
>
> NAMENODE
>
> 2009-09-14 17:08:25,304 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = XXX/192.*.*.*
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2-dev
> STARTUP_MSG:   build =  -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43
> CEST 2009
> ************************************************************/
> 2009-09-14 17:08:25,442 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=8020
> 2009-09-14 17:08:25,448 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at:
> XXX/192.*.*.*:8020
> 2009-09-14 17:08:25,450 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2009-09-14 17:08:25,453 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2009-09-14 17:08:25,530 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> fsOwner=vincenzo,vincenzo
> 2009-09-14 17:08:25,530 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2009-09-14 17:08:25,530 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=true
> 2009-09-14 17:08:25,540 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> Initializing FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2009-09-14 17:08:25,542 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 2009-09-14 17:08:25,581 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files = 1
> 2009-09-14 17:08:25,586 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files under construction = 0
> 2009-09-14 17:08:25,586 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 98 loaded in 0 seconds.
> 2009-09-14 17:08:25,586 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /tmp/hadoop-vincenzo/dfs/name/current/edits of size 4 edits # 0
> loaded in 0 seconds.
> 2009-09-14 17:08:25,590 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Image file of size 98 saved in 0 seconds.
> 2009-09-14 17:08:25,602 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading
> FSImage in 117 msecs
> 2009-09-14 17:08:25,603 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks
> = 0
> 2009-09-14 17:08:25,603 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid
> blocks = 0
> 2009-09-14 17:08:25,603 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> under-replicated blocks = 0
> 2009-09-14 17:08:25,603 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
> over-replicated blocks = 0
> 2009-09-14 17:08:25,603 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> Leaving safe mode after 0 secs.
> 2009-09-14 17:08:25,604 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> Network topology has 0 racks and 0 datanodes
> 2009-09-14 17:08:25,604 INFO org.apache.hadoop.hdfs.StateChange: STATE*
> UnderReplicatedBlocks has 0 blocks
> 2009-09-14 17:08:25,802 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2009-09-14 17:08:25,893 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open()
> is -1. Opening the listener on 50070
> 2009-09-14 17:08:25,894 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 50070
> webServer.getConnectors()[0].getLocalPort() returned 50070
> 2009-09-14 17:08:25,894 INFO org.apache.hadoop.http.HttpServer: Jetty bound
> to port 50070
> 2009-09-14 17:08:25,894 INFO org.mortbay.log: jetty-6.1.14
>
> DATANODE
>
> 2009-09-14 17:08:26,768 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG:   host = XXX/192.*.*.*
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2-dev
> STARTUP_MSG:   build =  -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43
> CEST 2009
> ************************************************************/
>
> JOBTRACKER
>
> 2009-09-14 17:08:28,721 INFO org.apache.hadoop.mapred.JobTracker:
> STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = XXX/192.*.*.*
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2-dev
> STARTUP_MSG:   build =  -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43
> CEST 2009
> ************************************************************/
> 2009-09-14 17:08:28,829 INFO org.apache.hadoop.mapred.JobTracker: Scheduler
> configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
> limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> 2009-09-14 17:08:28,875 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=JobTracker, port=8021
> 2009-09-14 17:08:28,948 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2009-09-14 17:08:29,114 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open()
> is -1. Opening the listener on 50030
> 2009-09-14 17:08:29,116 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 50030
> webServer.getConnectors()[0].getLocalPort() returned 50030
> 2009-09-14 17:08:29,116 INFO org.apache.hadoop.http.HttpServer: Jetty bound
> to port 50030
> 2009-09-14 17:08:29,116 INFO org.mortbay.log: jetty-6.1.14
>
> TASKTRACKER
>
> 2009-09-14 17:08:30,028 INFO org.apache.hadoop.mapred.TaskTracker:
> STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting TaskTracker
> STARTUP_MSG:   host = XXX/192.*.*.*
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2-dev
> STARTUP_MSG:   build =  -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43
> CEST 2009
> ************************************************************/
> 2009-09-14 17:08:30,240 INFO org.mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 2009-09-14 17:08:30,399 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open()
> is -1. Opening the listener on 50060
> 2009-09-14 17:08:30,407 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 50060
> webServer.getConnectors()[0].getLocalPort() returned 50060
> 2009-09-14 17:08:30,407 INFO org.apache.hadoop.http.HttpServer: Jetty bound
> to port 50060
> 2009-09-14 17:08:30,407 INFO org.mortbay.log: jetty-6.1.14
>
> SECONDARY NAME NODE
>
> 2009-09-14 17:08:27,666 INFO
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting SecondaryNameNode
> STARTUP_MSG:   host = XXX/192.*.*.*
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2-dev
> STARTUP_MSG:   build =  -r ; compiled by 'vincenzo' on Mon Sep 14 15:49:43
> CEST 2009
> ************************************************************/
> 2009-09-14 17:08:27,738 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=SecondaryNameNode, sessionId=null
>
> ______________________
>
> I've noticed now that the namenode says that 0 datanodes are available,
> but the configuration is the one suggested by the Hadoop tutorial, and
> all the conf files (including masters and slaves) of my single-node
> cluster setup specify the full hostname (see the sketch below).
> Do you have any idea why this happens?
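>
> Roughly, the relevant bits look like this (a sketch from memory, not my
> exact files; XXX stands for the full hostname, as in the logs, and the
> port is the tutorial's):
>
>   core-site.xml:
>     <property>
>       <!-- illustrative: XXX = full hostname -->
>       <name>fs.default.name</name>
>       <value>hdfs://XXX:54310</value>
>     </property>
>
>   hdfs-site.xml:
>     <property>
>       <name>dfs.replication</name>
>       <value>1</value>
>     </property>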
> Again, thanks for your help
>
>
> 2009/9/14 Todd Lipcon <[email protected]>
>
> > That's not an error - that just means that the daemon thread is
> > waiting for a connection (IO event).
> >
> > The logs in $HADOOP_HOME/logs/ are entirely empty? Both the .log and
> > .out files? I find that hard to believe :)
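> >
> > e.g., something like this (paths assume the default layout; adjust to
> > wherever your install writes its logs):
> >
> >   ls -l $HADOOP_HOME/logs/
> >   tail -n 100 $HADOOP_HOME/logs/*-datanode-*.log
> >   tail -n 100 $HADOOP_HOME/logs/*-datanode-*.out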
> >
> > -Todd
> >
> > On Mon, Sep 14, 2009 at 7:57 AM, Vincenzo Gulisano <
> > [email protected]> wrote:
> >
> > > Hi Todd,
> > > thanks for your answer. I've already tried this; no error is
> > > reported. Since the program just remains in a wait state, there is
> > > no error to detect.
> > > I've seen that
> > > "sun.nio.ch.EPollArrayWrapper.epollWait (native method)"
> > > shows up in some old Hadoop bug reports, but none of them solved my
> > > problem.
> > > Thanks again
> > >
> > >
> > >
> > >
> > > 2009/9/14 Todd Lipcon <[email protected]>
> > >
> > > > Hi Vincenzo,
> > > >
> > > > Look at the log output of your daemons. My guess is that you'll find
> > > > something pretty clear there.
> > > >
> > > > -Todd
> > > >
> > > > On Mon, Sep 14, 2009 at 7:46 AM, Vincenzo Gulisano <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi,
> > > > > after a lot of unsuccessful attempts at running the Hadoop
> > > > > distributed file system on my machine, I've located one possible
> > > > > error. Maybe you have some ideas about what's going on.
> > > > >
> > > > > Experiment:
> > > > > What I'm doing is simply executing start-all.sh and then
> > > > > hadoop dfsadmin -report.
> > > > >
> > > > > After startup I can check that everything is running using:
> > > > >
> > > > > jps
> > > > > ...
> > > > > 17421 NameNode
> > > > > 17519 DataNode
> > > > > 17611 SecondaryNameNode
> > > > > 17685 JobTracker
> > > > > 17778 TaskTracker
> > > > > 18425 Jps
> > > > > ...
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > AND
> > > > >
> > > > > sudo netstat -plten | grep java
> > > > > ...
> > > > > tcp   0   0  127.0.0.1:54310  0.0.0.0:*  LISTEN  1062  346907  17421/java  (namenode)
> > > > > tcp   0   0  127.0.0.1:54311  0.0.0.0:*  LISTEN  1062  347480  17685/java  (job tracker)
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Two things can happen when launching the application:
> > > > > 1) The program waits and nothing happens (99% of the time)
> > > > > 2) The program works, but the report shows that HDFS has some
> > > > > problems
> > > > >
> > > > > Taking a look at the debugger, these are the threads:
> > > > >
> > > > >
> > > > > main:
> > > > >
> > > > >  [1] java.lang.Object.wait (native method)
> > > > >  [2] java.lang.Object.wait (Object.java:485)
> > > > >  [3] org.apache.hadoop.ipc.Client.call (Client.java:725)
> > > > >  [4] org.apache.hadoop.ipc.RPC$Invoker.invoke (RPC.java:220)
> > > > >  [5] $Proxy0.getProtocolVersion (null)
> > > > >  [6] org.apache.hadoop.ipc.RPC.getProxy (RPC.java:359)
> > > > >  [7] org.apache.hadoop.hdfs.DFSClient.createRPCNamenode (DFSClient.java:105)
> > > > >  [8] org.apache.hadoop.hdfs.DFSClient.<init> (DFSClient.java:208)
> > > > >  [9] org.apache.hadoop.hdfs.DFSClient.<init> (DFSClient.java:169)
> > > > >  [10] org.apache.hadoop.hdfs.DistributedFileSystem.initialize (DistributedFileSystem.java:82)
> > > > >  [11] org.apache.hadoop.fs.FileSystem.createFileSystem (FileSystem.java:1,384)
> > > > >  [12] org.apache.hadoop.fs.FileSystem.access$200 (FileSystem.java:66)
> > > > >  [13] org.apache.hadoop.fs.FileSystem$Cache.get (FileSystem.java:1,399)
> > > > >  [14] org.apache.hadoop.fs.FileSystem.get (FileSystem.java:199)
> > > > >  [15] org.apache.hadoop.fs.FileSystem.get (FileSystem.java:96)
> > > > >  [16] org.apache.hadoop.fs.FsShell.init (FsShell.java:85)
> > > > >  [17] org.apache.hadoop.hdfs.tools.DFSAdmin.run (DFSAdmin.java:777)
> > > > >  [18] org.apache.hadoop.util.ToolRunner.run (ToolRunner.java:65)
> > > > >  [19] org.apache.hadoop.util.ToolRunner.run (ToolRunner.java:79)
> > > > >  [20] org.apache.hadoop.hdfs.tools.DFSAdmin.main (DFSAdmin.java:858)
> > > > >
> > > > > IPC Client (47) connection to localhost/127.0.0.1:8020 from vincenzo:
> > > > >  [1] sun.nio.ch.EPollArrayWrapper.epollWait (native method)
> > > > >  [2] sun.nio.ch.EPollArrayWrapper.poll (EPollArrayWrapper.java:215)
> > > > >  [3] sun.nio.ch.EPollSelectorImpl.doSelect (EPollSelectorImpl.java:65)
> > > > >  [4] sun.nio.ch.SelectorImpl.lockAndDoSelect (SelectorImpl.java:69)
> > > > >  [5] sun.nio.ch.SelectorImpl.select (SelectorImpl.java:80)
> > > > >  [6] org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select (SocketIOWithTimeout.java:332)
> > > > >  [7] org.apache.hadoop.net.SocketIOWithTimeout.doIO (SocketIOWithTimeout.java:157)
> > > > >  [8] org.apache.hadoop.net.SocketInputStream.read (SocketInputStream.java:155)
> > > > >  [9] org.apache.hadoop.net.SocketInputStream.read (SocketInputStream.java:128)
> > > > >  [10] java.io.FilterInputStream.read (FilterInputStream.java:116)
> > > > >  [11] org.apache.hadoop.ipc.Client$Connection$PingInputStream.read (Client.java:276)
> > > > >  [12] java.io.BufferedInputStream.fill (BufferedInputStream.java:218)
> > > > >  [13] java.io.BufferedInputStream.read (BufferedInputStream.java:237)
> > > > >  [14] java.io.DataInputStream.readInt (DataInputStream.java:370)
> > > > >  [15] org.apache.hadoop.ipc.Client$Connection.receiveResponse (Client.java:501)
> > > > >  [16] org.apache.hadoop.ipc.Client$Connection.run (Client.java:446)
> > > > >
> > > > > Do you have any idea why this can happen?
> > > > >
> > > > > I've also tried to telnet to the host:port and it works. I've
> > > > > tried all possible addresses in the configuration (localhost /
> > > > > 127.0.0.1 / name / name.domain).
> > > > >
> > > > > Any help is appreciated,
> > > > > Thanks in advance
> > > > >
> > > > > Vincenzo
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Vincenzo Massimiliano Gulisano
> > > PhD student - UPM - Distributed System Lab.
> > >
> >
>
>
>
> --
> Vincenzo Massimiliano Gulisano
> PhD student - UPM - Distributed System Lab.
>
