Todd,
I have included the jstack <pid of namenode> output below. Does it show the
NameNode stuck in SecureRandom, as you suggested it might be? I can't tell
whether the following output indicates that:
sh-4.1# jps
4038 JobTracker
4160 Jps
3917 DataNode
4121 TaskTracker
3844 NameNode
3992 SecondaryNameNode
sh-4.1# jstack 3844
2011-01-03 15:07:01
Full thread dump OpenJDK Zero VM (14.0-b16 interpreted mode):
"Attach Listener" daemon prio=10 tid=0x0021a870 nid=0x106e waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
"3299...@qtp0-1" prio=10 tid=0x6ff2cee8 nid=0x1039 in Object.wait() [0x6f2fe000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x7dcb46a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
- locked <0x7dcb46a8> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
"15020...@qtp0-0" prio=10 tid=0x6ff2ddd8 nid=0x1038 in Object.wait() [0x6f47e000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x7dcb4718> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:565)
- locked <0x7dcb4718> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
"org.apache.hadoop.hdfs.server.namenode.decommissionmanager$moni...@955cd5" daemon prio=10 tid=0x6ff036f8 nid=0xffe waiting on condition [0x6f68e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
at java.lang.Thread.run(Thread.java:636)
"org.apache.hadoop.hdfs.server.namenode.fsnamesystem$replicationmoni...@25c828" daemon prio=10 tid=0x6ff02230 nid=0xff9 waiting on condition [0x6f80e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2327)
at java.lang.Thread.run(Thread.java:636)
"org.apache.hadoop.hdfs.server.namenode.leasemanager$moni...@22ab57" daemon prio=10 tid=0x6ff00e00 nid=0xff8 waiting on condition [0x6f98e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:350)
at java.lang.Thread.run(Thread.java:636)
"org.apache.hadoop.hdfs.server.namenode.fsnamesystem$heartbeatmoni...@b1074a" daemon prio=10 tid=0x6ff009b0 nid=0xff7 waiting on condition [0x6fb0e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$HeartbeatMonitor.run(FSNamesystem.java:2309)
at java.lang.Thread.run(Thread.java:636)
"org.apache.hadoop.hdfs.server.namenode.pendingreplicationblocks$pendingreplicationmoni...@165f738" daemon prio=10 tid=0x001f66e8 nid=0xff6 waiting on condition [0x6fc9e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.server.namenode.PendingReplicationBlocks$PendingReplicationMonitor.run(PendingReplicationBlocks.java:186)
at java.lang.Thread.run(Thread.java:636)
"Low Memory Detector" daemon prio=10 tid=0x000c09a8 nid=0xf50 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x000bf1b8 nid=0xf4f runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x000af298 nid=0xf48 in Object.wait() [0x7063e000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x7daf8b40> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
- locked <0x7daf8b40> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
"Reference Handler" daemon prio=10 tid=0x000aaa08 nid=0xf47 in Object.wait() [0x707be000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x7daf8bc8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x7daf8bc8> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x000583c8 nid=0xf3f runnable [0xb729d000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:236)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0x70e59ae8> (a java.io.BufferedInputStream)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0x70e59970> (a java.io.BufferedInputStream)
at sun.security.provider.SeedGenerator$URLSeedGenerator.getSeedByte(SeedGenerator.java:469)
at sun.security.provider.SeedGenerator.getSeedBytes(SeedGenerator.java:140)
at sun.security.provider.SeedGenerator.generateSeed(SeedGenerator.java:135)
at sun.security.provider.SecureRandom.engineGenerateSeed(SecureRandom.java:131)
at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:188)
- locked <0x70e592c8> (a sun.security.provider.SecureRandom)
at java.security.SecureRandom.nextBytes(SecureRandom.java:450)
- locked <0x70e59870> (a java.security.SecureRandom)
at java.security.SecureRandom.next(SecureRandom.java:472)
at java.util.Random.nextLong(Random.java:299)
at org.mortbay.jetty.servlet.HashSessionIdManager.doStart(HashSessionIdManager.java:139)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
- locked <0x70e4d418> (a java.lang.Object)
at org.mortbay.jetty.servlet.AbstractSessionManager.doStart(AbstractSessionManager.java:168)
at org.mortbay.jetty.servlet.HashSessionManager.doStart(HashSessionManager.java:67)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
- locked <0x7dcb3a70> (a java.lang.Object)
at org.mortbay.jetty.servlet.SessionHandler.doStart(SessionHandler.java:115)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
- locked <0x7dcb34c0> (a java.lang.Object)
at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.handler.ContextHandler.startContext(ContextHandler.java:537)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:136)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1234)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:460)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
- locked <0x7dcb3490> (a java.lang.Object)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
- locked <0x7dcb1038> (a java.lang.Object)
at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:222)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
- locked <0x7dc9c0f8> (a java.lang.Object)
at org.apache.hadoop.http.HttpServer.start(HttpServer.java:461)
at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:246)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:202)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
"VM Thread" prio=10 tid=0x000a7ce8 nid=0xf45 runnable
"VM Periodic Task Thread" prio=10 tid=0x000c25a8 nid=0xf51 waiting on condition
JNI global references: 69
sh-4.1#
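
In case it is useful to anyone scanning a dump like the one above, here is a
minimal sketch of a filter that looks for seed-generator frames. The function
name is made up for illustration. A SeedGenerator frame only suggests that the
thread is waiting on the entropy source; it does not prove it. The
entropy_avail path in the comments assumes a Linux system.

```shell
# Hypothetical helper: reads a jstack thread dump on stdin and reports whether
# any stack frame is inside the JDK seed generator
# (sun.security.provider.SeedGenerator).
check_seed_generator() {
    if grep -q 'sun\.security\.provider\.SeedGenerator'; then
        echo "SeedGenerator frame present"
    else
        echo "no SeedGenerator frame found"
    fi
}

# Usage, assuming the NameNode pid reported by jps (3844 in the session above):
#   jstack 3844 | check_seed_generator
#
# On Linux, the kernel's current entropy estimate can be read with:
#   cat /proc/sys/kernel/random/entropy_avail
```

Running the check a few times, some seconds apart, should show whether the
main thread stays in the same place or moves on.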
On Jan 2, 2011, at 6:39 PM, Todd Lipcon wrote:
> Hi Jon,
>
> My guess is that your system's entropy pool runs dry. You can verify by
> grabbing a jstack <pid of namenode> output, and seeing if you're stuck in
> SecureRandom.
>
> See this old thread:
> http://www.mail-archive.com/[email protected]/msg02170.html
>
> -Todd
>
> On Sun, Jan 2, 2011 at 8:37 AM, Jon Lederman <[email protected]> wrote:
>
>> Hi,
>>
>> I followed the example precisely. It seems to me that the NameNode and
>> DataNode are not communicating. I noticed that the log file for my DataNode
>> appears suspiciously short. I believe it should try to connect to the
>> NameNode and report such progress. The log for the DataNode simply shows:
>>
>> /************************************************************
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG: host = localhost/127.0.0.1
>> STARTUP_MSG: args = []
>> STARTUP_MSG: version = 0.20.2
>> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
>> ************************************************************/
>>
>> Also, the log file for the NameNode reports 0 racks and 0 DataNodes (see the
>> "Network topology" line near the end):
>>
>> /************************************************************
>> STARTUP_MSG: Starting NameNode
>> STARTUP_MSG: host = localhost/127.0.0.1
>> STARTUP_MSG: args = []
>> STARTUP_MSG: version = 0.20.2
>> STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
>> ************************************************************/
>> 2011-01-02 16:30:34,070 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
>> 2011-01-02 16:30:35,093 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost.localdomain/127.0.0.1:9000
>> 2011-01-02 16:30:35,171 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>> 2011-01-02 16:30:35,196 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-01-02 16:30:37,022 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root
>> 2011-01-02 16:30:37,029 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>> 2011-01-02 16:30:37,032 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
>> 2011-01-02 16:30:37,216 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-01-02 16:30:37,242 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>> 2011-01-02 16:30:37,799 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
>> 2011-01-02 16:30:37,882 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
>> 2011-01-02 16:30:37,885 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 loaded in 0 seconds.
>> 2011-01-02 16:30:37,891 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
>> 2011-01-02 16:30:37,956 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 94 saved in 0 seconds.
>> 2011-01-02 16:30:38,104 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1726 msecs
>> 2011-01-02 16:30:38,130 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 0
>> 2011-01-02 16:30:38,133 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
>> 2011-01-02 16:30:38,136 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 0
>> 2011-01-02 16:30:38,139 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 0
>> 2011-01-02 16:30:38,144 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 1 secs.
>> 2011-01-02 16:30:38,154 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
>> 2011-01-02 16:30:38,159 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
>> 2011-01-02 16:30:41,009 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
>> 2011-01-02 16:30:42,045 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
>> 2011-01-02 16:30:42,060 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
>> 2011-01-02 16:30:42,062 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50070
>> 2011-01-02 16:30:42,064 INFO org.mortbay.log: jetty-6.1.14
>>
>> What should I check to see whether there is communication? Why should the
>> network topology as reported by the Namenode indicate 0 racks and 0
>> Datanodes?
>>
>> Also, I am curious what should be in the masters and slaves files when
>> running in pseudo-distributed mode.
>>
>> It seems I need to have both files contain: localhost. Otherwise, the
>> DataNode and/or NameNode do not start.
>>
>> Any help would be greatly appreciated.
>>
>> Thanks.
>>
>> -Jon
>>
>> On Jan 2, 2011, at 3:46 AM, Black, Michael (IS) wrote:
>>
>>> Did you set your config and format the namenode as per these instructions?
>>>
>>> http://hadoop.apache.org/common/docs/current/single_node_setup.html
>>>
>>>
>>> Michael D. Black
>>> Senior Scientist
>>> Advanced Analytics Directorate
>>> Northrop Grumman Information Systems
>>>
>>>
>>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera