Hello everyone-
I have a 5-node cluster that I've been running for a few weeks with no
problems. Today I decided to add nodes and double its size to 10. After doing
all the setup and starting the cluster, I found that 4 of the 10 nodes had
failed to start up. Specifically, the datanodes didn't start; the tasktrackers
seemed to start fine. Thinking I had done something wrong during the
expansion, I reverted to the 5-node configuration, but I'm seeing the same
problem there, with only 2 of the 5 nodes starting correctly. Here is what
shows up in the hadoop-*-datanode*.log files:
2009-04-07 12:35:40,628 INFO org.apache.hadoop.dfs.DataNode: Starting Periodic block scanner.
2009-04-07 12:35:45,548 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 9269 blocks got processed in 1128 msecs
2009-04-07 12:35:45,584 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.254.165.223:50010, storageID=DS-202528624-10.254.131.244-50010-1238604807366, infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due to:java.nio.channels.ClosedSelectorException
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:66)
        at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
        at sun.nio.ch.Util.releaseTemporarySelector(Util.java:135)
        at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:120)
        at org.apache.hadoop.dfs.DataNode$DataXceiveServer.run(DataNode.java:997)
        at java.lang.Thread.run(Thread.java:619)
After this, the datanode shuts down. The same message appears on all of the
failed nodes. Help!
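
For what it's worth, my (possibly wrong) reading of the exception is that
something is still calling into a java.nio Selector after it has been closed.
The tiny standalone snippet below is just an illustration of that, not Hadoop
code; it raises the same exception type seen in the stack trace:

import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

public class ClosedSelectorDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        selector.close();

        try {
            // selectNow() on a closed Selector throws ClosedSelectorException,
            // the same exception type the DataXceiveServer thread is dying on.
            selector.selectNow();
        } catch (ClosedSelectorException e) {
            e.printStackTrace();
        }
    }
}

What I don't understand is what would be closing a selector inside the
datanode in the first place.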
-kevin