Hi.

We are using a cluster of 2 computers (1 namenode and 2 secondarynodes) to store a large number of text files in HDFS. The process had been running for at least a couple of weeks when, due to a power failure, the server was reset, so HDFS did not shut down cleanly. When I tried to restart the cluster, I got a NullPointerException, with the following stack trace (from the logs).
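For context, the restart was attempted with the stock scripts shipped with Hadoop 0.20.2, roughly as below (a sketch only; the HADOOP_HOME path and log filename pattern are our local setup, not anything special):

```shell
# Stop any daemons still left running after the power failure
$HADOOP_HOME/bin/stop-all.sh

# Restart HDFS; the namenode dies while replaying the edits log
$HADOOP_HOME/bin/start-dfs.sh

# Inspect the namenode log for the failure
tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
```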

2011-05-18 06:57:39,313 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=YYYYY
2011-05-18 06:57:39,321 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: master/172.XXX.XXX.XXX:YYYYY
2011-05-18 06:57:39,326 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2011-05-18 06:57:39,329 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=vishaal,vishaal
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-05-18 06:57:39,444 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-05-18 06:57:39,459 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-05-18 06:57:39,461 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-05-18 06:57:39,521 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1
2011-05-18 06:57:39,531 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-05-18 06:57:39,531 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 97 loaded in 0 seconds.
2011-05-18 06:57:39,532 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/vishaal/hadoop-0.20.2/tmp/dfs/name/current/edits of size 0 edits # 0 loaded in 0 seconds.
2011-05-18 06:57:39,535 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1320)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1309)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:776)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:997)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

2011-05-18 06:57:39,537 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 172.XXX.XXX.XXX
************************************************************/

Though this was just an experiment to test the reliability of HDFS storage, I would love to get it running again, assuming, of course, that the data can be recovered if it is corrupted. A few more questions:

   * Is this a common problem? Is there an available patch? (I
     couldn't find one after a lot of Googling.)
   * If the servers are prone to power failures, is HDFS still a good
     choice for storing the data?
   * If this occurs, does it mean that all the data is corrupt, or
     only some of it? Can the corrupted data be recovered?

Would appreciate a prompt reply, as this was an attempt to prove the concept of using a distributed file system to store large amounts of text, as opposed to a relational database. (I hope you understand that I am in the line of fire.)

Thanks in advance.
Vishaal Jatav.
(vishaal[dot]iitb04[at]gmail[dot]com)
