I had also assumed that some other jar or configuration file had been changed, but reviewing the timestamps on the files did not reveal the problem. On the assumption that something had in fact changed that I was not seeing, I renamed my $HADOOP_HOME directory and replaced it with a copy from one of the slaves. I then restored $HADOOP_HOME/conf from the original (renamed) directory, and voila - we're back in business.
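In case it helps anyone else who finds this thread later, the swap boiled down to roughly the following. Treat it as a sketch: the install path (/usr/local/hadoop) and the slave hostname (hdp12n) are just stand-ins for our layout, so adjust for yours.

  # on the NameNode host, with the daemons already stopped
  cd /usr/local
  mv hadoop hadoop.suspect                 # set the suspect install aside
  scp -rp hdp12n:/usr/local/hadoop .       # copy a known-good install from a slave
  rm -rf hadoop/conf
  cp -rp hadoop.suspect/conf hadoop/       # keep the NameNode's original config
  hadoop/bin/start-dfs.sh                  # bring HDFS back up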
Brian, thanks very much for your help. It literally took more time for me to write the original email (5 minutes) than to get a reply pointing at the solution, and then another 5 minutes to apply the fix. That says a lot about this user group; I don't think I would have reached a human being in 5 minutes through tech support for most products. I'll make sure to monitor this list more closely so I can pay it forward ;)

Thanks,
-Phil

On Mon, Oct 25, 2010 at 7:16 PM, Brian Bockelman <[email protected]> wrote:

> Hi Phil,
>
> Typically this is due to running inconsistent versions of Hadoop, right?
>
> I would compare the output of 'md5sum' on the NN versus the DN for the
> various Hadoop JARs.
>
> If you do "jar tf" on the jar you added to $HADOOP_HOME/lib, did it
> inadvertently add another implementation of the NN classes?
>
> Brian
>
> On Oct 25, 2010, at 6:12 PM, phil young wrote:
>
> > Wow. I could use help quickly...
> >
> > My name node is reporting a null build version (BV), while all the data
> > nodes report the same build version. We were not upgrading the DFS, but
> > we did stop and restart it after adding a jar to $HADOOP_HOME/lib, so we
> > think we understand the cause.
> >
> > Web searching shows that a number of people have hit this issue, but I
> > don't see a response to the plea here for advice on repairing the problem:
> > http://old.nabble.com/namenode-failure-td20199395.html
> >
> > -- Here's a log from a DataNode:
> >
> > /************************************************************
> > STARTUP_MSG: Starting DataNode
> > STARTUP_MSG:   host = hdp12n.tripadvisor.com/192.168.33.231
> > STARTUP_MSG:   args = []
> > STARTUP_MSG:   version = 0.20.2
> > STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> > ************************************************************/
> > 2010-10-25 18:23:07,081 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Incompatible build versions: namenode BV = ; datanode BV = 911707
> > 2010-10-25 18:23:07,186 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible build versions: namenode BV = ; datanode BV = 911707
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.handshake(DataNode.java:436)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:275)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)
> >
> > 2010-10-25 18:23:07,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down DataNode at hdp12n.tripadvisor.com/192.168.33.231
> >
> > -- This is from the NameNode:
> >
> > 2010-10-25 18:38:58,760 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/192.168.33.230 cmd=listStatus src=/disk1/hadoop-root/mapred/system dst=null perm=null
> > 2010-10-25 18:38:58,764 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310, call delete(/disk1/hadoop-root/mapred/system, true) from 192.168.33.230:44574: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /disk1/hadoop-root/mapred/system. Name node is in safe mode.
> > The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
> > org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /disk1/hadoop-root/mapred/system. Name node is in safe mode.
> > The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
> >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1700)
> >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1680)
> >     at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)
> >     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> > 2010-10-25 18:39:08,770 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/192.168.33.230 cmd=listStatus src=/disk1/hadoop-root/mapred/system dst=null perm=null
> > 2010-10-25 18:39:08,774 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310, call delete(/disk1/hadoop-root/mapred/system, true) from 192.168.33.230:44574: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /disk1/hadoop-root/mapred/system. Name node is in safe mode.
> > The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
> > org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /disk1/hadoop-root/mapred/system. Name node is in safe mode.
> > The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
> >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1700)
> >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1680)
> >     at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)
> >     at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> > 2010-10-25 18:39:09,504 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down NameNode at hdp11an.tripadvisor.com/192.168.33.228
> > ************************************************************/
> >
> > [r...@hdp11an current]# ls -l /disk1/hadoop-root/
> > total 4
> > drwxr-xr-x 4 root root 4096 Oct 22 10:11 hadoop-unjar1448904914513586870
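
P.S. For anyone repairing this later: Brian's checks boil down to something like the following sketch. The jar name newly-added.jar and the slave hostname hdp12n are illustrative, and the globs assume a stock 0.20.2 layout (core classes in hadoop-0.20.2-core.jar under $HADOOP_HOME):

  # fingerprint the Hadoop jars on the NameNode...
  md5sum $HADOOP_HOME/hadoop-*.jar $HADOOP_HOME/lib/*.jar | sort > /tmp/nn.md5

  # ...and on a DataNode (assumes $HADOOP_HOME is set in the remote shell), then compare
  ssh hdp12n 'md5sum $HADOOP_HOME/hadoop-*.jar $HADOOP_HOME/lib/*.jar | sort' > /tmp/dn.md5
  diff /tmp/nn.md5 /tmp/dn.md5

  # check whether the newly added jar carries its own copy of the NameNode classes
  jar tf $HADOOP_HOME/lib/newly-added.jar | grep 'hdfs/server/namenode'

The safe-mode errors in the NameNode log are just a symptom: the ratio of reported blocks stays at 0.0000 because every DataNode aborts its handshake, so safe mode clears on its own once the build versions match again ('hadoop dfsadmin -safemode get' shows the current state).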
