Hi Todd,

Thanks very much. I think you are exactly right.
I had used the hadoop-0.20-append patches that are listed here:
http://github.com/lenn0x/Hadoop-Append

After reading 0002-HDFS-278.patch
(https://github.com/lenn0x/Hadoop-Append/blob/master/0002-HDFS-278.patch),
I found that the file src/hdfs/org/apache/hadoop/hdfs/DFSClient.java in my
cluster does not contain these lines:

    this.maxBlockAcquireFailures = conf.getInt("dfs.client.max.block.acquire.failures",
                                               MAX_BLOCK_ACQUIRE_FAILURES);

It just looks like this:

    this.maxBlockAcquireFailures = getMaxBlockAcquireFailures(conf);

So I changed 0002-HDFS-278.patch, and the diff between the original
0002-HDFS-278.patch and my new patch is:

diff 0002-HDFS-278.patch ../hadoop-new/patch-origion/0002-HDFS-278.patch
0a1,10
> From 56463073cf051f1e11b4d3921542979e53daead4 Mon Sep 17 00:00:00 2001
> From: Chris Goffinet <c...@chrisgoffinet.com>
> Date: Mon, 20 Jul 2009 17:20:13 -0700
> Subject: [PATCH 2/4] HDFS-278
>
> ---
>  src/hdfs/org/apache/hadoop/hdfs/DFSClient.java |   70 ++++++++++++++++++++++--
>  1 files changed, 64 insertions(+), 6 deletions(-)
>
> diff --git a/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java b/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
2,3c12,13
< --- src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
< +++ src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
---
> --- a/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
> +++ b/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java
19,20c29,32
< @@ -188,5 +192,7 @@ public class DFSClient implements FSConstants, java.io.Closeable {
<      this.maxBlockAcquireFailures = getMaxBlockAcquireFailures(conf);
---
> @@ -167,7 +171,9 @@ public class DFSClient implements FSConstants, java.io.Closeable {
>      this.maxBlockAcquireFailures =
>          conf.getInt("dfs.client.max.block.acquire.failures",
>                      MAX_BLOCK_ACQUIRE_FAILURES);
118a131,133
> --
> 1.6.3.1
>

Did I miss some of the patches for hadoop-0.20-append? And how can I recover
my NN and get it working again so that I can export the data?
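For completeness, here is why I believe the hunk no longer applied: in my tree
the conf.getInt(...) call seems to have been moved into a small helper. Below
is a tiny self-contained sketch of what I think that helper boils down to (the
wrapper class and the default value of 3 are only mine for illustration; the
real method lives in DFSClient.java and may differ slightly):

    import org.apache.hadoop.conf.Configuration;

    class MaxBlockAcquireFailuresSketch {
      // stand-in for DFSClient.MAX_BLOCK_ACQUIRE_FAILURES
      static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

      // In my copy of DFSClient.java the constructor calls a helper like this
      // instead of the inline conf.getInt(...) that 0002-HDFS-278.patch expects,
      // which is why the hunk did not match and I rebased it by hand.
      static int getMaxBlockAcquireFailures(Configuration conf) {
        return conf.getInt("dfs.client.max.block.acquire.failures",
                           MAX_BLOCK_ACQUIRE_FAILURES);
      }
    }

If the two forms really are equivalent, my hand-rebased hunk should be
harmless, but please correct me if that assumption is wrong.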
2011/2/14 Todd Lipcon <t...@cloudera.com>

> Hi Jameson,
>
> My first instinct is that you have an incomplete patch series for hdfs
> append, and that's what caused your problem. There were many bug fixes
> along the way for hadoop-0.20-append and maybe you've missed some in your
> manually patched build.
>
> -Todd
>
>
> On Mon, Feb 14, 2011 at 5:49 AM, Jameson Li <hovlj...@gmail.com> wrote:
>
>> Hi,
>>
>> My hadoop version is based on the hadoop 0.20.2 release, patched with
>> HADOOP-4675, 5745, MAPREDUCE-1070, 551, 1089 (support for ganglia31,
>> fairscheduler preemption, hdfs append), and patched with HADOOP-6099,
>> HDFS-278, Patches-from-Dhruba-Borthakur, HDFS-200 (support for scribe).
>>
>> Last Friday I found that the clocks on some of my test hadoop cluster
>> nodes were not in a normal state; they were some number of hours ahead
>> of the correct time.
>> So I ran the following command, and added it as a crontab job:
>> /usr/bin/rdate -s time-b.nist.gov
>>
>> And then my hadoop cluster namenode crashed after I restarted the
>> namenode.
>> And I don't know whether it is related to modifying the time.
>>
>> The error log:
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Total number of blocks = 196
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of invalid blocks = 0
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of under-replicated blocks = 29
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of over-replicated blocks = 41
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Leaving safe mode after 69 secs.
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode is OFF.
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 1 racks and 5 datanodes
>> 2011-02-12 18:44:46,603 INFO org.apache.hadoop.hdfs.StateChange: STATE* UnderReplicatedBlocks has 29 blocks
>> 2011-02-12 18:44:46,886 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.14:50010 to replicate blk_-8806907658071633346_1750 to datanode(s) 192.168.1.83:50010
>> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.83:50010 to replicate blk_-7689075547598626554_1800 to datanode(s) 192.168.1.10:50010
>> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.84:50010 to replicate blk_-7587424527299099175_1717 to datanode(s) 192.168.1.10:50010
>> 2011-02-12 18:44:46,887 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.84:50010 to replicate blk_-6925943363757944243_1909 to datanode(s) 192.168.1.13:50010
>> 2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.14:50010 to replicate blk_-6835423500788375545_1928 to datanode(s) 192.168.1.10:50010
>> 2011-02-12 18:44:46,888 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 192.168.1.83:50010 to replicate blk_-6477488774631498652_1742 to datanode(s) 192.168.1.84:50010
>> 2011-02-12 18:44:46,889 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ReplicationMonitor thread received Runtime exception.
>> java.lang.IllegalStateException: generationStamp (=1) == GenerationStamp.WILDCARD_STAMP
>> java.lang.IllegalStateException: generationStamp (=1) == GenerationStamp.WILDCARD_STAMP
>>     at org.apache.hadoop.hdfs.protocol.Block.validateGenerationStamp(Block.java:148)
>>     at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:156)
>>     at org.apache.hadoop.hdfs.protocol.Block.compareTo(Block.java:30)
>>     at java.util.TreeMap.put(TreeMap.java:545)
>>     at java.util.TreeSet.add(TreeSet.java:238)
>>     at org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor.addBlocksToBeInvalidated(DatanodeDescriptor.java:284)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.invalidateWorkForOneNode(FSNamesystem.java:2743)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeInvalidateWork(FSNamesystem.java:2419)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.computeDatanodeWork(FSNamesystem.java:2412)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:2357)
>>     at java.lang.Thread.run(Thread.java:619)
>> 2011-02-12 18:44:46,892 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down NameNode at hadoop5/192.168.1.84
>> ************************************************************/
>>
>>
>> Thanks,
>> Jameson
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
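PS: trying to understand the stack trace above. If I read it correctly,
Block.compareTo() validates both generation stamps before comparing and throws
as soon as it sees a block whose stamp is still the wildcard value. Here is a
minimal toy sketch of what I think is going on (this is my own illustration,
not the real Block.java, so names and details may differ):

    // Toy model of the check the ReplicationMonitor hit while adding blocks
    // to a TreeSet in DatanodeDescriptor.addBlocksToBeInvalidated().
    class BlockSketch implements Comparable<BlockSketch> {
      static final long WILDCARD_STAMP = 1;  // plays the role of GenerationStamp.WILDCARD_STAMP
      final long blockId;
      final long generationStamp;

      BlockSketch(long blockId, long generationStamp) {
        this.blockId = blockId;
        this.generationStamp = generationStamp;
      }

      // Mirrors the message in the log: a block with a wildcard stamp must
      // not be ordered, so the comparator refuses it outright.
      static void validateGenerationStamp(long stamp) {
        if (stamp == WILDCARD_STAMP) {
          throw new IllegalStateException("generationStamp (=" + stamp
              + ") == GenerationStamp.WILDCARD_STAMP");
        }
      }

      public int compareTo(BlockSketch b) {
        validateGenerationStamp(this.generationStamp);
        validateGenerationStamp(b.generationStamp);
        if (blockId != b.blockId) {
          return blockId < b.blockId ? -1 : 1;
        }
        return generationStamp < b.generationStamp ? -1
             : generationStamp == b.generationStamp ? 0 : 1;
      }
    }

So my guess is that the NN has at least one block whose generation stamp was
recorded as the wildcard value 1, and it dies as soon as the ReplicationMonitor
tries to schedule invalidation work for that block. Does that sound plausible?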