Hi All, Thank you so much for your valuable solutions!
The problem got resolved, but with significant time and data loss (since we were running on an experimental basis, I reloaded only a few GB of the data). I used the -importCheckpoint option (a rough sketch of the command is further below).

I would just like to share the likely scenario/reason the editlog corruption happened (correct me if I am wrong). Below were the typical configurations in hdfs-site.xml:

- hadoop.tmp.dir : /opt/data/tmp
- dfs.name.dir : /opt/data/name
- dfs.data.dir : /opt/data/data
- mapred.local.dir : ${hadoop.tmp.dir}/mapred/local

/opt/data is a mounted storage volume, 50GB in size. The Namenode, SecondaryNamenode (${hadoop.tmp.dir}/dfs/namesecondary) and Datanode directories were all configured within /opt/data itself. Once I moved a 3.6GB compressed (bz2) file in, disk usage of this mount could have reached 100% (I checked with df -h after the incident). Then I ran Hive with a simple "select" query; its job.jar files also need to be created within the same directory, which already had no space left. This is how the editlog corruption could have occurred.

This is really a good learning for me! I have now changed those configurations (an illustrative layout is sketched below).
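For anyone who runs into the same situation, the direction of the change is roughly this: keep the namenode metadata (and the checkpoint) on a volume that MapReduce spill files and HDFS block data cannot fill up. The paths below are only illustrative, not our exact values:

  <!-- hdfs-site.xml (illustrative sketch only) -->
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/meta/name</value>            <!-- hypothetical separate mount for namenode metadata -->
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/opt/meta/namesecondary</value>   <!-- checkpoint kept on the same metadata mount -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/data/data</value>            <!-- block data stays on the large data mount -->
  </property>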
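And for completeness, the recovery itself was essentially the -importCheckpoint procedure Brahma described below. From memory it went roughly like this (dfs.name.dir must contain no metadata, otherwise the import refuses to run):

  # the namenode was already down; make sure the rest of HDFS is stopped too
  bin/stop-dfs.sh
  # start the namenode so that it loads the latest checkpoint from fs.checkpoint.dir
  # into the (empty) dfs.name.dir
  bin/hadoop-daemon.sh start namenode -importCheckpoint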
Thanks again,
Sakthivel

On Fri, Jul 15, 2011 at 4:47 PM, Brahma Reddy <brahmared...@huawei.com> wrote:

> Hi,
>
> 1) This can be achieved either by copying the relevant storage directory
> to a new name node,
> 2) or, if the secondary is taking over as the new name node daemon, by
> using the -importCheckpoint option when starting the name node daemon.
> The -importCheckpoint option will load the name node metadata from the
> latest checkpoint in the directory defined by the fs.checkpoint.dir
> property, but only if there is no metadata in dfs.name.dir, so there is
> no risk of overwriting precious data.
>
> Regards
> Brahma Reddy
>
> ------------------------------
>
> From: Sakthivel Murugasamy [mailto:sakthiinfo...@gmail.com]
> Sent: Friday, July 15, 2011 2:40 PM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: Namenode not get started. Reason: FSNamesystem
> initialization failed.
>
> Dear Team,
>
> I have loaded 3.6GB of compressed (bz2) data directly into Hive; after
> that I ran a simple "select" query and the namenode crashed. After that,
> I was not able to start the namenode.
>
> Environment:
>
> - CentOS release 5.5 (Final), Hadoop version: 0.20.2
> - Cluster size: 18 nodes
> - NameNode & SecondaryNamenode are on the same machine
>
> It seems the editlogs/fsimage got corrupted. I haven't taken any backup
> separately; below is the exception:
>
> 2011-07-14 23:37:43,378 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
> initialization failed.
> java.io.FileNotFoundException: File does not exist:
> /opt/data/tmp/mapred/system/job_201107041958_0120/j^@^@^@^@^@^@
>
> Please find the detailed exception in the attached namenode log file.
>
> Earlier, I had also posted this in JIRA,
> https://issues.apache.org/jira/browse/HADOOP-7458 , and Jakob Homan
> directed me to post to the hdfs-user list.
>
> Will there be any backup in the SecondaryNamenode? Could you please
> assist me in recovering the namenode from this issue?
>
> Thanks,
> Sakthivel