I had a secondary namenode running on the namenode machine.
I deleted the dfs.name.dir, then ran bin/hadoop namenode -importCheckpoint
and restarted the dfs.
I guess deleting dfs.name.dir will also have deleted the edit logs.
Can you please confirm that this will not lead to the delete
transactions being replayed?
Thanks for the help/advice
-Sagar
lohit wrote:
The NameNode will not come out of safe mode while it is still waiting for datanodes to report the blocks it expects.
I should have added: try to get a full output of fsck:
fsck <path> -openforwrite -files -blocks -locations
-openforwrite should tell you which files were open during the
checkpoint. You might want to double-check that this is the case, i.e. that
those files were actually being written at that moment. Maybe by looking at the
filenames you can tell whether they were part of a job that was running.
For any missing block, you might also want to cross-verify on the datanode to
see if it is really missing.
Once you are convinced that those are the only corrupt files, and that you can live with losing them, start the datanodes.
The namenode will still not come out of safemode since you have missing blocks. Leave it for a while, run fsck, look around, and if everything is OK, bring the namenode out of safemode manually.
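The check-and-leave sequence above might look roughly like this (a sketch, assuming Hadoop 0.18-era command syntax and that `/` is the path you want to inspect; the report filename is made up):

```shell
# Capture a full fsck report, including open files and block locations.
bin/hadoop fsck / -openforwrite -files -blocks -locations > fsck-report.txt

# Look around: how many missing/corrupt blocks remain?
grep -c MISSING fsck-report.txt

# Confirm the namenode's current safemode state ...
bin/hadoop dfsadmin -safemode get

# ... and, once you accept the losses, force it out of safemode.
bin/hadoop dfsadmin -safemode leave
```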
I hope you started this namenode with the old image and empty edits. You do not
want your latest edits, which contain your delete transactions, to be replayed.
Thanks,
Lohit
----- Original Message ----
From: Sagar Naik <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, November 14, 2008 12:11:46 PM
Subject: Re: Recovery of files in hadoop 18
Hey Lohit,
Thanks for your help.
I did as per your suggestion and imported from the secondary namenode.
We have some corrupted files.
But for some reason, the namenode is still in safe mode. It has been an hour or
so.
The fsck report is :
Total size: 6954466496842 B (Total open files size: 543469222 B)
Total dirs: 1159
Total files: 1354155 (Files currently being written: 7673)
Total blocks (validated): 1375725 (avg. block size 5055128 B) (Total open file blocks (not validated): 50)
********************************
CORRUPT FILES: 1574
MISSING BLOCKS: 1574
MISSING SIZE: 1165735334 B
CORRUPT BLOCKS: 1574
********************************
Minimally replicated blocks: 1374151 (99.88559 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 26619 (1.9349071 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.977127
Corrupt blocks: 1574
Missing replicas: 26752 (0.65317154 %)
Do you think I should manually override safemode, delete all the
corrupted files, and restart?
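The override-and-clean-up being asked about would look something like this (a sketch; fsck's -delete option removes the corrupt files it finds, so only run it once you have given up on recovering them):

```shell
# Manually override safemode, as the namenode will not leave it on its own
# while blocks are missing.
bin/hadoop dfsadmin -safemode leave

# Delete the files whose blocks are missing/corrupt.
bin/hadoop fsck / -delete

# Re-run fsck to confirm the filesystem now reports healthy.
bin/hadoop fsck /
```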
-Sagar
lohit wrote:
If you have enabled trash, the files should have been moved to the trash folder
before being permanently deleted, so restore them from there. (Hope you have
fs.trash.interval set.)
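For reference, trash is enabled by setting a non-zero fs.trash.interval in the site configuration; the value below (1440 minutes, i.e. 24 hours) is just an example:

```xml
<!-- hadoop-site.xml: keep deleted files in .Trash before permanent removal -->
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>  <!-- minutes between trash checkpoints; 0 disables trash -->
</property>
```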
If not, shut down the cluster.
Take a backup of your dfs.name.dir (on both the namenode and the secondary namenode).
The secondary namenode should have the last updated image; try to start the namenode from that image, and don't use the edits from the namenode yet. Try the importCheckpoint procedure explained here: https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173
Start only the namenode and run fsck -files. It will throw a lot of messages saying you are missing blocks, but that's fine since you haven't started the datanodes yet. If it does show your files, that means they haven't been deleted yet. This gives you a view of the system as of the last checkpoint.
Then start the datanodes; once they are up, try running fsck and check the consistency of the system. You will lose all changes that happened since the last checkpoint.
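As commands, those steps might look like the following (a sketch only: the backup paths are made up, and the actual directories depend on your dfs.name.dir and fs.checkpoint.dir settings):

```shell
# Shut everything down before touching any metadata.
bin/stop-all.sh

# Back up namenode metadata and the secondary's checkpoint first.
cp -r /path/to/dfs.name.dir   /safe/backup/namenode-meta
cp -r /path/to/fs.checkpoint.dir /safe/backup/secondary-checkpoint

# Start the namenode from the secondary's checkpoint image (empty edits).
bin/hadoop namenode -importCheckpoint

# With only the namenode up, inspect the namespace; missing-block noise
# is expected since no datanodes are running yet.
bin/hadoop fsck / -files

# Then bring the datanodes up and re-run fsck to check consistency.
bin/hadoop-daemon.sh start datanode
bin/hadoop fsck /
```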
Hope that helps,
Lohit
----- Original Message ----
From: Sagar Naik <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, November 14, 2008 10:38:45 AM
Subject: Recovery of files in hadoop 18
Hi,
I accidentally deleted the root folder in our hdfs.
I have stopped the hdfs.
Is there any way to recover the files from the secondary namenode?
Please help.
-Sagar