I had a secondary namenode running on the namenode machine.
I deleted dfs.name.dir,
then ran bin/hadoop namenode -importCheckpoint,

and restarted the dfs.

I assume deleting dfs.name.dir also removed the edit logs.
Can you please confirm that this will not lead to replaying the delete transactions?
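For what it's worth, one way to check is to look at the edits file in the restored dfs.name.dir after the import but before starting the dfs; the paths below are assumptions for a default 0.18 layout, not taken from this cluster:

```shell
# Assumed location; substitute your configured dfs.name.dir.
NAME_DIR=/path/to/dfs/name

# After importCheckpoint, the restored edits file should be essentially
# empty (header only). A large edits file would mean old transactions,
# including the deletes, could be replayed on startup.
ls -l "$NAME_DIR/current/edits"

# The image file should carry the secondary's checkpoint timestamp.
ls -l "$NAME_DIR/current/fsimage"
```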

Thanks for help/advice


-Sagar

lohit wrote:
The NameNode would not come out of safe mode because it is still waiting for the datanodes to report the blocks it expects. I should have added: try to get a full output of fsck:
fsck <path> -openforwrite -files -blocks -locations
The -openforwrite output should tell you which files were open during the
checkpoint; you might want to double-check that that is the case, i.e. that
those files were being written at that moment. Maybe by looking at the
filenames you can tell whether they were part of a job that was running.

For any missing block, you might also want to cross-verify on the datanode to
see if it is really missing.
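One way to cross-verify on a datanode is to search its data directories for the block file by id; the dfs.data.dir path and block id below are hypothetical:

```shell
# Replace with your configured dfs.data.dir and a block id from the fsck output.
DATA_DIR=/path/to/dfs/data
BLOCK_ID=blk_1234567890123456789   # hypothetical id

# Block replicas live as files under the datanode's data directories;
# if nothing turns up, the replica really is gone from this node.
find "$DATA_DIR" -name "${BLOCK_ID}*" -print
```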

Once you are convinced that those are the only corrupt files and that you can live with losing them, start the datanodes. The NameNode will still not come out of safe mode because you have missing blocks; leave it for a while, run fsck, look around, and if everything is OK, bring the NameNode out of safe mode.
I hope you started this namenode with the old image and empty edits. You do not
want your latest edits to be replayed, since they contain your delete transactions.
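For reference, safe mode can be inspected and left manually with dfsadmin (0.18 syntax):

```shell
# Check whether the namenode is still in safe mode.
bin/hadoop dfsadmin -safemode get

# Once you have decided the missing blocks are acceptable losses,
# force the namenode out of safe mode.
bin/hadoop dfsadmin -safemode leave
```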

Thanks,
Lohit



----- Original Message ----
From: Sagar Naik <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, November 14, 2008 12:11:46 PM
Subject: Re: Recovery of files in hadoop 18

Hey Lohit,

Thanks for your help.
I did as you suggested and imported from the secondary namenode.
We have some corrupted files.

But for some reason, the namenode is still in safe mode. It has been an hour or
so.
The fsck report is:

Total size:    6954466496842 B (Total open files size: 543469222 B)
Total dirs:    1159
Total files:   1354155 (Files currently being written: 7673)
Total blocks (validated):      1375725 (avg. block size 5055128 B) (Total open 
file blocks (not validated): 50)
********************************
CORRUPT FILES:        1574
MISSING BLOCKS:       1574
MISSING SIZE:         1165735334 B
CORRUPT BLOCKS:       1574
********************************
Minimally replicated blocks:   1374151 (99.88559 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       26619 (1.9349071 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.977127
Corrupt blocks:                1574
Missing replicas:              26752 (0.65317154 %)
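As a side note, counts like these can be pulled out of a saved fsck report mechanically. A minimal sketch using awk; the two sample lines are copied from the report above rather than from a live run:

```shell
# Extract the CORRUPT FILES and MISSING BLOCKS counts from a saved
# fsck report; a two-line sample stands in for the real report file.
report='CORRUPT FILES:        1574
MISSING BLOCKS:       1574'

corrupt=$(printf '%s\n' "$report" | awk -F': *' '/CORRUPT FILES/ {print $2}')
missing=$(printf '%s\n' "$report" | awk -F': *' '/MISSING BLOCKS/ {print $2}')
echo "corrupt=$corrupt missing=$missing"
```

The same pattern works against `bin/hadoop fsck / > report.txt` output, since the summary lines use the same `LABEL: value` shape.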


Do you think I should manually override safe mode, delete all the corrupted
files, and restart?

-Sagar


lohit wrote:
If you have trash enabled (fs.trash.interval set), the files should have been
moved to the trash folder before being permanently deleted; restore them from
there.
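If trash is enabled, the deleted entries should sit under the user's trash directory; the layout below follows the 0.18 default, and the username and restored path are assumptions:

```shell
# Deleted files land under /user/<username>/.Trash before expiry.
bin/hadoop fs -ls /user/hadoop/.Trash/Current

# Restore a directory by moving it back to its original location
# (both paths here are hypothetical).
bin/hadoop fs -mv /user/hadoop/.Trash/Current/some/dir /some/dir
```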

If not, shut down the cluster.
Take a backup of your dfs.name.dir on the namenode and the checkpoint directory on the secondary namenode.

The secondary namenode should have the last checkpointed image; try to start the namenode from that image, and don't use the edits from the namenode yet. Try the importCheckpoint procedure explained here: https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173. Start only the namenode and run fsck -files. It will print a lot of messages saying you are missing blocks, but that is fine since you haven't started the datanodes yet. If it shows your files, that means they haven't been deleted yet. This gives you a view of the system as of the last checkpoint. Then start the datanodes; once they are up, run fsck again and check the consistency of the system. You will lose all changes that happened since the last checkpoint.
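Put together, those steps might look roughly like this; every path and script name is an assumption for a default 0.18 installation, not a command taken from the thread:

```shell
# 1. Shut down the cluster.
bin/stop-all.sh

# 2. Back up the name and checkpoint directories before touching anything.
cp -a /path/to/dfs/name /path/to/dfs/name.bak                # on the namenode
cp -a /path/to/fs/checkpoint /path/to/fs/checkpoint.bak      # on the secondary

# 3. Start the namenode alone from the secondary's checkpoint, with a
#    fresh (empty) dfs.name.dir so no old edits get replayed.
bin/hadoop namenode -importCheckpoint

# 4. Inspect the namespace with no datanodes running;
#    missing-block noise is expected at this point.
bin/hadoop fsck / -files

# 5. If your files are visible, start the datanodes and re-run fsck.
bin/start-dfs.sh
bin/hadoop fsck /
```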
Hope that helps,
Lohit



----- Original Message ----
From: Sagar Naik <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, November 14, 2008 10:38:45 AM
Subject: Recovery of files in hadoop 18

Hi,
I accidentally deleted the root folder in our hdfs.
I have stopped the hdfs.

Is there any way to recover the files from the secondary namenode?

Please help


-Sagar
