I would suggest you run fsck with all options:

hadoop fsck / -files -blocks -locations

This will give you details of which blocks are missing and which files they belong to. The fsck output depends on the current state of the namenode and its knowledge of the blocks. That the two outputs differ suggests the namenode's state has been updated, meaning blocks which were reported missing earlier may now be accounted for. Check with the full options and see which blocks from which files are missing.

Thanks,
Lohit
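As an aside, the summary section of the fsck report is plain text and straightforward to scrape, which makes it easy to compare two runs. A minimal sketch (a hypothetical helper, not part of Hadoop) that pulls the corruption counters out of a summary like the one quoted below:

```python
import re

def parse_fsck_summary(text):
    """Extract corruption counters from a `hadoop fsck` summary.

    Returns a dict with whichever of these fields are present:
    status, corrupt_files, missing_blocks, missing_size_bytes.
    """
    patterns = {
        "status": r"Status:\s*(\w+)",
        "corrupt_files": r"CORRUPT FILES:\s*(\d+)",
        "missing_blocks": r"MISSING BLOCKS:\s*(\d+)",
        "missing_size_bytes": r"MISSING SIZE:\s*(\d+)\s*B",
    }
    result = {}
    for key, pat in patterns.items():
        m = re.search(pat, text)
        if m:
            val = m.group(1)
            result[key] = int(val) if val.isdigit() else val
    return result

# Example, using the figures from the first fsck run in this thread:
summary = """Status: CORRUPT
CORRUPT FILES: 2
MISSING BLOCKS: 24
MISSING SIZE: 1501009630 B
"""
print(parse_fsck_summary(summary))
# {'status': 'CORRUPT', 'corrupt_files': 2, 'missing_blocks': 24, 'missing_size_bytes': 1501009630}
```

Running it against the output of consecutive fsck runs and diffing the dicts would show exactly which counters the namenode has revised.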
----- Original Message ----
From: C G <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, May 11, 2008 9:55:40 PM
Subject: Re: HDFS corrupt...how to proceed?

The system hosting the namenode experienced an OS panic and shut down; we subsequently rebooted it. Currently we don't believe there is/was a bad disk or other hardware problem.

Something interesting: I ran fsck twice. The first time it gave the result I posted. The second time it still declared the FS to be corrupt, but said:

[many rows of periods deleted]
..........Status: CORRUPT
 Total size: 4900076384766 B
 Total blocks: 994492 (avg. block size 4927215 B)
 Total dirs: 47404
 Total files: 952310
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Target replication factor: 3
 Real replication factor: 3.0

The filesystem under path '/' is CORRUPT

So it seems like it's fixing some problems on its own?

Thanks,
C G

Dhruba Borthakur <[EMAIL PROTECTED]> wrote:

Did one datanode fail or did the namenode fail? By "fail" do you mean that the system was rebooted, or was there a bad disk that caused the problem?

thanks,
dhruba

On Sun, May 11, 2008 at 7:23 PM, C G wrote:
> Hi All:
>
> We had a primary node failure over the weekend. When we brought the node back
> up and I ran Hadoop fsck, I saw that the file system is corrupt. I'm unsure how
> best to proceed. Any advice is greatly appreciated. If I've missed a Wiki
> page or documentation somewhere, please feel free to tell me to RTFM and let
> me know where to look.
>
> Specific question: how do I clear under- and over-replicated files? Is the
> correct procedure to copy the file locally, delete it from HDFS, and then copy
> it back to HDFS?
>
> The fsck output is long, but the final summary is:
>
> Total size: 4899680097382 B
> Total blocks: 994252 (avg. block size 4928006 B)
> Total dirs: 47404
> Total files: 952070
> ********************************
> CORRUPT FILES: 2
> MISSING BLOCKS: 24
> MISSING SIZE: 1501009630 B
> ********************************
> Over-replicated blocks: 1 (1.0057812E-4 %)
> Under-replicated blocks: 14958 (1.5044476 %)
> Target replication factor: 3
> Real replication factor: 2.9849212
>
> The filesystem under path '/' is CORRUPT

---------------------------------
Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now.
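For what it's worth, the derived figures in that first summary are internally consistent; a quick sanity check of the average block size and the under-replicated percentage (pure arithmetic, no Hadoop needed):

```python
# Figures copied from the first fsck summary in this thread.
total_size = 4_899_680_097_382   # bytes
total_blocks = 994_252
under_replicated = 14_958

# Average block size: matches the reported "avg. block size 4928006 B".
avg_block_size = total_size // total_blocks
print(avg_block_size)            # 4928006

# Under-replicated percentage: matches the reported 1.5044476 %.
under_pct = under_replicated / total_blocks * 100
print(round(under_pct, 7))       # 1.5044476
```

That the counters reconcile suggests the summary itself can be trusted; the thing to investigate is which 24 blocks (in which 2 files) are missing, which is what the full-option fsck run reveals.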
