Hi, I'd love to see the DRBD+Hadoop write-up! Not only would it be useful for Hadoop, but I can also see it being useful for Solr (master replication).
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: C G <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, May 12, 2008 2:40:57 PM
> Subject: Re: HDFS corrupt...how to proceed?
>
> Thanks to everyone who responded. Things are back on the air now - all the
> replication issues seem to have gone away. I am wading through a detailed fsck
> output now looking for specific problems on a file-by-file basis.
>
> Just in case anybody is interested, we mirror our master nodes using DRBD. It
> performed very well in this first "real world" test. If there is interest I can
> write up how we protect our master nodes in more detail and share w/the
> community.
>
> Thanks,
> C G
>
> Ted Dunning wrote:
>
> You don't need to correct over-replicated files.
>
> The under-replicated files should cure themselves, but there is a problem on
> old versions where that doesn't happen quite right.
>
> You can use hadoop fsck / to get a list of the files that are broken and
> there are options to copy what remains of them to lost+found or to delete
> them.
>
> Other than that, things should correct themselves fairly quickly.
>
>
> On 5/11/08 8:23 PM, "C G" wrote:
>
> > Hi All:
> >
> > We had a primary node failure over the weekend. When we brought the node
> > back up and I ran Hadoop fsck, I see the file system is corrupt. I'm unsure
> > how best to proceed. Any advice is greatly appreciated. If I've missed a
> > Wiki page or documentation somewhere please feel free to tell me to RTFM and
> > let me know where to look.
> >
> > Specific question: how to clear under and over replicated files? Is the
> > correct procedure to copy the file locally, delete from HDFS, and then copy
> > back to HDFS?
> >
> > The fsck output is long, but the final summary is:
> >
> > Total size: 4899680097382 B
> > Total blocks: 994252 (avg. block size 4928006 B)
> > Total dirs: 47404
> > Total files: 952070
> > ********************************
> > CORRUPT FILES: 2
> > MISSING BLOCKS: 24
> > MISSING SIZE: 1501009630 B
> > ********************************
> > Over-replicated blocks: 1 (1.0057812E-4 %)
> > Under-replicated blocks: 14958 (1.5044476 %)
> > Target replication factor: 3
> > Real replication factor: 2.9849212
> >
> > The filesystem under path '/' is CORRUPT
> >
> >
> > ---------------------------------
> > Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it
> > now.
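
For anyone who wants to sanity-check a summary like the one quoted above, here is a minimal Python sketch that parses the "Label: value" lines and recomputes the percentages from the block counts. It assumes the summary has been pasted into a string; the name parse_fsck_summary is hypothetical, and the pattern only covers the fields shown in this thread, so other fsck versions may need adjustments.

```python
import re

# Summary lines copied from the fsck output quoted in this thread.
SUMMARY = """\
Total size: 4899680097382 B
Total blocks: 994252 (avg. block size 4928006 B)
Total dirs: 47404
Total files: 952070
CORRUPT FILES: 2
MISSING BLOCKS: 24
MISSING SIZE: 1501009630 B
Over-replicated blocks: 1 (1.0057812E-4 %)
Under-replicated blocks: 14958 (1.5044476 %)
Target replication factor: 3
Real replication factor: 2.9849212
"""

# Hypothetical helper: maps each label to the first number after its colon.
def parse_fsck_summary(text):
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([A-Za-z][A-Za-z\- ]*?):\s*([0-9][0-9.Ee\-]*)", line)
        if m:
            value = m.group(2)
            is_float = re.search(r"[.Ee]", value) is not None
            fields[m.group(1)] = float(value) if is_float else int(value)
    return fields

summary = parse_fsck_summary(SUMMARY)
blocks = summary["Total blocks"]

# Percentages are relative to the total block count.
under_pct = 100.0 * summary["Under-replicated blocks"] / blocks
over_pct = 100.0 * summary["Over-replicated blocks"] / blocks
# Average block size is the total size divided by the block count (rounded down).
avg_block = summary["Total size"] // blocks

print("under-replicated %:", under_pct)
print("over-replicated %:", over_pct)
print("avg block size (B):", avg_block)
```

The recomputed values agree with the figures fsck printed, which suggests the percentages are simply block counts divided by total blocks. This only checks the arithmetic; it does not touch HDFS.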
