I second that request... I use DRBD for another project where I work and definitely see its benefits, but I haven't tried it with Hadoop yet.
Thanks

On Tue, May 13, 2008 at 11:17 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'd love to see the DRBD+Hadoop write up! Not only would this be useful
> for Hadoop, I can see this being useful for Solr (master replication).
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: C G <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Monday, May 12, 2008 2:40:57 PM
> > Subject: Re: HDFS corrupt...how to proceed?
> >
> > Thanks to everyone who responded. Things are back on the air now - all
> > the replication issues seem to have gone away. I am wading through a
> > detailed fsck output now looking for specific problems on a
> > file-by-file basis.
> >
> > Just in case anybody is interested, we mirror our master nodes using
> > DRBD. It performed very well in this first "real world" test. If there
> > is interest I can write up how we protect our master nodes in more
> > detail and share w/the community.
> >
> > Thanks,
> > C G
> >
> > Ted Dunning wrote:
> >
> > You don't need to correct over-replicated files.
> >
> > The under-replicated files should cure themselves, but there is a
> > problem on old versions where that doesn't happen quite right.
> >
> > You can use hadoop fsck / to get a list of the files that are broken,
> > and there are options to copy what remains of them to lost+found or to
> > delete them.
> >
> > Other than that, things should correct themselves fairly quickly.
> >
> > On 5/11/08 8:23 PM, "C G" wrote:
> >
> > > Hi All:
> > >
> > > We had a primary node failure over the weekend. When we brought the
> > > node back up and I ran Hadoop fsck, I see the file system is corrupt.
> > > I'm unsure how best to proceed. Any advice is greatly appreciated. If
> > > I've missed a Wiki page or documentation somewhere, please feel free
> > > to tell me to RTFM and let me know where to look.
> > >
> > > Specific question: how to clear under- and over-replicated files? Is
> > > the correct procedure to copy the file locally, delete from HDFS, and
> > > then copy back to HDFS?
> > >
> > > The fsck output is long, but the final summary is:
> > >
> > >  Total size:    4899680097382 B
> > >  Total blocks:  994252 (avg. block size 4928006 B)
> > >  Total dirs:    47404
> > >  Total files:   952070
> > >  ********************************
> > >    CORRUPT FILES:  2
> > >    MISSING BLOCKS: 24
> > >    MISSING SIZE:   1501009630 B
> > >  ********************************
> > >  Over-replicated blocks:  1 (1.0057812E-4 %)
> > >  Under-replicated blocks: 14958 (1.5044476 %)
> > >  Target replication factor: 3
> > >  Real replication factor:   2.9849212
> > >
> > > The filesystem under path '/' is CORRUPT
> > >
> > > ---------------------------------
> > > Be a better friend, newshound, and know-it-all with Yahoo! Mobile.
> > > Try it now.
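[Editor's note: the fsck cleanup Ted describes maps onto the command line roughly as follows. This is a sketch for the Hadoop of that era; the paths are examples, and depending on your version the filesystem shell may be `hadoop dfs` rather than `hadoop fs`.]

```shell
# List broken files along with their blocks and block locations.
hadoop fsck / -files -blocks -locations

# Move what remains of corrupt files into /lost+found on HDFS ...
hadoop fsck / -move

# ... or delete the corrupt files outright (destructive, no undo).
hadoop fsck / -delete

# Under-replicated blocks normally re-replicate on their own, but you can
# also re-queue a file by re-setting its target replication factor
# (-w waits until the target is reached):
hadoop fs -setrep -w 3 /path/to/file
```

Over-replicated blocks need no action at all; the namenode trims the extra replicas itself.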
