Hi,

I'd love to see the DRBD+Hadoop write-up!  Not only would it be useful for 
Hadoop, I can see it being useful for Solr (master replication) as well.


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: C G <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, May 12, 2008 2:40:57 PM
> Subject: Re: HDFS corrupt...how to proceed?
> 
> Thanks to everyone who responded.  Things are back on the air now - all the
> replication issues seem to have gone away.  I am wading through a detailed
> fsck output now looking for specific problems on a file-by-file basis.
>   
>   Just in case anybody is interested, we mirror our master nodes using DRBD.
> It performed very well in this first "real world" test.  If there is interest,
> I can write up how we protect our master nodes in more detail and share it
> with the community.
>   
>   Thanks,
>   C G
> 
> Ted Dunning wrote:
>   
> 
> You don't need to correct over-replicated files.
> 
> The under-replicated files should cure themselves, but there is a problem on
> old versions where that doesn't happen quite right.
> 
> You can use hadoop fsck / to get a list of the files that are broken, and
> there are options to move what remains of them to /lost+found (-move) or to
> delete them (-delete).
> 
> Other than that, things should correct themselves fairly quickly.
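
For anyone scripting this, here is a minimal sketch of the fsck invocations
Ted describes. The helper name fsck_command and the action labels are made up
for illustration; only the hadoop fsck flags themselves (-files, -blocks,
-locations, -move, -delete) are real options. It only builds the command
line and does not run anything, so check the result before pointing it at a
live cluster.

```python
# Hypothetical helper: builds (but does not run) `hadoop fsck` command lines.
# The action labels are illustrative; the flags are real fsck options.
ACTIONS = {
    "report": ["-files", "-blocks", "-locations"],  # detailed per-file listing
    "move": ["-move"],      # move corrupted files to /lost+found
    "delete": ["-delete"],  # delete corrupted files
}


def fsck_command(path="/", action="report"):
    """Return the argv list for one `hadoop fsck` run."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %s" % action)
    return ["hadoop", "fsck", path] + ACTIONS[action]


if __name__ == "__main__":
    # Print each command so it can be reviewed before being run by hand.
    for action in ("report", "move", "delete"):
        print(" ".join(fsck_command("/", action)))
```

Running the report form first and reading its output before trying -move or
-delete seems the safer order, since -delete cannot be undone.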
> 
> 
> On 5/11/08 8:23 PM, "C G" wrote:
> 
> > Hi All:
> > 
> > We had a primary node failure over the weekend. When we brought the node
> > back up and I ran Hadoop fsck, I see the file system is corrupt. I'm unsure
> > how best to proceed. Any advice is greatly appreciated. If I've missed a
> > Wiki page or documentation somewhere please feel free to tell me to RTFM and
> > let me know where to look.
> > 
> > Specific question: how to clear under- and over-replicated files? Is the
> > correct procedure to copy the file locally, delete from HDFS, and then copy
> > back to HDFS?
> > 
> > The fsck output is long, but the final summary is:
> > 
> > Total size: 4899680097382 B
> > Total blocks: 994252 (avg. block size 4928006 B)
> > Total dirs: 47404
> > Total files: 952070
> > ********************************
> > CORRUPT FILES: 2
> > MISSING BLOCKS: 24
> > MISSING SIZE: 1501009630 B
> > ********************************
> > Over-replicated blocks: 1 (1.0057812E-4 %)
> > Under-replicated blocks: 14958 (1.5044476 %)
> > Target replication factor: 3
> > Real replication factor: 2.9849212
> > 
> > The filesystem under path '/' is CORRUPT
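
As a sanity check on the summary above, here is a rough sketch that parses
the integer fields and recomputes the derived figures (average block size,
under-replicated percentage, share of data missing). The function names and
the regular expression are my own assumptions about the layout shown in this
one report, not a documented fsck output format.

```python
import re

# Integer fields copied from the fsck summary quoted above.
SUMMARY = """\
Total size: 4899680097382 B
Total blocks: 994252 (avg. block size 4928006 B)
MISSING BLOCKS: 24
MISSING SIZE: 1501009630 B
Over-replicated blocks: 1 (1.0057812E-4 %)
Under-replicated blocks: 14958 (1.5044476 %)
"""


def parse_summary(text):
    """Map each 'Label: <integer>' line to its leading integer value."""
    fields = {}
    for line in text.splitlines():
        # The lookahead skips decimal values such as a replication factor.
        match = re.match(r"\s*([^:]+):\s*(\d+)(?![\d.])", line)
        if match:
            fields[match.group(1).strip()] = int(match.group(2))
    return fields


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


stats = parse_summary(SUMMARY)
avg_block = stats["Total size"] // stats["Total blocks"]
under_pct = percent(stats["Under-replicated blocks"], stats["Total blocks"])
missing_pct = percent(stats["MISSING SIZE"], stats["Total size"])

if __name__ == "__main__":
    print("average block size (B):", avg_block)
    print("under-replicated blocks (%%): %.7f" % under_pct)
    print("missing data (%% of total size): %.5f" % missing_pct)
```

The recomputed average block size and under-replicated percentage should
agree with the values fsck printed, which suggests the summary is internally
consistent and the missing data is a small fraction of the total.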
