Re: Under replicated block doesn't get fixed until DFS restart

Ted Dunning Mon, 07 Jan 2008 11:23:02 -0800

The fsck output shows at least one file with a block that has no replica at
all (the MISSING entry).

I have seen situations where a block would not replicate.  It turned out to
be due to a downed node that had not yet been marked as down.  Once the
system finally realized the node was down, fsck changed from reporting
low replication to reporting missing blocks.  I believe I caused this myself
by experimenting with the datanodes: I tried to add storage on the fly and
then tried, unsuccessfully, to move blocks around.


The only repair I found was to delete the files in question.  By design, that
isn't a big problem for me, since I don't quite trust Hadoop with
mission-critical storage yet.
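
To find which files are affected before deleting anything, the fsck report
can be parsed mechanically.  What follows is only a minimal sketch in Python,
not part of Hadoop.  It assumes fsck output has been saved as text and that
each entry sits on a single line, as fsck prints it before any mail wrapping,
in the same wording as the report quoted in this thread.  The function name
summarize_fsck is hypothetical, and paths containing spaces are not handled.

```python
import re

# Entry format assumed from the fsck report quoted in this thread, e.g.
#   /some/file:  Under replicated blk_123. Target Replicas is 3 but found 1 replica(s).
UNDER = re.compile(
    r"^(?P<path>/\S+):\s+Under replicated (?P<block>blk_-?\d+)\. "
    r"Target Replicas is (?P<target>\d+) but found (?P<found>\d+) replica\(s\)\."
)
# e.g.  /some/file: MISSING 1 blocks of total size 0 B.
MISSING = re.compile(r"^(?P<path>/\S+):\s+MISSING (?P<count>\d+) blocks")


def summarize_fsck(text):
    """Return (under_replicated, missing) lists parsed from fsck text output.

    under_replicated holds (path, block_id, target_replicas, found_replicas);
    missing holds (path, missing_block_count).
    """
    under, missing = [], []
    for raw in text.splitlines():
        # Drop surrounding whitespace and the progress dots fsck prints.
        line = raw.strip().lstrip(".")
        m = UNDER.match(line)
        if m:
            under.append(
                (m["path"], m["block"], int(m["target"]), int(m["found"]))
            )
            continue
        m = MISSING.match(line)
        if m:
            missing.append((m["path"], int(m["count"])))
    return under, missing
```

Files in the first list still have at least one replica and may recover once
the namenode notices the dead datanode; files in the second list are the ones
I ended up deleting.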


On 1/7/08 10:59 AM, "Chris Kline" <[EMAIL PROTECTED]> wrote:

> I believe there was at least one good block (see fsck output).  All
> data nodes were up at the time according to the web page.  I grep'd
> the namenode log files for the under replicated blocks and only got
> an entry for when it was created and entries for when the replication
> was fixed after the HDFS restart.  Here is the result of fsck:
> 
> $HADOOP_HOME/bin/hadoop fsck /
> .......................................................
> /data/hbase1/hregion_70236052/compaction.dir/hregion_70236052/info/done:  Under replicated blk_1984980330938654629. Target Replicas is 3 but found 1 replica(s).
> ..
> /data/hbase1/hregion_70236052/info/info/2807320534360768620:  Under replicated blk_1717622121416314549. Target Replicas is 3 but found 1 replica(s).
> .
> /data/hbase1/hregion_70236052/info/mapfiles/2807320534360768620/data:  Under replicated blk_-5019714262388221150. Target Replicas is 3 but found 1 replica(s).
> .
> /data/hbase1/hregion_70236052/info/mapfiles/2807320534360768620/index:  Under replicated blk_4522585614366970680. Target Replicas is 3 but found 1 replica(s).
> .........................................
> /data/hbase1/log_10.100.11.63_1199307142676_60020/hlog.dat.000: Under replicated blk_-2871471426720379908. Target Replicas is 3 but found 1 replica(s).
> .......
> /data/hbase1/log_10.100.11.65_1199307142711_60020/hlog.dat.000: MISSING 1 blocks of total size 0 B.
> .Status: CORRUPT
>   Total size:    71009158262 B
>   Total blocks:  16318 (avg. block size 4351584 B)
>   Total dirs:    21416
>   Total files:   16253
>   Over-replicated blocks:        0 (0.0 %)
>   Under-replicated blocks:       5 (0.03064101 %)
>   Target replication factor:     3
>   Real replication factor:       2.9993873
> 
> 
> The filesystem under path '/' is CORRUPT
> 
> 
> -Chris
> 
> On Jan 4, 2008, at 1:02 PM, Raghu Angadi wrote:
> 
>> This is of course not expected. More detailed info or a log message
>> would help. Do you know if there is at least one good block?
>> Sometimes, the remaining "good" block might actually be corrupted
>> and thus cannot replicate itself. Restarting might just have
>> brought up the datanodes that were down (for whatever reason)
>> before the restart.
>> 
>> Raghu.
>> 
>> Chris Kline wrote:
>>> fsck reports several under-replicated blocks, but these do not get
>>> fixed until I restart DFS.  fsck also reports a missing block at
>>> the same time, but this should not affect the fixing of
>>> under-replicated blocks.  Has anyone seen this before?
>>> I'm running 0.15.0.
>>> -Chris Kline
>> 
> 
> -Chris
