[ https://issues.apache.org/jira/browse/HADOOP-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656191#action_12656191 ]

sanjay.radia edited comment on HADOOP-4810 at 12/12/08 2:59 PM:
----------------------------------------------------------------

Looks good. +1 modulo the following 4 things:

1) Change the comment
       // Delete new replica.
   to
       // mark new replica as corrupt
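
   Roughly what I have in mind for the handling itself, as a sketch only (the
method and variable names below are illustrative, not necessarily the actual
0.18 code path):

       // Sketch only: names are illustrative, not the actual 0.18 code path.
       if (storedBlock.getNumBytes() != reportedSize) {
         // Mark the new replica as corrupt instead of deleting it, so the
         // replica is kept but never served to readers.
         markBlockAsCorrupt(storedBlock, reportingNode);
       }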

2) For each of the cases, check whether the lease is open. If the lease is not
open, log an error that we got a length mismatch even though the file was not
open.
  Also file a jira for the case when the lease is not open, to perhaps write
the new length to the edits log (I am not sure whether recording the new
length is right or wrong, but we can debate that on that jira).
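
   The shape of the check I have in mind, again only as a sketch
(isUnderConstruction, stateChangeLog and the variable names below are
illustrative and may not match the actual 0.18 code):

       // Sketch only: variable names are made up; adapt to the real code path.
       boolean leaseOpen = fileINode.isUnderConstruction();
       if (!leaseOpen) {
         // A length mismatch on a file with no open lease should not happen
         // in normal operation, so make it loud in the log.
         NameNode.stateChangeLog.error("Size mismatch for block " + storedBlock
             + " of " + src + " but the file has no open lease");
       }
       // Follow-up jira: whether to also record the new length in the edits
       // log for this closed-file case.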

3) Your fix will not let us distinguish between true corruption caused by some
bug in HDFS and the normal mismatch that can occur during appends when a
client dies (I am not sure of this, but that is my recollection from the
append discussions with Dhruba last year at Yahoo).
This is okay for now, but let us file a jira to fix this so that we can
distinguish the two.
The easy code fix is to add a field to the internal data structure to record
the original length from the fsimage; however, this will increase the memory
usage of the system, since the 4 bytes will be multiplied by the number of
logical blocks in the system.
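
   To put a rough number on that cost (the class and field below are made up,
only to show where the 4 bytes would go):

       // Sketch only: made-up names, just to illustrate the per-block cost.
       class BlockInfoSketch {
         long blockId;
         long numBytes;       // current length
         int fsimageLength;   // original length from the fsimage: +4 bytes/block
       }
       // e.g. 10,000,000 blocks * 4 bytes ~= 40 MB of additional namenode heap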

4) In my opinion the correct behavior for shorter blocks (but longer than the
fsimage-recorded length) is to invalidate them, as in the original code;
however, our invalidation code does not handle this case, because if the
"corrupt" block is the last one it keeps it as valid. Thus your patch is a
good emergency fix for this very critical problem.
 I suggest that we file a jira to handle invalidating such invalid blocks
correctly.
Note that here I am distinguishing between *corrupt* blocks (caused by
hardware errors or by bugs in our software) and *invalid* blocks (the length
mismatches that can occur due to client or other failures). Others may not
share the distinction I make; let's debate that in the jira. We need to get
this patch out ASAP.
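
   To make that distinction concrete (purely illustrative; no such type exists
in the code today):

       // Sketch only: purely illustrative, not existing code.
       enum ReplicaProblem {
         CORRUPT,   // data damaged by hardware errors or by bugs in our software
         INVALID    // length mismatch from a client dying mid-append or similar
       }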




> Data lost at cluster startup time
> ---------------------------------
>
>                 Key: HADOOP-4810
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4810
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: corruptBlocksStartup.patch
>
>
> hadoop dfs -cat file1 returns
> dfs.DFSClient: Could not obtain block blk_XX_0 from any node: 
> java.io.IOException: No live nodes contain current block
> Tracing the history of the block from NN log, we found
>  WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent size for block 
> blk_-6160940519231606858_0 reported from A1.A2.A3.A4:50010 current size is 
> 9303872 reported size is 262144
>  WARN org.apache.hadoop.fs.FSNamesystem: Deleting block 
> blk_-6160940519231606858_0 from A1.A2.A3.A4:50010
> INFO org.apache.hadoop.dfs.StateChange: DIR* NameSystem.invalidateBlock: 
> blk_-6160940519231606858_0 on A1.A2.A3.A4:50010 
> WARN org.apache.hadoop.fs.FSNamesystem: Error in deleting bad block 
> blk_-6160940519231606858_0 org.apache.hadoop.dfs.SafeModeException: Cannot 
> invalidate block blk_-6160940519231606858_0. Name node is in safe mode. 
> WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent size for block 
> blk_-6160940519231606858_0 reported from B1.B2.B3.B4:50010 current size is 
> 9303872 reported size is 306688 
> WARN org.apache.hadoop.fs.FSNamesystem: Deleting block 
> blk_-6160940519231606858_0 from B1.B2.B3.B4:50010 
> INFO org.apache.hadoop.dfs.StateChange: DIR* NameSystem.invalidateBlock: 
> blk_-6160940519231606858_0 on B1.B2.B3.B4:50010 
> WARN org.apache.hadoop.fs.FSNamesystem: Error in deleting bad block 
> blk_-6160940519231606858_0 org.apache.hadoop.dfs.SafeModeException: Cannot 
> invalidate block blk_-6160940519231606858_0. Name node is in safe mode. 
> INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
> NameSystem.chooseExcessReplicates: (C1.C2.C3.C4:50010, 
> blk_-6160940519231606858_0) is added to recentInvalidateSets 
> INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
> NameSystem.chooseExcessReplicates: (D1.D2.D3.D4:50010, 
> blk_-6160940519231606858_0) is added to recentInvalidateSets
> INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask C1.C2.C3.C4:50010 to 
> delete blk_-6160940519231606858_0
> INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask D1.D2.D3.D4:50010 to 
> delete blk_-6160940519231606858_0
