[
https://issues.apache.org/jira/browse/HADOOP-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663621#action_12663621
]
Brian Bockelman commented on HADOOP-5012:
-----------------------------------------
Hey Hairong,
This is 0.19 + a few unrelated patches.
I have attached the relevant logs below. As you point out, this patch is
another way of solving HADOOP-4742; it effectively does the same thing, but
HADOOP-4742 pinpoints the source of the problem more precisely.
I'll close this as a duplicate.
Brian
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:38:58,399 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
Inconsistent size for block blk_-1361380929156165877_44096 reported
from 172.16.1.164:50010 current size is 67108864 reported size is 0
18:38:58,399 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
Mark new replica blk_-1361380929156165877_44096 from
172.16.1.164:50010 as corrupt because its length is shorter than
existing ones
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:38:58,399 INFO org.apache.hadoop.hdfs.StateChange: BLOCK
NameSystem.addToCorruptReplicasMap: blk_-1361380929156165877 added as
corrupt on 172.16.1.164:50010 by /172.16.1.164
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:39:28,820 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
172.16.1.184:50010 to replicate blk_-1361380929156165877_44096 to
datanode(s) 172.16.1.90:50010
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:39:40,465 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 172.16.1.90:50010 is
added to blk_-1361380929156165877_44096 size 67108864
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:40:47,088 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
172.16.1.90:50010 to replicate blk_-1361380929156165877_44096 to
datanode(s) 172.16.1.158:50010
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:40:53,107 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 172.16.1.158:50010 is
added to blk_-1361380929156165877_44096 size 67108864
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:40:53,107 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.chooseExcessReplicates: (172.16.1.90:50010,
blk_-1361380929156165877_44096) is added to recentInvalidateSets
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:40:53,107 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.chooseExcessReplicates: (172.16.1.184:50010,
blk_-1361380929156165877_44096) is added to recentInvalidateSets
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:40:53,107 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.invalidateBlock: blk_-1361380929156165877_44096 on
172.16.1.164:50010
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:40:53,107 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.invalidateBlocks: blk_-1361380929156165877_44096 on
172.16.1.164:50010 is the only copy and was not deleted.
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:40:53,107 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.invalidateBlock: blk_-1361380929156165877_44096 on
172.16.1.165:50010
hadoop-root-namenode-hadoop-name.log.2009-01-05:2009-01-05
18:40:53,107 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.invalidateBlocks: blk_-1361380929156165877_44096 on
172.16.1.165:50010 is the only copy and was not deleted.
> addStoredBlock should take into account corrupted blocks when determining
> excesses
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-5012
> URL: https://issues.apache.org/jira/browse/HADOOP-5012
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Brian Bockelman
> Attachments: hadoop-5012.patch
>
>
> I found another source of corruption on our cluster.
> 0) Three replicas of a block exist
> 1) One is recognized as corrupt (3 reps total)
> 2) Namenode decides to create a new replica. Replication done and
> addStoredBlock is called (4 reps total)
> 3) There are too many replicas, so processOverReplicatedBlock is called by
> addStoredBlock.
> 4) processOverReplicatedBlock decides to invalidate the newly created
> replica. [Oddly enough, it picks the newly created replica instead of the
> one in the corrupted replicas map!]
> 5) We are in the same state as (1) -- 3 replicas total, 1 of which is still
> bad.
> I believe we can fix this easily -- change the numCurrentReplica variable to
> take the number of corrupt replicas into account.
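For illustration only, here is a minimal, self-contained Java sketch of the
accounting change suggested above. It is not the actual FSNamesystem /
addStoredBlock code; the Replica type, the chooseExcessReplicas method, and
the target replication of 3 are hypothetical stand-ins used to show why
corrupt replicas must be excluded from the "current replica" count before
excess replicas are chosen.

// Hypothetical sketch -- not the real FSNamesystem code. It models the fix:
// corrupt replicas are excluded from numCurrentReplica, so the freshly
// replicated good copy is never the one chosen as "excess".
import java.util.ArrayList;
import java.util.List;

public class ReplicaAccountingSketch {

    enum State { LIVE, CORRUPT }

    record Replica(String datanode, State state) {}

    // Returns the replicas that should be invalidated for a block with the
    // given target replication factor.
    static List<Replica> chooseExcessReplicas(List<Replica> replicas,
                                              int targetReplication) {
        List<Replica> live = new ArrayList<>();
        List<Replica> corrupt = new ArrayList<>();
        for (Replica r : replicas) {
            (r.state() == State.CORRUPT ? corrupt : live).add(r);
        }

        // The proposed fix: count only healthy copies as "current" replicas.
        int numCurrentReplica = live.size();

        // Corrupt copies are always candidates for invalidation; live copies
        // are trimmed only if the healthy count alone exceeds the target.
        List<Replica> toInvalidate = new ArrayList<>(corrupt);
        for (int i = targetReplication; i < numCurrentReplica; i++) {
            toInvalidate.add(live.get(i));
        }
        return toInvalidate;
    }

    public static void main(String[] args) {
        // The scenario from the description: one corrupt copy plus the new
        // replica created by the namenode, target replication of 3.
        List<Replica> replicas = List.of(
                new Replica("dn-1", State.CORRUPT),
                new Replica("dn-2", State.LIVE),
                new Replica("dn-3", State.LIVE),
                new Replica("dn-4", State.LIVE)); // newly created replica

        // Prints only the corrupt replica, so the new copy survives.
        System.out.println(chooseExcessReplicas(replicas, 3));
    }
}

With the old accounting, numCurrentReplica would have been 4 in this scenario,
so one of the live copies (possibly the new one) would have been trimmed
instead of the corrupt replica, reproducing the loop described in steps 1-5.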
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.