[ 
https://issues.apache.org/jira/browse/HDFS-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616859#comment-14616859
 ] 

Daryn Sharp commented on HDFS-8652:
-----------------------------------

Big -1.  This must be reverted (but unfortunately I don't know git very well) 
immediately.  The premise is wrong.  This patch also contains many far reaching 
and unrelated changes than the description suggests.

Most importantly, the distinction between a Block and a BlockInfo _must_ not be 
blurred.
* A BlockInfo is a Block stored in the blocks map.
* A plain Block is just a key for looking up the current BlockInfo in the 
blocks map.
* A BlockInfo cannot be retained outside of the fsn lock or it may go stale, 
which is most illustrated by:
* Datastructures that queue blocks for later operation must be typed as Block.  
In fact, queuing datastructures _must never_ contain a BlockInfo to enforce the 
lookup of the current stored block.
* A BlockInfo's genstamp and/or size may mutate.  The block may mutate in the 
map via replacement as it transitions to/from UC.  Deferred processing of a 
BlockInfo that mutated causes insanely hard to debug race conditions.

So regarding what this patch specifically claims to do:  A corrupt block may be 
based on the former attributes, so future decisions on mutable fields are 
invalid.  The CorruptReplicasMap must be Block based.

I was alerted to this because of an internal merge conflict caused by this 
patch's unrelated change to postponedMisreplicatedBlocks to be typed as a 
BlockInfo instead of a Block.  Although getStoredBlock is later called on the 
queued BlockInfo, the set should not have been changed to be a BlockInfo for 
the aforementioned reasons.

Another unrelated change appears to have introduced a subtle bug to block 
invalidation by queuing the stored block instead of the corrupted block.  They 
may be equal "now" but not necessarily in the future.

> Track BlockInfo instead of Block in CorruptReplicasMap
> ------------------------------------------------------
>
>                 Key: HDFS-8652
>                 URL: https://issues.apache.org/jira/browse/HDFS-8652
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>             Fix For: 2.8.0
>
>         Attachments: HDFS-8652.000.patch, HDFS-8652.001.patch, 
> HDFS-8652.002.patch
>
>
> Currently {{CorruptReplicasMap}} uses {{Block}} as its key and records the 
> list of DataNodes with corrupted replicas. For Erasure Coding since a striped 
> block group contains multiple internal blocks with different block ID, we 
> should use {{BlockInfo}} as the key.
> HDFS-8619 is the jira to fix this for EC. To ease merging we will use jira to 
> first make changes in trunk/branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to