[
https://issues.apache.org/jira/browse/HDFS-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616859#comment-14616859
]
Daryn Sharp commented on HDFS-8652:
-----------------------------------
Big -1. This must be reverted (but unfortunately I don't know git very well)
immediately. The premise is wrong. This patch also contains many far reaching
and unrelated changes than the description suggests.
Most importantly, the distinction between a Block and a BlockInfo _must_ not be
blurred.
* A BlockInfo is a Block stored in the blocks map.
* A plain Block is just a key for looking up the current BlockInfo in the
blocks map.
* A BlockInfo cannot be retained outside of the fsn lock or it may go stale,
which is most illustrated by:
* Datastructures that queue blocks for later operation must be typed as Block.
In fact, queuing datastructures _must never_ contain a BlockInfo to enforce the
lookup of the current stored block.
* A BlockInfo's genstamp and/or size may mutate. The block may mutate in the
map via replacement as it transitions to/from UC. Deferred processing of a
BlockInfo that mutated causes insanely hard to debug race conditions.
So regarding what this patch specifically claims to do: A corrupt block may be
based on the former attributes, so future decisions on mutable fields are
invalid. The CorruptReplicasMap must be Block based.
I was alerted to this because of an internal merge conflict caused by this
patch's unrelated change to postponedMisreplicatedBlocks to be typed as a
BlockInfo instead of a Block. Although getStoredBlock is later called on the
queued BlockInfo, the set should not have been changed to be a BlockInfo for
the aforementioned reasons.
Another unrelated change appears to have introduced a subtle bug to block
invalidation by queuing the stored block instead of the corrupted block. They
may be equal "now" but not necessarily in the future.
> Track BlockInfo instead of Block in CorruptReplicasMap
> ------------------------------------------------------
>
> Key: HDFS-8652
> URL: https://issues.apache.org/jira/browse/HDFS-8652
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Fix For: 2.8.0
>
> Attachments: HDFS-8652.000.patch, HDFS-8652.001.patch,
> HDFS-8652.002.patch
>
>
> Currently {{CorruptReplicasMap}} uses {{Block}} as its key and records the
> list of DataNodes with corrupted replicas. For Erasure Coding since a striped
> block group contains multiple internal blocks with different block ID, we
> should use {{BlockInfo}} as the key.
> HDFS-8619 is the jira to fix this for EC. To ease merging we will use jira to
> first make changes in trunk/branch-2.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)