[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905338#comment-16905338 ]
Wei-Chiu Chuang commented on HDFS-12914:
----------------------------------------
This is becoming a mess; my bad. [~Jim_Brennan], thanks a lot for letting me
know.
HDFS-13898 uses a helper method (BlockManager#setBlockManagerForTesting())
that was added in the branch-2 backport.
Here's what I propose:
(1) File a new Jira to add the missing helper method. I don't want to revert
HDFS-13898 because we ultimately want to cherry-pick HDFS-12914 into branch-2,
and we will still need that missing helper method.
(2) Resolve this Jira, since it is already a mess.
(3) Later, I'll file a new Jira to backport HDFS-12914 to branch-2.
[~csun] FYI.
> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 2.9.2
> Reporter: Daryn Sharp
> Assignee: Santosh Marella
> Priority: Critical
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-12914-branch-2.001.patch,
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch,
> HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch,
> HDFS-12914.009.patch, HDFS-12914.branch-2.000.patch,
> HDFS-12914.branch-2.001.patch, HDFS-12914.branch-2.002.patch,
> HDFS-12914.branch-2.8.001.patch, HDFS-12914.branch-2.8.002.patch,
> HDFS-12914.branch-2.patch, HDFS-12914.branch-3.0.patch,
> HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch,
> HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and
> is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected due to an invalid lease becomes
> active with _no blocks_. A replication storm ensues, possibly causing DNs to
> temporarily go dead (HDFS-12645), which leads to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs'
> next FBRs are sent and/or forced.
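The lease-rejection path described in the issue can be illustrated with a small toy model. This is a sketch under stated assumptions, not Hadoop code: ToyLeaseManager, ToyNameNode and their methods are hypothetical stand-ins for BlockReportLeaseManager#checkLease and NameNodeRpcServer#blockReport, lease handling is reduced to a single id per DN, and a processed report is simply treated as leaving no stale storages. It only shows how a boolean rejection is indistinguishable from a normal result and leaves a re-registered DN with no known blocks; it does not show the fix committed for HDFS-12914.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical toy model (NOT Hadoop's real classes) of the HDFS-12914 failure
 * mode: a rejected FBR lease is reported as a plain boolean, so the caller
 * cannot tell a rejection apart from an ordinary "stale storages" result.
 */
public class LeaseRejectionSketch {

    /** Hypothetical stand-in for BlockReportLeaseManager#checkLease. */
    public static final class ToyLeaseManager {
        private final Map<String, Long> leases = new HashMap<>();

        void grant(String dn, long leaseId) {
            leases.put(dn, leaseId);
        }

        /** Returns false for an unknown DN or wrong lease id; never throws. */
        boolean checkLease(String dn, long leaseId) {
            Long expected = leases.get(dn);
            return expected != null && expected == leaseId;
        }
    }

    /** Hypothetical stand-in for the NameNode's view of its DataNodes. */
    public static final class ToyNameNode {
        final ToyLeaseManager leaseManager = new ToyLeaseManager();
        final Map<String, Set<String>> blocksByDn = new HashMap<>();

        /** Re-registration: the DN becomes active with no known blocks. */
        void register(String dn) {
            blocksByDn.put(dn, new HashSet<>());
        }

        /**
         * Mirrors the described bug: on lease rejection the report is skipped
         * and 'false' is returned, the same value the caller reads as
         * "stale storages remain". No error reaches the caller.
         */
        boolean blockReport(String dn, long leaseId, Set<String> reported) {
            boolean noStaleStorages = false;
            if (leaseManager.checkLease(dn, leaseId)) {
                blocksByDn.get(dn).addAll(reported);
                noStaleStorages = true; // toy assumption: a full report clears staleness
            }
            return noStaleStorages;
        }
    }

    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        nn.register("dn1");
        nn.leaseManager.grant("dn1", 42L);

        Set<String> blocks = Set.of("blk_1", "blk_2");
        // Wrong lease id: the FBR is silently rejected.
        boolean result = nn.blockReport("dn1", 7L, blocks);
        // Prints "result=false, known blocks=0": the DN is active with no blocks.
        System.out.println("result=" + result + ", known blocks="
                + nn.blocksByDn.get("dn1").size());
    }
}
```

In this model the DN's blocks stay "missing" from the NameNode's point of view until a later report with a valid lease is accepted, matching the behavior the issue describes.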
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)