[
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843829#comment-16843829
]
star commented on HDFS-12914:
-----------------------------
[~smarella] how many DNs do you have? Based on the limited logs, I think the
following case caused it: high load delayed the processing of full block
reports.
||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|
There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs
unrelated to 10.54.63.120:50010 were filtered out, right [~smarella]?
During that time, I think the SNN was processing block reports from other DNs.
Only at 2019-05-16 15:31:11 did the SNN begin to process block reports from
that DN. That is 6 minutes after the full block report lease ID was requested,
beyond the default expiry value of 5 minutes
(DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT).
I don't know when the full block report lease ID was obtained from the server,
because there is no info log for it. I guess it was about 5 minutes before the
first failed report, say 15:26:29.
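
To make the timing argument concrete, here is a rough sketch of the expiry check as I understand it. This is not the actual BlockReportLeaseManager code: the function name, the 15:25:00 grant time, and the wall-clock comparison are all assumptions for illustration. Only the 5-minute default and the 15:31:11 processing time come from the discussion above.

```python
from datetime import datetime, timedelta

# Default lease length cited above for
# DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT (5 minutes).
LEASE_LENGTH = timedelta(minutes=5)


def lease_expired(granted_at, processed_at, lease_length=LEASE_LENGTH):
    """Return True if the report is processed after the lease has run out.

    Hypothetical helper: models only the time comparison, not the
    "unknown datanode" or "wrong lease id" rejection paths.
    """
    return processed_at - granted_at > lease_length


FMT = "%Y-%m-%d %H:%M:%S"

# Assumed grant time: the logs do not record when the lease was handed out.
granted = datetime.strptime("2019-05-16 15:25:00", FMT)
# Time at which the SNN started processing this DN's block reports.
processed = datetime.strptime("2019-05-16 15:31:11", FMT)

# The report waited longer than the lease length, so it is rejected.
print(lease_expired(granted, processed))

# A report processed within 3 minutes of the grant would have been accepted.
print(lease_expired(granted, granted + timedelta(minutes=3)))
```

Under these assumed times the report sat in the queue for longer than the lease length, which matches the "lease has expired" rejection. If the NN had handled it within 5 minutes of the grant, the check would have passed.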
> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and
> is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs'
> next FBR is sent and/or forced.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)