[ https://issues.apache.org/jira/browse/HDFS-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated HDFS-8869:
--------------------------------------
Target Version/s: (was: 2.7.6)
> Don't mark storages as failed before first block report
> -------------------------------------------------------
>
> Key: HDFS-8869
> URL: https://issues.apache.org/jira/browse/HDFS-8869
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Rushabh S Shah
> Assignee: Daryn Sharp
> Priority: Major
>
> Creating this ticket on behalf of [~daryn].
> Heartbeat processing performs the failed storage check. The DN reports its
> storages, and any previously known storages that are missing from the report
> (e.g. after the unique storage ID upgrade) are marked failed. The heartbeat
> monitor removes all blocks associated with the failed storage. A replication
> storm ensues for all blocks on the node.
> Eventually the DN sends block reports for the new storages, up to 15 minutes
> later on large clusters. Now the NN has many excess blocks to invalidate. If
> the cluster has failed over in the past 24 hours (e.g. during a rolling
> upgrade), the standby that became active will queue the block invalidations,
> which triggers the severe performance degradation described in HDFS-8674.
> That degradation has been greatly lessened but is still an issue.
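
The behaviour the issue title asks for can be illustrated with a minimal sketch. This is not the actual patch and does not use the Hadoop API: the class StorageFailureSketch, its Node type, and every method name are hypothetical, chosen only to show the idea of deferring the failed-storage check until the node's first block report has arrived.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Hadoop code base.
class StorageFailureSketch {

    enum State { NORMAL, FAILED }

    /** Assumed NameNode-side view of a single DataNode. */
    static class Node {
        private final Map<String, State> storages = new HashMap<>();
        private boolean firstBlockReportReceived = false;

        /** Registers a storage the NameNode already knows about. */
        void addKnownStorage(String storageId) {
            storages.put(storageId, State.NORMAL);
        }

        /** Records a block report covering the given storages. */
        void onBlockReport(Set<String> reportedStorageIds) {
            for (String id : reportedStorageIds) {
                storages.put(id, State.NORMAL);
            }
            firstBlockReportReceived = true;
        }

        /**
         * Heartbeat handling: storages absent from the heartbeat are marked
         * FAILED only once the first block report has been received.
         * Returns the ids newly marked as failed.
         */
        Set<String> onHeartbeat(Set<String> reportedStorageIds) {
            Set<String> newlyFailed = new TreeSet<>();
            for (String id : reportedStorageIds) {
                storages.putIfAbsent(id, State.NORMAL);
            }
            if (!firstBlockReportReceived) {
                return newlyFailed; // too early to judge missing storages
            }
            for (Map.Entry<String, State> e : storages.entrySet()) {
                boolean missing = !reportedStorageIds.contains(e.getKey());
                if (missing && e.getValue() == State.NORMAL) {
                    e.setValue(State.FAILED);
                    newlyFailed.add(e.getKey());
                }
            }
            return newlyFailed;
        }

        State stateOf(String storageId) {
            return storages.get(storageId);
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.addKnownStorage("old-1"); // storage id from before the upgrade
        Set<String> current = new TreeSet<>(Arrays.asList("new-1"));

        // Heartbeat before any block report: nothing is marked failed.
        System.out.println(node.onHeartbeat(current));
        // After the block report, the stale storage can be failed safely.
        node.onBlockReport(current);
        System.out.println(node.onHeartbeat(current));
    }
}
```

In this sketch the old storage stays NORMAL until the block report for the new storages has been processed, so its blocks are not removed early and no replication storm is scheduled in the interval between the first heartbeat and the first block report.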