[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kihwal Lee updated HDFS-11609:
------------------------------
Attachment: HDFS-11609.trunk.patch
> Some blocks can be permanently lost if nodes are decommissioned while dead
> --------------------------------------------------------------------------
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-11609.trunk.patch
>
>
> When all the nodes containing a replica of a block are decommissioned while
> they are dead, they get decommissioned right away even if there are missing
> blocks. This behavior was introduced by HDFS-7374.
> The problem starts when those decommissioned nodes are brought back online.
> The namenode no longer shows missing blocks, which creates a false sense of
> cluster health. If the decommissioned nodes are then removed and reformatted,
> the block data is permanently lost. The namenode reports the missing blocks
> only once the heartbeat recheck interval (e.g. 10 minutes) has elapsed after
> the last node is taken down.
> There are multiple issues in the code. Because some of them behave
> differently in testing than in production, it took a while to reproduce the
> problem in a unit test. I will present an analysis and a proposal soon.
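
For readers wondering where the "e.g. 10 minutes" figure comes from: it appears to correspond to the namenode's dead-node expiry, which is derived from two settings, dfs.namenode.heartbeat.recheck-interval and dfs.heartbeat.interval. The sketch below reproduces that arithmetic as 2 x recheck interval + 10 x heartbeat interval. The class and method names are illustrative only and are not part of Hadoop, and the default values (300000 ms and 3 s) are the ones commonly shipped in hdfs-default.xml, so they are worth checking against your release.

```java
import java.time.Duration;

// Illustrative helper, not Hadoop code: estimates how long the namenode
// waits without heartbeats before it considers a datanode dead.
public class DeadNodeTimeout {
    // Assumed default of dfs.namenode.heartbeat.recheck-interval (milliseconds).
    static final long DEFAULT_RECHECK_INTERVAL_MS = 5 * 60 * 1000L;
    // Assumed default of dfs.heartbeat.interval (seconds).
    static final long DEFAULT_HEARTBEAT_INTERVAL_S = 3L;

    // Expiry = 2 * recheck interval + 10 * heartbeat interval, in milliseconds.
    static long heartbeatExpireIntervalMs(long recheckMs, long heartbeatSec) {
        return 2 * recheckMs + 10 * 1000L * heartbeatSec;
    }

    public static void main(String[] args) {
        long ms = heartbeatExpireIntervalMs(
                DEFAULT_RECHECK_INTERVAL_MS, DEFAULT_HEARTBEAT_INTERVAL_S);
        // With the assumed defaults this prints PT10M30S (10 minutes 30 seconds).
        System.out.println(Duration.ofMillis(ms));
    }
}
```

With the assumed defaults the result is 10 minutes 30 seconds, which is roughly the delay described above before missing blocks show up once the last node holding a replica goes down.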