[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haohui Mai updated HDFS-11609:
------------------------------
Target Version/s: 2.7.4, 2.8.1 (was: 2.8.1)
> Some blocks can be permanently lost if nodes are decommissioned while dead
> --------------------------------------------------------------------------
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Blocker
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch,
> HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch
>
>
> When all the nodes containing a replica of a block are decommissioned while
> they are dead, they get decommissioned right away even if there are missing
> blocks. This behavior was introduced by HDFS-7374.
> The problem starts when those decommissioned nodes are brought back online.
> The namenode no longer shows missing blocks, which creates a false sense of
> cluster health. When the decommissioned nodes are then removed and
> reformatted, the block data is permanently lost. The namenode reports the
> missing blocks only after the heartbeat recheck interval (e.g. 10 minutes)
> has elapsed from the moment the last node is taken down.
> There are multiple issues in the code. Because some of them behave
> differently in testing than in production, it took a while to reproduce the
> problem in a unit test. I will post an analysis and a proposal soon.
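
The scenario in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class, enum, and method names below are hypothetical and are not taken from the namenode code or from the attached patches. It assumes a block is reported missing only when no live node holds a replica, and it shows why that report goes quiet once the decommissioned nodes come back, even though no replica exists on an in-service node.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names come from the HDFS code base.
class DecommissionedReplicaSketch {
    enum AdminState { IN_SERVICE, DECOMMISSIONED }

    static final class Node {
        boolean alive;
        AdminState state = AdminState.IN_SERVICE;

        Node(boolean alive) {
            this.alive = alive;
        }
    }

    // Assumption: a block is reported missing only if no live node holds it.
    static boolean reportedMissing(List<Node> holders) {
        for (Node n : holders) {
            if (n.alive) {
                return false;
            }
        }
        return true;
    }

    // Replicas that would survive removing the decommissioned nodes.
    static int inServiceLiveReplicas(List<Node> holders) {
        int count = 0;
        for (Node n : holders) {
            if (n.alive && n.state == AdminState.IN_SERVICE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Both nodes holding the block are dead.
        List<Node> holders = new ArrayList<>();
        holders.add(new Node(false));
        holders.add(new Node(false));

        // Decommissioned while dead: the state changes right away.
        for (Node n : holders) {
            n.state = AdminState.DECOMMISSIONED;
        }
        System.out.println("missing while dead: " + reportedMissing(holders));

        // The nodes are brought back online, still decommissioned.
        for (Node n : holders) {
            n.alive = true;
        }
        System.out.println("missing after restart: " + reportedMissing(holders));
        System.out.println("in-service replicas: " + inServiceLiveReplicas(holders));
    }
}
```

In this model the block stops being reported as missing after the restart, while the in-service replica count stays at zero, which matches the "false sense of cluster health" described above. Removing the two nodes at that point leaves no copy of the block.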