[
https://issues.apache.org/jira/browse/HDFS-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504768#comment-15504768
]
Kihwal Lee commented on HDFS-10857:
-----------------------------------
The patch partially brings in changes available in branch-2.7 that make
removing volumes easier.
> Rolling upgrade can make data unavailable when the cluster has many failed
> volumes
> ----------------------------------------------------------------------------------
>
> Key: HDFS-10857
> URL: https://issues.apache.org/jira/browse/HDFS-10857
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.4
> Reporter: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-10857.branch-2.6.patch
>
>
> When the marker file or trash directory is created or removed during
> heartbeat response processing, an {{IOException}} is thrown if the operation
> is attempted on a failed volume. This stops processing of the remaining
> storage directories and of any DNA commands that were part of the heartbeat
> response.
> While this is happening, the block token key update does not happen, and all
> read and write requests start to fail until the upgrade is finalized and the
> DN receives a new key. All it takes is one failed volume. If there are three
> such nodes in the cluster, it is very likely that some blocks cannot be read.
> Unlike the common missing-block scenarios, the NN has no idea, although the
> effect is the same.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)