[
https://issues.apache.org/jira/browse/HDFS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370321#comment-14370321
]
Chris Nauroth commented on HDFS-7833:
-------------------------------------
Hi [~eddyxu]. Thank you for the patch. Overall, this looks correct to me.
When I filed this issue, it was before some of the discussion we had in
HDFS-7722, and I had 2 cases in mind that could trigger this bug:
* Admin reconfigures DataNode to remove a path that has a failed volume. As
per discussion in HDFS-7722, we've made the decision that this case should not
clear volume failure information. Since this is logically still considered a
volume failure, there is no harm done to the check for sufficient resources.
In other words, after the discussion in HDFS-7722, we don't have to worry about
this case anymore.
* Admin reconfigures DataNode and adds a few new paths that weren't there
before. This case is still a problem.
To properly cover the second case, let's add a test that does something like
this:
# Start a DataNode with 2 volumes and {{dfs.datanode.failed.volumes.tolerated}}
set to 1.
# Run DataNode reconfiguration to add a new volume. Now we're up to 3 volumes
total.
# Fail a volume. Assert that the DataNode continues running.
# Fail another volume. Assert that the DataNode stops running.
Without your patch, I expect this test would fail on the last step, because
{{validVolsRequired}} would have been calculated as 1, and we still have 1
volume remaining. After applying your patch, I expect the test would then pass.
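To make the arithmetic in those steps concrete, here is a minimal standalone sketch. It is not the real {{FsDatasetImpl}} code: the class {{VolumeToleranceSketch}}, the nested {{Dataset}} model, and the {{recalcOnReconfig}} flag are all hypothetical names made up for illustration. It also assumes that {{validVolsRequired}} is the configured volume count minus {{dfs.datanode.failed.volumes.tolerated}}, which matches the value of 1 mentioned above for 2 volumes with 1 tolerated failure.

```java
// Hypothetical model of the scenario above; not the actual Hadoop code.
public class VolumeToleranceSketch {

    /** Illustrative stand-in for the dataset's volume bookkeeping. */
    static final class Dataset {
        private final int failedVolumesTolerated;
        private final boolean recalcOnReconfig; // true models the patched behavior
        private int volsConfigured;
        private int volsHealthy;
        private int validVolsRequired;

        Dataset(int volumes, int failedVolumesTolerated, boolean recalcOnReconfig) {
            this.volsConfigured = volumes;
            this.volsHealthy = volumes;
            this.failedVolumesTolerated = failedVolumesTolerated;
            this.recalcOnReconfig = recalcOnReconfig;
            // Assumed formula: configured volumes minus tolerated failures.
            this.validVolsRequired = volumes - failedVolumesTolerated;
        }

        /** Models reconfiguration that adds one new, healthy volume. */
        void addVolume() {
            volsConfigured++;
            volsHealthy++;
            if (recalcOnReconfig) {
                validVolsRequired = volsConfigured - failedVolumesTolerated;
            }
        }

        /** Models one healthy volume failing. */
        void failVolume() {
            volsHealthy--;
        }

        /** True while the modeled DataNode would keep running. */
        boolean hasEnoughResource() {
            return volsHealthy >= validVolsRequired;
        }
    }

    /** Runs the four steps and reports liveness after each failure. */
    static boolean[] runScenario(boolean recalcOnReconfig) {
        Dataset ds = new Dataset(2, 1, recalcOnReconfig); // step 1
        ds.addVolume();                                   // step 2: 3 volumes
        ds.failVolume();                                  // step 3
        boolean afterFirst = ds.hasEnoughResource();
        ds.failVolume();                                  // step 4
        boolean afterSecond = ds.hasEnoughResource();
        return new boolean[] {afterFirst, afterSecond};
    }

    public static void main(String[] args) {
        boolean[] stale = runScenario(false);
        boolean[] fixed = runScenario(true);
        System.out.println("stale: " + stale[0] + ", " + stale[1]);
        System.out.println("recalculated: " + fixed[0] + ", " + fixed[1]);
    }
}
```

In this model, the stale variant still reports enough resources after the second failure (1 healthy volume against a requirement of 1), while the recalculating variant requires 2 and would stop. A real test would of course need to drive an actual DataNode rather than a model like this.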
> DataNode reconfiguration does not recalculate valid volumes required, based
> on configured failed volumes tolerated.
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-7833
> URL: https://issues.apache.org/jira/browse/HDFS-7833
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Reporter: Chris Nauroth
> Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7833.000.patch
>
>
> DataNode reconfiguration never recalculates
> {{FsDatasetImpl#validVolsRequired}}. This may cause incorrect behavior of
> the {{dfs.datanode.failed.volumes.tolerated}} property if reconfiguration
> causes the DataNode to run with a different total number of volumes.