[ 
https://issues.apache.org/jira/browse/HDFS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370321#comment-14370321
 ] 

Chris Nauroth commented on HDFS-7833:
-------------------------------------

Hi [~eddyxu].  Thank you for the patch.  Overall, this looks correct to me.  
I filed this issue before some of the discussion we had in HDFS-7722, and at 
the time I had 2 cases in mind that could trigger this bug:

* Admin reconfigures DataNode to remove a path that has a failed volume.  As 
per discussion in HDFS-7722, we've made the decision that this case should not 
clear volume failure information.  Since this is logically still considered a 
volume failure, there is no harm done to the check for sufficient resources.  
IOW, after the discussion in HDFS-7722, we don't have to worry about this case 
anymore.
* Admin reconfigures DataNode and adds a few new paths that weren't there 
before.  This case is still a problem.

To properly cover the second case, let's add a test that does something like 
this:
# Start a DataNode with 2 volumes and {{dfs.datanode.failed.volumes.tolerated}} 
set to 1.
# Run DataNode reconfiguration to add a new volume.  Now we're up to 3 volumes 
total.
# Fail a volume.  Assert that the DataNode continues running.
# Fail another volume.  Assert that the DataNode stops running.

Without your patch, I expect this test would fail on the last step, because 
{{validVolsRequired}} would have been calculated as 1 at startup (2 volumes 
minus 1 tolerated failure) and never updated, and we still have 1 volume 
remaining.  After applying your patch, I expect the test would then pass.
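
To make the arithmetic concrete, here is a rough, self-contained simulation of 
the scenario above.  It is only a sketch and not the actual {{FsDatasetImpl}} 
code: the class {{VolumeToleranceSketch}}, its nested {{Dataset}} class, and 
the {{recalcOnReconfigure}} flag are hypothetical names, and it assumes the 
required count is simply configured volumes minus tolerated failures.

```java
public class VolumeToleranceSketch {

    /** Hypothetical stand-in for the DataNode's volume bookkeeping. */
    static final class Dataset {
        private final int failedVolumesTolerated;
        // true models the patched behavior, false models the stale value.
        private final boolean recalcOnReconfigure;
        private int configuredVolumes;
        private int healthyVolumes;
        private int validVolsRequired;

        Dataset(int configuredVolumes, int failedVolumesTolerated,
                boolean recalcOnReconfigure) {
            this.configuredVolumes = configuredVolumes;
            this.healthyVolumes = configuredVolumes;
            this.failedVolumesTolerated = failedVolumesTolerated;
            this.recalcOnReconfigure = recalcOnReconfigure;
            // Assumed formula: configured volumes minus tolerated failures.
            this.validVolsRequired = configuredVolumes - failedVolumesTolerated;
        }

        /** Models reconfiguration that adds one new, healthy volume. */
        void addVolume() {
            configuredVolumes++;
            healthyVolumes++;
            if (recalcOnReconfigure) {
                validVolsRequired = configuredVolumes - failedVolumesTolerated;
            }
        }

        /** Models one healthy volume failing. */
        void failVolume() {
            healthyVolumes--;
        }

        /** True while enough healthy volumes remain to keep running. */
        boolean hasEnoughResource() {
            return healthyVolumes >= validVolsRequired;
        }
    }

    /**
     * Runs the four test steps and reports whether the simulated DataNode is
     * still running after the first and after the second volume failure.
     */
    static boolean[] runScenario(boolean recalcOnReconfigure) {
        Dataset dataset = new Dataset(2, 1, recalcOnReconfigure); // step 1
        dataset.addVolume();                                      // step 2
        dataset.failVolume();                                     // step 3
        boolean runningAfterFirst = dataset.hasEnoughResource();
        dataset.failVolume();                                     // step 4
        boolean runningAfterSecond = dataset.hasEnoughResource();
        return new boolean[] {runningAfterFirst, runningAfterSecond};
    }

    public static void main(String[] args) {
        boolean[] stale = runScenario(false);
        boolean[] recalculated = runScenario(true);
        System.out.println("stale value:        running after 1st failure = "
                + stale[0] + ", after 2nd = " + stale[1]);
        System.out.println("recalculated value: running after 1st failure = "
                + recalculated[0] + ", after 2nd = " + recalculated[1]);
    }
}
```

With the stale value the simulated DataNode keeps running after both failures, 
because the requirement stays at 1.  With recalculation the requirement becomes 
2 after the new volume is added, so the second failure stops it.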

> DataNode reconfiguration does not recalculate valid volumes required, based 
> on configured failed volumes tolerated.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7833
>                 URL: https://issues.apache.org/jira/browse/HDFS-7833
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Chris Nauroth
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-7833.000.patch
>
>
> DataNode reconfiguration never recalculates 
> {{FsDatasetImpl#validVolsRequired}}.  This may cause incorrect behavior of 
> the {{dfs.datanode.failed.volumes.tolerated}} property if reconfiguration 
> causes the DataNode to run with a different total number of volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
