[
https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335185#comment-14335185
]
Chris Nauroth commented on HDFS-7830:
-------------------------------------
Hi [~eddyxu]. Another potential problem that I've noticed in the DataNode
reconfiguration code is that it never recalculates
{{FsDatasetImpl#validVolsRequired}}. This is a {{final}} variable calculated
as (# volumes configured) - (# volume failures tolerated):
{code}
this.validVolsRequired = volsConfigured - volFailuresTolerated;
{code}
If this variable is never recalculated during a DataNode reconfiguration, the
volume-failure threshold becomes stale, which can lead to unexpected behavior.
For example:
# DataNode starts running with 6 volumes (all healthy) and
{{dfs.datanode.failed.volumes.tolerated}} set to 2.
# {{FsDatasetImpl#validVolsRequired}} is set to 6 - 2 = 4.
# DataNode is reconfigured to run with 8 volumes (all still healthy).
# Now 3 volumes fail. The admin would expect the DataNode to abort, since only
2 failures are tolerated and the recalculated threshold would be 8 - 2 = 6. But
there are 8 - 3 = 5 good volumes left, and {{FsDatasetImpl#validVolsRequired}}
is still 4, so {{FsDatasetImpl#hasEnoughResource}} returns {{true}}.
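To make the scenario concrete, here is a minimal standalone sketch of the
problem. The class and method names below only mirror the ones discussed; this
is not the actual {{FsDatasetImpl}} code, just a toy reproduction of a
{{final}} threshold going stale:

```java
// Toy reproduction (hypothetical class, not Hadoop code) of a final
// volume-failure threshold that is computed once and never recalculated.
class VolumeChecker {
    // Computed once at construction, mirroring validVolsRequired.
    private final int validVolsRequired;

    VolumeChecker(int volsConfigured, int volFailuresTolerated) {
        this.validVolsRequired = volsConfigured - volFailuresTolerated;
    }

    boolean hasEnoughResource(int healthyVolumes) {
        return healthyVolumes >= validVolsRequired;
    }
}

public class StaleThresholdDemo {
    public static void main(String[] args) {
        // Startup: 6 volumes, 2 failures tolerated -> threshold = 4.
        VolumeChecker stale = new VolumeChecker(6, 2);

        // Reconfigure to 8 volumes, then 3 of them fail: 5 healthy remain.
        int healthyAfterFailures = 8 - 3;

        // Stale threshold (4) still passes, even though 3 > 2 failures.
        System.out.println(stale.hasEnoughResource(healthyAfterFailures)); // prints "true"

        // A threshold recalculated at reconfiguration (8 - 2 = 6) would
        // correctly report insufficient resources.
        VolumeChecker recalculated = new VolumeChecker(8, 2);
        System.out.println(recalculated.hasEnoughResource(healthyAfterFailures)); // prints "false"
    }
}
```

A fix along these lines would recompute the threshold whenever the volume
configuration changes, rather than fixing it at construction time.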
Is this something that makes sense for you to address as part of the patch
you're working on now, or would you prefer I file a separate jira to track
this? Thanks!
> DataNode does not release the volume lock when adding a volume fails.
> ---------------------------------------------------------------------
>
> Key: HDFS-7830
> URL: https://issues.apache.org/jira/browse/HDFS-7830
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
>
> When there is a failure in adding volume process, the {{in_use.lock}} is not
> released. Also, doing another {{-reconfig}} to remove the new dir in order to
> cleanup doesn't remove the lock. lsof still shows datanode holding on to the
> lock file.