[ 
https://issues.apache.org/jira/browse/HDFS-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7604:
--------------------------------
    Attachment: HDFS-7604.001.patch

The attached patch implements the feature.  Summary:
* The protocol definition for heartbeat requests has been changed to add 
{{failedStorageLocations}}, which contains multiple strings used to report the 
local file system path of each failed storage.
* The DN calculates its failed storage locations as the set difference between 
everything configured in {{dfs.datanode.data.dir}} and the current live volumes 
in use by the {{FsDatasetImpl}}.  Doing it this way works well with the live DN 
reconfiguration feature (HDFS-6808), because it will use the current active 
configuration rather than what was loaded at process start time.
* The failed storage locations are exposed through {{FSDatasetMBean}}, so the 
metrics on an individual DN will publish that information.  I also updated 
{{FsDatasetImpl#getNumFailedVolumes}} to keep its implementation in sync with 
the new method.
* {{FsVolumeList}} no longer needs to track a separate counter of failed 
volumes.  As a side effect, I believe this fixes a potential bug with live 
DN reconfiguration.  (If a previously failed volume were brought back online 
through live reconfiguration, I don't believe the old counter would have been 
decremented or reset to reflect the new state.)
* On the NN side, the heartbeat handling now updates its data structures to 
keep track of the failed storage locations per DN.
* The failed storage locations for all DNs are exposed through 
{{FSNamesystemMBean}}.  There is also a new counter for the total volume 
failures across all DNs.
* The web UI templates have been updated to display the new data.
* {{TestDataNodeVolumeFailureReporting}} contains the tests for this 
feature.  I took the opportunity to do a few other minor cleanups in this file.
* Numerous other test files contain minor changes to deal with method signature 
changes related to passing the new field in the heartbeat.
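
For readers who want the idea without opening the patch, here is a minimal 
sketch of the two calculations described above: the DN-side set difference 
and the NN-side total across all DNs.  It is not the code in the patch.  The 
class name, method names, DN identifiers, and paths are all hypothetical, and 
it assumes storage locations can be compared as plain strings, which glosses 
over any URI or path normalization the real code may need.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the patch.
final class FailedStorageSketch {

    // DN side: failed locations = configured locations minus live volumes.
    // Assumes both collections hold comparable path strings.
    static Set<String> failedLocations(Collection<String> configured,
                                       Collection<String> live) {
        Set<String> failed = new LinkedHashSet<>(configured);
        failed.removeAll(live);
        return failed;
    }

    // NN side: sum the failed-location counts reported by each DN.
    static int totalVolumeFailures(Map<String, Set<String>> failedByDataNode) {
        int total = 0;
        for (Set<String> failed : failedByDataNode.values()) {
            total += failed.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up values standing in for dfs.datanode.data.dir and the
        // volumes currently in use on one DN.
        Set<String> failed = failedLocations(
            Arrays.asList("/data/1", "/data/2", "/data/3"),
            Arrays.asList("/data/1", "/data/3"));

        // Made-up per-DN state as the NN might hold it after heartbeats.
        Map<String, Set<String>> failedByDataNode = new HashMap<>();
        failedByDataNode.put("dn1", failed);
        failedByDataNode.put("dn2", new LinkedHashSet<String>());

        System.out.println("dn1 failed locations: " + failed);
        System.out.println("total volume failures: "
            + totalVolumeFailures(failedByDataNode));
    }
}
```

Because the difference is recomputed from the current configuration on each 
call, a volume restored through live reconfiguration simply drops out of the 
result, with no counter to decrement.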

> Track and display failed DataNode storage locations in NameNode.
> ----------------------------------------------------------------
>
>                 Key: HDFS-7604
>                 URL: https://issues.apache.org/jira/browse/HDFS-7604
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HDFS-7604-screenshot-1.png, HDFS-7604-screenshot-2.png, 
> HDFS-7604-screenshot-3.png, HDFS-7604.001.patch, HDFS-7604.prototype.patch
>
>
> During heartbeats, the DataNode can report a list of its storage locations 
> that have been taken out of service because of a failure (such as a bad disk 
> or a permissions problem).  The NameNode can track these failed storage 
> locations and then report them in JMX and the NameNode web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
