[ 
https://issues.apache.org/jira/browse/HDFS-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-7604:
--------------------------------
    Attachment: HDFS-7604.005.patch

Jitendra, thank you for reviewing.  Here is patch v005, containing both of the 
changes that you suggested.

bq. If volumeFailureSummary is not null, it might be more accurate to compare 
last failure timestamp?

Yes, that's particularly relevant when considering the new live DataNode 
reconfiguration feature.  If volumes are reconfigured, and there are the same 
number of volume failures, but the actual failed volumes are different, then 
the old logic, which compared only the failure counts, wouldn't have caught 
it.  Comparing the last failure timestamps handles this case.
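
To illustrate the difference, here is a minimal sketch.  It is not code from 
the patch: {{FailureSummary}}, {{VolumeFailureCheck}}, and their fields are 
hypothetical, simplified stand-ins for whatever the heartbeat actually carries.

```java
// Hypothetical, simplified stand-in for the failure summary sent in a heartbeat.
final class FailureSummary {
    final String[] failedLocations;  // paths of the failed storage locations
    final long lastFailureTimeMs;    // time of the most recent volume failure

    FailureSummary(String[] failedLocations, long lastFailureTimeMs) {
        this.failedLocations = failedLocations;
        this.lastFailureTimeMs = lastFailureTimeMs;
    }
}

final class VolumeFailureCheck {
    // Count-only comparison: misses the case where the number of failures is
    // unchanged but a different volume has failed.
    static boolean changedByCount(FailureSummary prev, FailureSummary cur) {
        return prev.failedLocations.length != cur.failedLocations.length;
    }

    // Timestamp comparison: a new failure advances the timestamp even when
    // the number of failed volumes stays the same.
    static boolean changedByTimestamp(FailureSummary prev, FailureSummary cur) {
        return prev.lastFailureTimeMs != cur.lastFailureTimeMs;
    }
}
```

With one failed volume before reconfiguration and a different single failed 
volume afterwards, the count comparison reports no change, while the timestamp 
comparison does.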

bq. In case of rolling upgrades, the older version of datanodes, will not send 
volumeFailureSummary, and the newer namenode might erroneously conclude 0 
volume failures.

That's a great catch.  I restored explicit tracking of the {{volumeFailures}} 
counter in {{DatanodeDescriptor}}.  The implementation of 
{{DatanodeDescriptor#getVolumeFailures}} works for both old and new DataNode 
heartbeats because, in the new case, we guarantee that this counter is 
consistent with the value returned from {{getVolumeFailureSummary}}.
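
The idea can be sketched roughly as follows.  This is a simplified model under 
stated assumptions, not the actual {{DatanodeDescriptor}} code: the class 
{{DescriptorSketch}}, its method names, and the use of a plain string array in 
place of the real summary type are all hypothetical.

```java
// Hypothetical, simplified model; not the actual DatanodeDescriptor class.
final class DescriptorSketch {
    private int volumeFailures;        // explicit counter, always maintained
    private String[] failedLocations;  // null when the DataNode sent no summary

    // Called once per heartbeat. An older DataNode sends only the counter, so
    // failedLocationsOrNull is null in that case.
    void updateFromHeartbeat(int reportedFailures, String[] failedLocationsOrNull) {
        this.failedLocations = failedLocationsOrNull;
        this.volumeFailures = (failedLocationsOrNull != null)
            ? failedLocationsOrNull.length  // keep the counter consistent with the summary
            : reportedFailures;             // old DataNode: rely on the plain counter
    }

    // Valid for both old and new heartbeats.
    int getVolumeFailures() {
        return volumeFailures;
    }

    // May be null during a rolling upgrade, so callers must not infer
    // "zero failures" from a missing summary.
    String[] getFailedLocations() {
        return failedLocations;
    }
}
```

In this sketch, a heartbeat from an older DataNode that reports two failures 
and no summary still yields a count of 2 rather than 0.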

The test failure in the last Jenkins run was unrelated.

> Track and display failed DataNode storage locations in NameNode.
> ----------------------------------------------------------------
>
>                 Key: HDFS-7604
>                 URL: https://issues.apache.org/jira/browse/HDFS-7604
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HDFS-7604-screenshot-1.png, HDFS-7604-screenshot-2.png, 
> HDFS-7604-screenshot-3.png, HDFS-7604-screenshot-4.png, 
> HDFS-7604-screenshot-5.png, HDFS-7604-screenshot-6.png, 
> HDFS-7604-screenshot-7.png, HDFS-7604.001.patch, HDFS-7604.002.patch, 
> HDFS-7604.004.patch, HDFS-7604.005.patch, HDFS-7604.prototype.patch
>
>
> During heartbeats, the DataNode can report a list of its storage locations 
> that have been taken out of service because of a failure (for example, a bad 
> disk or a permissions problem).  The NameNode can track these failed storage 
> locations and then report them in JMX and the NameNode web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
