[jira] [Updated] (HDFS-5693) Few NN metrics data points were collected via JMX when NN is under heavy load

Ming Ma (JIRA) Fri, 20 Dec 2013 15:44:44 -0800

     [ 
https://issues.apache.org/jira/browse/HDFS-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ming Ma updated HDFS-5693:
--------------------------

    Attachment: HADOOP-5693.patch

The fixes are:

1. Remove unnecessary reader lock for FSNamesystem method getCapacity.
2. Remove unnecessary reader lock for FSNamesystem method getFilesTotal, as 
FSDirectory has its own lock.
3. DatanodeManager methods that return datanodes in different states don't 
need to acquire the reader lock, because access is synchronized on the 
datanodeMap object.
4. Change some member variables used in SafeModeInfo to be volatile so that we 
don't need reader lock to get the values.
5. For getCorruptFiles, we do a quick check first. If there are no corrupt 
blocks, we don't need to take the reader lock. This takes care of clusters 
that have no corrupt blocks. A separate jira is still needed for the 
scenario with corrupt blocks.
6. Add unit test.
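
The ideas behind fixes 4 and 5 can be sketched roughly as follows. This is a 
minimal illustration, not the code in the patch: the class NamesystemSketch 
and the members blockSafe, corruptBlockCount and corruptFiles are 
hypothetical stand-ins, and the real FSNamesystem and SafeModeInfo are 
considerably more involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a namesystem; NOT the real FSNamesystem. */
class NamesystemSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

    // Idea of fix 4: the value is still written under the writer lock, but
    // it is volatile, so a metrics getter can read it without any lock.
    private volatile long blockSafe = 0;

    // Idea of fix 5: a cheap counter that can be checked without the lock.
    private final AtomicInteger corruptBlockCount = new AtomicInteger();
    private final List<String> corruptFiles = new ArrayList<>();

    void setBlockSafe(long value) {
        fsLock.writeLock().lock();
        try {
            blockSafe = value;
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    /** Lock-free read; volatile guarantees the latest written value is seen. */
    long getBlockSafe() {
        return blockSafe;
    }

    void addCorruptFile(String path) {
        fsLock.writeLock().lock();
        try {
            corruptFiles.add(path);
            corruptBlockCount.incrementAndGet();
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    /** Quick check first; the reader lock is taken only if there is work. */
    List<String> getCorruptFiles() {
        if (corruptBlockCount.get() == 0) {
            // Common case: nothing is corrupt, so no lock is needed.
            return Collections.emptyList();
        }
        fsLock.readLock().lock();
        try {
            return new ArrayList<>(corruptFiles);
        } finally {
            fsLock.readLock().unlock();
        }
    }
}
```

In this sketch a reader that races with addCorruptFile may briefly report an 
empty list. That seems acceptable for a metric, but it is an assumption of 
the sketch rather than a statement about the patch.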

> Few NN metrics data points were collected via JMX when NN is under heavy load
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-5693
>                 URL: https://issues.apache.org/jira/browse/HDFS-5693
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Ming Ma
>         Attachments: HADOOP-5693.patch
>
>
> JMX sometimes doesn’t return any value when NN is under heavy load. However, 
> that is when we would like to get metrics to help diagnose the issue.
> When NN is under heavy load due to a bad application or other reasons, it holds 
> FSNamesystem's writer lock for a long period of time. Many of the 
> FSNamesystem metrics require FSNamesystem's reader lock and thus can't be 
> processed.
> This is a special case of improving the overall NN concurrency.
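
The blocking described in the issue can be reproduced in isolation with a 
plain java.util.concurrent.locks.ReentrantReadWriteLock. The following is a 
minimal sketch: the class WriterBlocksMetricsDemo and its method are 
hypothetical and only mimic a metrics getter that needs the reader lock 
while another thread holds the writer lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo; it does not use any Hadoop classes. */
class WriterBlocksMetricsDemo {

    /**
     * Returns true if the calling thread obtained the reader lock within
     * timeoutMs while a second thread was holding the writer lock.
     */
    static boolean readerGetsLockWhileWriterHolds(long timeoutMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch writerHasLock = new CountDownLatch(1);
        CountDownLatch readerDone = new CountDownLatch(1);

        // Stands in for a long-running operation under the writer lock.
        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writerHasLock.countDown();
                readerDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        });
        writer.start();

        try {
            writerHasLock.await();
            // Stands in for a metrics getter that needs the reader lock.
            boolean acquired =
                lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Let the writer finish and wait for it to exit.
            readerDone.countDown();
            try {
                writer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("reader acquired lock: "
            + readerGetsLockWhileWriterHolds(100));
    }
}
```

The reader times out for as long as the writer holds the lock, which is the 
situation a metrics getter faces when it has to take the reader lock.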



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
