[jira] [Comment Edited] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

tomscut (Jira) Thu, 18 Feb 2021 17:55:06 -0800


    [ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286811#comment-17286811 ]


tomscut edited comment on HDFS-15808 at 2/19/21, 1:54 AM:
----------------------------------------------------------

Hi [~shv]. Thank you for your reply and suggestions.

These two metrics are indeed monotonically increasing counters, similar to 
RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate 
of change or the amount of change over a time window, and then set an alarm 
based on that. We can also combine these metrics with 
lock-detailed-metrics (https://issues.apache.org/jira/browse/HDFS-10872) to 
help further optimize performance.

For example, we use Prometheus to store metrics and use the expression 
"delta(hadoop_fsNamesystem_writeLockLongholdCount{instance=~"$hosts"}[1m])" 
for monitoring.
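
The idea of alarming on the amount of change rather than on the raw counter can be sketched outside Prometheus as well. The snippet below is only an illustration under stated assumptions: the function names, the sample layout (timestamp, cumulative value) and the threshold of 3 are hypothetical and not part of the patch. It treats a drop in the counter as a reset (for example after a NameNode restart), which is closer in spirit to PromQL's increase() than to delta(), and it does not reproduce Prometheus's extrapolation.

```python
from typing import Sequence, Tuple

# (unix timestamp in seconds, cumulative counter value); hypothetical layout
Sample = Tuple[float, int]


def counter_increase(samples: Sequence[Sample], window_s: float, now: float) -> int:
    """Return how much a cumulative counter grew during the last window_s seconds.

    A drop between two consecutive samples is treated as a counter reset,
    so the value observed after the reset is counted as growth from zero.
    """
    in_window = [s for s in samples if now - window_s <= s[0] <= now]
    total = 0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def should_alert(samples: Sequence[Sample], window_s: float, now: float,
                 threshold: int) -> bool:
    """Fire when the counter grew by at least `threshold` within the window."""
    return counter_increase(samples, window_s, now) >= threshold


# Hypothetical scrapes of a long-hold counter, one every 15 seconds.
scrapes = [(0, 10), (15, 10), (30, 12), (45, 12), (60, 15)]
print(counter_increase(scrapes, window_s=60, now=60))
print(should_alert(scrapes, window_s=60, now=60, threshold=3))
```

The absolute value of the counter (15 in the last scrape) says little on its own; the growth within the window is what the alarm is based on.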

[^lockLongHoldCount]


> Add metrics for FSNamesystem read/write lock hold long time
> -----------------------------------------------------------
>
>                 Key: HDFS-15808
>                 URL: https://issues.apache.org/jira/browse/HDFS-15808
>             Project: Hadoop HDFS
>          Issue Type: Wish
>          Components: hdfs
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Major
>              Labels: hdfs, lock, metrics, pull-request-available
>         Attachments: lockLongHoldCount
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> To monitor how often read/write lock hold times exceed thresholds, we can 
> add two metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

