[
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286811#comment-17286811
]
tomscut edited comment on HDFS-15808 at 2/19/21, 2:29 AM:
----------------------------------------------------------
Hi [~shv]. Thank you for your reply and suggestions.
These two metrics are indeed monotonically increasing counters, similar to
RpcQueueTimeNumOps and RpcProcessingTimeNumOps. However, we can calculate their
rate of change or amount of change over a time window and set the alarm based
on that. We can also combine these metrics with the detailed lock metrics
(https://issues.apache.org/jira/browse/HDFS-10872) to help further optimize
performance.
For example, we use Prometheus to store metrics and use the expression
"delta(hadoop_fsNamesystem_writeLockLongholdCount{instance=~"$hosts"}[1m])"
for monitoring.
The following is a graph of monitoring data.
[^lockLongHoldCount]
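To make the "amount of change" idea concrete, here is a minimal sketch in Python of how an alert could be derived from raw counter samples. The function names, the sample values, and the threshold are hypothetical and only illustrate the idea; this is not how Prometheus or the NameNode computes it. A drop between two samples is treated as a counter reset (for example, after a NameNode restart), which is an assumption of this sketch.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Return the total increase of a monotonically increasing counter.

    `samples` are counter values scraped in time order within one window.
    A drop between consecutive samples is assumed to be a counter reset,
    so the value seen after the reset is counted as new increase.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


def should_alert(samples: Sequence[float], threshold: float) -> bool:
    """Alert when the increase within the window exceeds the threshold."""
    return counter_increase(samples) > threshold


# Hypothetical samples of the write-lock long-hold counter over one window.
window = [120, 120, 123, 131]
print(counter_increase(window))   # 11.0
print(should_alert(window, 10))   # True
```

Because the raw counter only ever grows, alerting on its absolute value would fire permanently after the first long hold; alerting on the per-window increase avoids that.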
> Add metrics for FSNamesystem read/write lock hold long time
> -----------------------------------------------------------
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
> Issue Type: Wish
> Components: hdfs
> Reporter: tomscut
> Assignee: tomscut
> Priority: Major
> Labels: hdfs, lock, metrics, pull-request-available
> Attachments: lockLongHoldCount
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> To monitor how often read/write lock hold times exceed their thresholds, we
> can add two metrics (ReadLockWarning/WriteLockWarning), which are exposed in
> JMX.