[
https://issues.apache.org/jira/browse/HDFS-17042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17731075#comment-17731075
]
ASF GitHub Bot commented on HDFS-17042:
---------------------------------------
xinglin opened a new pull request, #5730:
URL: https://github.com/apache/hadoop/pull/5730
### Description of PR
Add two new types of metrics to the existing NN
RpcMetrics/RpcDetailedMetrics. These two metrics can then be used as part of
SLA/SLO for the HDFS service.
- RpcCallSuccesses: it measures the number of RPC requests that are
successfully processed by a NN (e.g., with a response with an RpcStatus
RpcStatusProto.SUCCESS). Then, together with RpcQueueNumOps (which refers to
the total number of RPC requests), we can derive the RpcErrorRate for our NN, as
(RpcQueueNumOps - RpcCallSuccesses) / RpcQueueNumOps.
- OverallRpcProcessingTime for each RPC method: this metric measures the
overall RPC processing time for each RPC method at the NN. It covers the time
from when a request arrives at the NN to when a response is sent back. We are
already emitting processingTime for each RPC method today in
RpcDetailedMetrics. We want to extend it to emit overallRpcProcessingTime for
each RPC method, which includes enqueueTime, queueTime, processingTime,
responseTime, and handlerTime.
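
As a rough illustration of how the two metrics could be consumed, here is a minimal sketch. The class `RpcSlaSketch` and its method names are hypothetical and are not part of this patch or of Hadoop. It also assumes that the listed timing components are disjoint and can simply be added up, which the description above does not state explicitly, and that both counters are sampled at the same moment.

```java
/**
 * Hypothetical helper, not part of Hadoop: derives SLA-style numbers
 * from the metric values described in this PR.
 */
public final class RpcSlaSketch {
    private RpcSlaSketch() {}

    /**
     * RpcErrorRate = (RpcQueueNumOps - RpcCallSuccesses) / RpcQueueNumOps.
     * Returns 0.0 when no requests were observed, to avoid dividing by zero.
     */
    public static double errorRate(long rpcQueueNumOps, long rpcCallSuccesses) {
        if (rpcQueueNumOps <= 0) {
            return 0.0;
        }
        return (double) (rpcQueueNumOps - rpcCallSuccesses) / rpcQueueNumOps;
    }

    /**
     * Assumption: the overall time is the plain sum of the five components
     * named in the description, all expressed in the same time unit.
     */
    public static long overallRpcProcessingTime(long enqueueTime, long queueTime,
            long processingTime, long responseTime, long handlerTime) {
        return enqueueTime + queueTime + processingTime + responseTime + handlerTime;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        System.out.println("error rate: " + errorRate(1000L, 750L));
        System.out.println("overall time: "
            + overallRpcProcessingTime(1L, 2L, 3L, 4L, 5L));
    }
}
```

Guarding the zero-request case keeps the ratio well defined for a NN that has just started and has not served any RPCs yet.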
### How was this patch tested?
```
mvn test -Dtest="TestRPC#testOverallRpcProcessingTimeMetric"
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.ipc.TestRPC
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.014 s - in org.apache.hadoop.ipc.TestRPC
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
mvn test -Dtest="TestRPC#testRpcCallSuccessesMetric"
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.ipc.TestRPC
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.004 s - in org.apache.hadoop.ipc.TestRPC
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
```
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode
> ----------------------------------------------------------------------------
>
> Key: HDFS-17042
> URL: https://issues.apache.org/jira/browse/HDFS-17042
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.4.0, 3.3.9
> Reporter: Xing Lin
> Assignee: Xing Lin
> Priority: Major
>
> We'd like to add two new types of metrics to the existing NN
> RpcMetrics/RpcDetailedMetrics. These two metrics can then be used as part of
> SLA/SLO for the HDFS service.
> * RpcCallSuccesses: it measures the number of RPC requests that are
> successfully processed by a NN (e.g., with a response with an RpcStatus
> RpcStatusProto.SUCCESS). Then, together with RpcQueueNumOps (which refers to
> the total number of RPC requests), we can derive the
> RpcErrorRate for our NN, as (RpcQueueNumOps - RpcCallSuccesses) /
> RpcQueueNumOps.
> * OverallRpcProcessingTime for each RPC method: this metric measures the
> overall RPC processing time for each RPC method at the NN. It covers the time
> from when a request arrives at the NN to when a response is sent back. We are
> already emitting processingTime for each RPC method today in
> RpcDetailedMetrics. We want to extend it to emit overallRpcProcessingTime for
> each RPC method, which includes enqueueTime, queueTime, processingTime,
> responseTime, and handlerTime.
>