[ 
https://issues.apache.org/jira/browse/HBASE-15242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687300#comment-17687300
 ] 

Victor Li commented on HBASE-15242:
-----------------------------------

The retry and timeout count metrics that we want to add are per RPC call type, 
in addition to the overall call count and failure count for each type.

The count of timed-out calls can be determined by examining the exception, if 
the call produced one, in the callback that runs after the call completes. We 
can treat the following exceptions as indicating that the call timed out:
SocketTimeoutException
TimeoutException
CallTimeoutException
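
As a rough illustration of the classification described above, here is a 
minimal sketch of a per-call-type timeout counter. It is not the actual HBase 
client code: the class name, the onCallDone hook, and the method-name key are 
hypothetical, and CallTimeoutException is a local stand-in for the HBase class 
so that the example compiles on its own.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.LongAdder;

public class TimeoutCounterSketch {
    // Local stand-in (assumption) for HBase's CallTimeoutException,
    // declared here only so the sketch has no external dependencies.
    public static class CallTimeoutException extends IOException {
        public CallTimeoutException(String msg) { super(msg); }
    }

    // One counter per RPC call type, keyed by a hypothetical method name.
    private final Map<String, LongAdder> timedOutByMethod = new ConcurrentHashMap<>();

    // True if the exception is one of the three types treated as a timeout.
    public static boolean isTimeout(Throwable t) {
        return t instanceof SocketTimeoutException
            || t instanceof TimeoutException
            || t instanceof CallTimeoutException;
    }

    // Hypothetical callback invoked once a call finishes; error is null on success.
    public void onCallDone(String methodName, Throwable error) {
        if (error != null && isTimeout(error)) {
            timedOutByMethod.computeIfAbsent(methodName, k -> new LongAdder()).increment();
        }
    }

    // Number of timed-out calls recorded for the given call type.
    public long timedOutCount(String methodName) {
        LongAdder counter = timedOutByMethod.get(methodName);
        return counter == null ? 0L : counter.sum();
    }
}
```

The sketch only inspects the top-level exception; whether a wrapped cause 
should also be examined is left open here.
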
The RPC retry count is not straightforward to obtain, because the retry 
context is not available at the point where metrics are collected. We could add 
an indicator of whether a call is a retry or a first attempt, but that change 
would touch many places and seems risky.

Since we have the overall count of RPC calls for each type and the overall 
count of failed RPC calls for each type, in addition to the count of timed-out 
calls, a count of retried calls for each type seems somewhat redundant or of 
limited value.

I am thinking of using this Jira to implement counter metrics for timed-out 
RPC calls of each type. Please share any comments you have.

> Client metrics for retries and timeouts
> ---------------------------------------
>
>                 Key: HBASE-15242
>                 URL: https://issues.apache.org/jira/browse/HBASE-15242
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Mikhail Antonov
>            Assignee: Victor Li
>            Priority: Major
>
> Client metrics to see total/avg number of retries, retries exhausted and 
> timeouts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
