[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lei Yang updated HDFS-17290:
----------------------------
    Description:
Clients back off when RPCs cannot be enqueued. However, backoff can happen in different scenarios. Currently there is no way to differentiate whether a backoff happened because the lowest priority queue was full (with disconnection) or because of queue overflow from higher priority queues. The IPC server just emits a single monolithic metric for all backoffs.

Example:
# Clients are enqueued directly into the lowest priority queue and back off when that queue is full. Clients are expected to disconnect from the namenode.
# Clients are enqueued into a non-lowest priority queue, overflow all the way down to the lowest priority queue, and back off. In this case, the connection between client and namenode remains open.

We would like to add metrics for #1.

  was:
Clients back off when RPCs cannot be enqueued. However, backoff can happen in different scenarios. Currently there is no way to differentiate whether a backoff happened because the lowest priority queue was full (with disconnection) or because of queue overflow from higher priority queues.

Example:
# Clients are enqueued directly into the lowest priority queue and back off when that queue is full. Clients are expected to disconnect from the namenode.
# Clients are enqueued into a non-lowest priority queue, overflow all the way down to the lowest priority queue, and back off. In this case, the connection between client and namenode remains open.

We would like to add metrics for #1.

> HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-17290
>                 URL: https://issues.apache.org/jira/browse/HDFS-17290
>             Project: Hadoop HDFS
>          Issue Type: Bug
>   Affects Versions: 2.10.0, 3.4.0
>            Reporter: Lei Yang
>            Assignee: Lei Yang
>            Priority: Major
>              Labels: pull-request-available
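
The two backoff scenarios described in this issue could be counted separately along the following lines. This is only a minimal sketch under stated assumptions: the class `BackoffMetricsSketch`, its `Result` enum, and the counter names are hypothetical and are not Hadoop's actual call queue or metrics classes. It assumes priority 0 is the highest level and the last level is the lowest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical illustration only; not Hadoop's real queue or metrics code. */
public class BackoffMetricsSketch {
  /** Outcome of one enqueue attempt. */
  public enum Result { ENQUEUED, BACKOFF_KEEPALIVE, BACKOFF_DISCONNECT }

  // One bounded queue per priority level; index 0 is assumed to be highest.
  private final List<BlockingQueue<String>> queues = new ArrayList<>();
  // Existing-style monolithic counter: every backoff, whatever the cause.
  private final LongAdder totalBackoffs = new LongAdder();
  // Proposed extra counter: backoffs of calls that started in the lowest queue.
  private final LongAdder disconnectBackoffs = new LongAdder();

  public BackoffMetricsSketch(int levels, int capacityPerLevel) {
    for (int i = 0; i < levels; i++) {
      queues.add(new ArrayBlockingQueue<>(capacityPerLevel));
    }
  }

  /** Try the call's own level first, then overflow toward the lowest level. */
  public Result offer(int priority, String call) {
    int lowest = queues.size() - 1;
    for (int level = priority; level <= lowest; level++) {
      if (queues.get(level).offer(call)) {
        return Result.ENQUEUED;
      }
    }
    // Every queue from the call's level down to the lowest is full.
    totalBackoffs.increment();
    if (priority == lowest) {
      // Scenario #1: call was assigned to the lowest queue; client disconnects.
      disconnectBackoffs.increment();
      return Result.BACKOFF_DISCONNECT;
    }
    // Scenario #2: call overflowed from a higher queue; connection stays open.
    return Result.BACKOFF_KEEPALIVE;
  }

  public long getTotalBackoffs() {
    return totalBackoffs.sum();
  }

  public long getDisconnectBackoffs() {
    return disconnectBackoffs.sum();
  }
}
```

With two levels of capacity 1, offering two calls at the lowest level and then two at the highest level would yield two backoffs in total, of which one is counted as a disconnect backoff. Comparing the two counters is what lets an operator tell scenario #1 apart from scenario #2.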