[
https://issues.apache.org/jira/browse/HADOOP-18288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555834#comment-17555834
]
Viraj Jasani commented on HADOOP-18288:
---------------------------------------
[~aajisaka] are you fine with the change? If so, I can create a backport PR for
branch-3.3; otherwise [~tomscut] can help revert it on trunk.
My goal was to make this info available on the Namenode UI so that it is
straightforward to see which Datanodes are busier than usual without having to
explore more metrics. The same is true of HBase: when a user is alerted to
higher than usual traffic, the HMaster UI itself is sufficient to see which
Regionservers are busier than usual and to take action if required (e.g. run the
balancer or move regions) before we even have to look at detailed metrics
(derived from expressions on Prometheus or an in-house metric system).
But yes, when more specific details require attention (such as higher CPU usage
or network errors), we need to look at detailed metrics anyway. This Jira
is about exposing the overall busyness of servers so that it can be shown on the UI.
Moreover, we don't have these details on dev clusters either (e.g. pseudo-distributed
mode or a dockerized cluster), as the majority of devs would not have
Prometheus or any other metric system deployed locally. Hence, from
that viewpoint too, total RPS is a basic way to get a quick read on how busy
our servers get under the traffic we initiate on dev clusters.
> Total requests and total requests per sec served by RPC servers
> ---------------------------------------------------------------
>
> Key: HADOOP-18288
> URL: https://issues.apache.org/jira/browse/HADOOP-18288
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h
>
> RPC Servers provide a bunch of useful information like number of open connections,
> slow requests, number of in-progress handlers, RPC processing time, queue time,
> etc.; however, so far they don't provide an accumulated count of all requests or
> a current snapshot of requests per second served by the server. Exposing
> them would help from an operational viewpoint in identifying how busy the
> servers have been and how much load they are currently serving in the
> presence of cluster-wide high load.
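
For illustration only, here is a minimal sketch of the two numbers the issue description asks for: a running total of requests and a periodically refreshed requests-per-second snapshot. This is not the code from the HADOOP-18288 patch. The class name RequestRateTracker and all of its methods are hypothetical, and the caller is assumed to invoke sample() on a fixed schedule with a monotonic timestamp such as System.nanoTime().

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical helper (not part of Hadoop): tracks the total number of
 * requests and a requests-per-second snapshot taken at each sample() call.
 */
class RequestRateTracker {
  // LongAdder keeps increments cheap when many handler threads count at once.
  private final LongAdder totalRequests = new LongAdder();
  private long lastTotal;
  private long lastSampleNanos;
  private volatile long requestsPerSecond;

  RequestRateTracker(long startNanos) {
    this.lastSampleNanos = startNanos;
  }

  /** Called once per request served. */
  void incrementRequest() {
    totalRequests.increment();
  }

  /** Accumulated number of requests since the tracker was created. */
  long getTotalRequests() {
    return totalRequests.sum();
  }

  /** Rate computed by the most recent sample() call. */
  long getRequestsPerSecond() {
    return requestsPerSecond;
  }

  /**
   * Recomputes the rate as (requests since last sample) / (seconds elapsed).
   * The timestamp is passed in so the logic is easy to test deterministically.
   */
  synchronized void sample(long nowNanos) {
    long elapsedNanos = nowNanos - lastSampleNanos;
    if (elapsedNanos <= 0) {
      return; // no time has passed; keep the previous snapshot
    }
    long total = totalRequests.sum();
    requestsPerSecond = (long) ((total - lastTotal) * 1e9 / elapsedNanos);
    lastTotal = total;
    lastSampleNanos = nowNanos;
  }
}
```

In such a sketch, a UI would read getTotalRequests() to see how busy a server has been overall and getRequestsPerSecond() to see its current load, without querying an external metric system.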