xBis7 opened a new pull request, #3878:
URL: https://github.com/apache/ozone/pull/3878

   ## What changes were proposed in this pull request?
   
   On the OM Prometheus endpoint, the DecayRpcScheduler per-user summary 
exposes the username in the metric name. This makes these values almost 
impossible to monitor, because every time a new user shows up a new metric 
name has to be registered. 
   
   For example, for a user with the username `hadoop`, the metric name 
`org_apache_hadoop_ipc_decay_rpc_scheduler_call_volume` becomes 
`org_apache_hadoop_ipc_decay_rpc_scheduler_caller_hadoop_volume`.
   
   The proposed solution is to remove the username from the metric name and 
expose it in a `username` tag instead. 
   
   This metric comes from `hadoop-common-3.3.4.jar/DecayRpcScheduler`, more 
specifically from this code:
   
   ```
   Metrics2Util.NameValuePair entry =
       (Metrics2Util.NameValuePair) topNCallers.poll();
   String topCaller = "Caller(" + entry.getName() + ")";
   String topCallerVolume = topCaller + ".Volume";
   String topCallerPriority = topCaller + ".Priority";
   rb.addCounter(Interns.info(topCallerVolume, topCallerVolume),
       entry.getValue());
   ``` 
   The name is in the format `Caller(username).MetricType`, e.g. 
`Caller(hadoop).Volume`. The cleanest way to deal with this seems to be to 
filter the metric in `PrometheusMetricsSink`.
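   
   A minimal sketch of the parsing step, not the actual patch: it assumes the 
name always follows the `Caller(username).MetricType` format above and that 
`Volume` and `Priority` are the only metric types. `CallerMetricName` and 
`parse` are hypothetical names used for illustration only.
   
   ```java
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   /** Hypothetical helper for illustration; not part of Ozone or Hadoop. */
   final class CallerMetricName {
     // Matches names such as "Caller(hadoop).Volume" or "Caller(hadoop).Priority".
     // Assumption: only these two metric types are emitted per caller.
     private static final Pattern CALLER_METRIC =
         Pattern.compile("^Caller\\((.+)\\)\\.(Volume|Priority)$");

     final String username;
     final String metricType;

     private CallerMetricName(String username, String metricType) {
       this.username = username;
       this.metricType = metricType;
     }

     /** Returns the parsed parts, or null if the name is not a per-caller metric. */
     static CallerMetricName parse(String rawName) {
       Matcher m = CALLER_METRIC.matcher(rawName);
       if (!m.matches()) {
         return null;
       }
       return new CallerMetricName(m.group(1), m.group(2));
     }
   }
   ```
   
   With something like this, a sink could publish `metricType` as the metric 
name and `username` as a tag value, and pass every other metric through 
unchanged when `parse` returns null.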
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-7394
   
   ## How was this patch tested?
   
   This patch was tested manually, with a docker cluster and the OM `/prom` 
endpoint.
   
   To test it:
   
   In `compose/ozone`, add the following to `docker-config`:
   ```
   CORE-SITE.XML_ipc.9862.callqueue.impl=org.apache.hadoop.ipc.FairCallQueue
   CORE-SITE.XML_ipc.9862.scheduler.impl=org.apache.hadoop.ipc.DecayRpcScheduler
   CORE-SITE.XML_ipc.9862.scheduler.priority.levels=2
   CORE-SITE.XML_ipc.9862.backoff.enable=true
   CORE-SITE.XML_ipc.9862.faircallqueue.multiplexer.weights=99,1
   CORE-SITE.XML_ipc.9862.decay-scheduler.thresholds=90
   OZONE-SITE.XML_ozone.om.address=0.0.0.0:9862
   ```
   Then run:
   
   ```
   $ export COMPOSE_FILE=docker-compose.yaml:monitoring.yaml
   $ docker-compose up --scale datanode=3 -d
   $ docker exec -it ozone_s3g_1 bash
   bash-4.2$ export AWS_ACCESS_KEY=test AWS_SECRET_KEY=pass
   bash-4.2$ ozone freon s3bg -t 1 -n 10
   ```
   In your browser, go to `http://localhost:9874/prom` and you should see:
   
   ```
   # TYPE org_apache_hadoop_ipc_decay_rpc_scheduler_priority counter
   org_apache_hadoop_ipc_decay_rpc_scheduler_priority{context="ipc.9862",hostname="e32f2e3bddb9",username="hadoop"} 1
   ...
   ...
   ...
   # TYPE org_apache_hadoop_ipc_decay_rpc_scheduler_volume counter
   org_apache_hadoop_ipc_decay_rpc_scheduler_volume{context="ipc.9862",hostname="e32f2e3bddb9",username="hadoop"} 21
   ```
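   
   For reference, a rough sketch of how a sample line like the ones above can 
be assembled once the username is carried as a tag. `PromLine`, `escape` and 
`format` are hypothetical names, and this is not the code in 
`PrometheusMetricsSink`; the escaping follows the Prometheus text format rules 
for label values (backslash, double quote, newline).
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.StringJoiner;

   /** Hypothetical formatter for illustration; not part of Ozone or Hadoop. */
   final class PromLine {
     /** Escapes backslash, double quote and newline in a label value. */
     static String escape(String value) {
       return value.replace("\\", "\\\\")
           .replace("\"", "\\\"")
           .replace("\n", "\\n");
     }

     /** Builds one sample line: name{key="value",...} value */
     static String format(String name, Map<String, String> tags, long value) {
       StringJoiner labels = new StringJoiner(",", "{", "}");
       for (Map.Entry<String, String> tag : tags.entrySet()) {
         labels.add(tag.getKey() + "=\"" + escape(tag.getValue()) + "\"");
       }
       return name + labels + " " + value;
     }

     public static void main(String[] args) {
       // LinkedHashMap keeps the tags in insertion order.
       Map<String, String> tags = new LinkedHashMap<>();
       tags.put("context", "ipc.9862");
       tags.put("hostname", "e32f2e3bddb9");
       tags.put("username", "hadoop");
       System.out.println(format(
           "org_apache_hadoop_ipc_decay_rpc_scheduler_volume", tags, 21));
     }
   }
   ```
   
   Because the username is only a label value here, the metric name stays the 
same for every user, so a single query on 
`org_apache_hadoop_ipc_decay_rpc_scheduler_volume` covers all callers.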


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

