morhidi opened a new pull request, #310:
URL: https://github.com/apache/flink-kubernetes-operator/pull/310
## What is the purpose of the change
This pull request adds metrics and KPIs for Kubernetes API server access.
The metrics are controlled by
`kubernetes.operator.kubernetes.client.metrics.enabled` (defaults to `true`).
## Brief change log
- added various request/response counters
```
-- Counters -------------------------------------------------------------------
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.Count: 94
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.POST.Count: 6
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.PATCH.Count: 10
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.DELETE.Count: 4
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.PUT.Count: 8
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.GET.Count: 66
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.Failed.Count: 3
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.Count: 91
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.101.Count: 5
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.409.Count: 1
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.201.Count: 6
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.404.Count: 10
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.200.Count: 69
```
- added key request/response KPIs:
```
-- Meters ---------------------------------------------------------------------
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.NumPerSecond: 0.08333333333333333
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.NumPerSecond: 0.03333333333333333
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.Failed.NumPerSecond: 0.05
```
```
-- Histograms -----------------------------------------------------------------
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.LatencyNanos: count=91, min=2588875, max=273916959, mean=1.8684283417582415E7, stddev=4.088778006829815E7, p50=7575458.0, p75=1.3146208E7, p95=5.92533498E7, p98=2.7390890844E8, p99=2.73916959E8, p999=2.73916959E8
```
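As a quick sanity check on the sample output, the per-method and per-status counters should add up to their totals, and the latency histogram is easier to read in milliseconds. The following is a minimal sketch using only the numbers shown in this description. The reading that a "failed" request is one that never produced a response is an assumption, not something the PR states.

```python
# Sample values copied from the counter and histogram dumps in this description.
requests_by_method = {"POST": 6, "PATCH": 10, "DELETE": 4, "PUT": 8, "GET": 66}
responses_by_status = {101: 5, 409: 1, 201: 6, 404: 10, 200: 69}
total_requests, failed_requests, total_responses = 94, 3, 91

# Per-method and per-status counters add up to their totals.
assert sum(requests_by_method.values()) == total_requests
assert sum(responses_by_status.values()) == total_responses

# Assumption: a "failed" request is one that never produced a response.
assert total_requests - failed_requests == total_responses

# LatencyNanos is reported in nanoseconds; convert to milliseconds for reading.
latency_nanos = {
    "min": 2588875,
    "p50": 7575458.0,
    "mean": 1.8684283417582415e7,
    "max": 273916959,
}
latency_ms = {name: round(value / 1e6, 2) for name, value in latency_nanos.items()}
print(latency_ms)
```

In this sample the histogram `count` (91) also matches `HttpResponse.Count`, which suggests one latency sample is recorded per response.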
## Verifying this change
This change adds tests that cover the functionality. It can also be verified
manually by enabling the `Slf4jReporterFactory`, which dumps the metrics into
the logs:
```
kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory
kubernetes.operator.metrics.reporter.slf4j.interval: 10 SECONDS
```
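With the reporter enabled, the dumped values can be pulled out of the operator log for inspection. The sketch below assumes each metric appears on a single line as `name: value`, as in the sample output in this description. The helper name `parse_kube_client_metrics` is hypothetical and is not part of the operator.

```python
def parse_kube_client_metrics(log_text):
    """Collect `name: value` pairs for KubeClient metrics from a reporter dump."""
    metrics = {}
    for line in log_text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if sep and ".KubeClient." in name:
            # Keep only the part after "KubeClient." as a short key.
            short_name = name.split(".KubeClient.", 1)[1]
            metrics[short_name] = value.strip()
    return metrics


# A shortened excerpt of the counter dump shown earlier in this description.
sample = """\
-- Counters -------------------------------------------------------------------
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.Count: 94
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpRequest.Failed.Count: 3
localhost.k8soperator.default.flink-kubernetes-operator.KubeClient.HttpResponse.Count: 91
"""

metrics = parse_kube_client_metrics(sample)
print(metrics)
```

Values are kept as strings because histogram entries are not single numbers; callers can convert counters with `int(...)` as needed.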
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., are there any changes to the
`CustomResourceDescriptors`: no
- Core observer or reconciler logic that is regularly executed: no
## Documentation
- Does this pull request introduce a new feature? (yes)
- If yes, how is the feature documented?
  - Docs for the `kubernetes.operator.kubernetes.client.metrics.enabled`
    property are autogenerated
  - Metric descriptions are added to the documentation