FeiYing9 opened a new issue, #11934:
URL: https://github.com/apache/apisix/issues/11934
### Current Behavior
We run APISIX in several k8s clusters, but only one cluster has this problem:
APISIX reports many duplicate metrics.
For example:
```bash
apisix_http_status{code="200",route="3fb9d6c2",matched_uri="/api/v1/*",matched_host="xxx",service="",consumer="",node="10.244.19.254",host="xxx",upstream_addr="10.244.19.254:8080",upstream_status="200",uri="/api/v1/cluster_metric/list_task_dimension",method="POST"} 96
apisix_http_status{code="200",route="3fb9d6c2",matched_uri="/api/v1/*",matched_host="xxx",service="",consumer="",node="10.244.19.254",host="xxx",upstream_addr="10.244.19.254:8080",upstream_status="200",uri="/api/v1/cluster_metric/list_task_dimension",method="POST"} 96
...
apisix_http_status{code="200",route="3fb9d6c2",matched_uri="/api/v1/*",matched_host="xxx",service="",consumer="",node="10.244.19.254",host="xxx",upstream_addr="10.244.19.254:8080",upstream_status="200",uri="/api/v1/file/upload",method="POST"} 3188
apisix_http_status{code="200",route="3fb9d6c2",matched_uri="/api/v1/*",matched_host="xxx",service="",consumer="",node="10.244.19.254",host="xxx",upstream_addr="10.244.19.254:8080",upstream_status="200",uri="/api/v1/file/upload",method="POST"} 3188
```
As a result, Prometheus logs many messages like the following:
```bash
ts=2025-01-22T08:51:08.867Z caller=scrape.go:1793 level=debug component="scrape manager" scrape_pool=serviceMonitor/apisix/apisix/0 target=http://10.244.5.32:9091/apisix/prometheus/metrics msg="Duplicate sample for timestamp" series="apisix_http_latency_bucket{type=\"apisix\",route=\"3fb9d6c2\",service=\"\",consumer=\"\",node=\"10.244.10.60\",host=\"xxx\",upstream_addr=\"10.244.10.60:8080\",upstream_status=\"200\",uri=\"/api/v1/user/routes/ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0\",method=\"GET\",le=\"100\"}"
```
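The duplicates can also be confirmed without Prometheus by scanning a saved copy of the metrics endpoint output. The following is a minimal sketch, not part of APISIX or Prometheus: the function name `find_duplicate_series` is hypothetical, the sample uses abbreviated label sets rather than the real ones above, and it assumes the sample value and optional timestamp never contain `}`.

```python
from collections import Counter


def find_duplicate_series(exposition_text):
    """Return {series: count} for every series that appears more than once.

    Hypothetical helper: treats everything up to the last '}' on a line
    as the series identity (metric name plus label set).
    """
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so cut at the closing brace.
            series = line[: line.rindex("}") + 1]
        else:
            # Metric without labels: the name is the first token.
            series = line.split()[0]
        counts[series] += 1
    return {series: n for series, n in counts.items() if n > 1}


# Abbreviated sample; real APISIX series carry many more labels.
sample = """\
apisix_http_status{code="200",uri="/api/v1/file/upload",method="POST"} 3188
apisix_http_status{code="200",uri="/api/v1/file/upload",method="POST"} 3188
apisix_http_status{code="200",uri="/api/v1/other",method="GET"} 7
"""

for series, n in find_duplicate_series(sample).items():
    print(n, series)
```

Running it against the text returned by `/apisix/prometheus/metrics` would list each repeated series with its repeat count, which makes it easier to see whether the duplicates are limited to particular routes or label values.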
### Expected Behavior
_No response_
### Error Logs
All the error logs are about the shared dict (shdict):
```bash
2025/01/22 15:07:27 [error] 534#534: *2088505577 [lua] prometheus_resty_counter.lua:39: increasing counter in shdict: lru eviction: key=http_latency_bucket{type="request",route="3fb9d6c2",service="",consumer="",node="10.244.11.36",host="xxx",upstream_addr="10.244.11.36:8080",upstream_status="200",uri="/api/v1/notebook/7eb9852a-be8d-4fac-a593-31f5f7d864b0",method="GET",le="30000.0"}, context: ngx.timer
...
2025/01/22 16:53:00 [error] 499#499: *2098016584 [lua] prometheus.lua:973: log_error(): Shared dictionary used for prometheus metrics is full. REPORTED METRIC DATA MIGHT BE INCOMPLETE. Please increase the size of the dictionary or decrease metric cardinality.; key index: add key: idx=__ngx_prom__key_115158, key=http_latency_bucket{type="request",route="3fb9d6c2",service="",consumer="",node="10.244.11.36",host="xxx",upstream_addr="10.244.11.36:8080",upstream_status="200",uri="/api/v1/project/project-cc83c686-1515-454e-870b-202a20a67727",method="GET",le="Inf"} while logging request, client: 10.245.13.201, server: _, request: "GET /api/v1/project/project-cc83c686-1515-454e-870b-202a20a67727 HTTP/2.0", upstream: "http://10.244.11.36:8080/api/v1/project/project-cc83c686-1515-454e-870b-202a20a67727", host: "qz.sii.edu.cn", referrer: "https://xxx/jobs/distributedTraining?spaceId=ws-f4d69b29-e0a5-44e6-bd92-acf4de9990f0"
```
We accept that the shared dict is running out of memory; we only want to know
why APISIX reports duplicate metrics.
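The "decrease metric cardinality" hint in the log can be checked by counting how many distinct values each label takes in a saved scrape. This is a rough sketch under assumptions: `label_cardinality` is a hypothetical helper, the sample label sets are abbreviated, and the regular expression only handles the simple `name="value"` form with backslash escapes.

```python
import re
from collections import defaultdict

# Matches name="value" pairs, allowing backslash-escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def label_cardinality(exposition_text):
    """Return [(label_name, distinct_value_count)], highest count first.

    Hypothetical helper for spotting which label drives series growth.
    """
    values = defaultdict(set)
    for line in exposition_text.splitlines():
        # Skip comments and lines without a complete label set.
        if line.startswith("#") or "{" not in line or "}" not in line:
            continue
        labels = line[line.index("{") + 1 : line.rindex("}")]
        for name, value in LABEL_RE.findall(labels):
            values[name].add(value)
    return sorted(((n, len(v)) for n, v in values.items()), key=lambda kv: -kv[1])


# Abbreviated sample; the URIs stand in for paths that embed per-object IDs.
sample = """\
apisix_http_status{code="200",uri="/api/v1/project/a",method="GET"} 1
apisix_http_status{code="200",uri="/api/v1/project/b",method="GET"} 1
apisix_http_status{code="200",uri="/api/v1/project/c",method="POST"} 1
"""

for name, count in label_cardinality(sample):
    print(f"{name}: {count}")
```

In the logs above, the `uri` label carries paths with embedded IDs such as `/api/v1/project/project-cc83c686-...`, so it is a likely candidate for the label with the most distinct values, though only a count on the real scrape would confirm that.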
### Steps to Reproduce
Unknown; we have not found a way to reproduce it.
APISIX config:
```bash
nginx_config:    # config for render the template to generate nginx.conf
  lua_shared_dict:
    prometheus-metrics: 200m    # yes, it's 200m
...
plugin_attr:
  opentelemetry:
    set_ngx_var: true
  prometheus:
    expire: 16
    export_addr:
      ip: 0.0.0.0
      port: 9091
    export_uri: /apisix/prometheus/metrics
    metric_prefix: apisix_
    metrics:
      bandwidth:
        extra_labels:
          - host: $host
          - upstream_addr: $upstream_addr
          - upstream_status: $upstream_status
          - uri: $uri
          - method: $request_method
      http_latency:
        extra_labels:
          - host: $host
          - upstream_addr: $upstream_addr
          - upstream_status: $upstream_status
          - uri: $uri
          - method: $request_method
      http_status:
        extra_labels:
          - host: $host
          - upstream_addr: $upstream_addr
          - upstream_status: $upstream_status
          - uri: $uri
          - method: $request_method
    prefer_name: true
```
### Environment
- APISIX version (run `apisix version`): `3.7.0 (helm version: 2.5.0)`
- Operating system (run `uname -a`): `Linux cpu-001 5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux`
- OpenResty / Nginx version (run `openresty -V` or `nginx -V`): `openresty/1.21.4.2`
- k8s version:
```bash
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.16", GitCommit:"cbb86e0d7f4a049666fac0551e8b02ef3d6c3d9a", GitTreeState:"clean", BuildDate:"2024-07-17T01:53:56Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.16", GitCommit:"cbb86e0d7f4a049666fac0551e8b02ef3d6c3d9a", GitTreeState:"clean", BuildDate:"2024-07-17T01:44:26Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
```