[I] bug: lua_shared_dict prometheus-metrics overflow [apisix]

via GitHub Wed, 05 Feb 2025 02:32:07 -0800


DrJSAnD opened a new issue, #11948:
URL: https://github.com/apache/apisix/issues/11948


   ### Current Behavior
   
   We deploy APISIX in a K8s cluster and have a problem with Prometheus metrics.
   We noticed that the lua_shared_dict prometheus-metrics overflows; the apisix_nginx_metric_errors_total counter then starts to grow and all metrics stop displaying correctly.
   
   
![Image](https://github.com/user-attachments/assets/440848ab-0da3-4a09-bc8b-65e651efa2e9)
   
   We tried increasing the prometheus-metrics size to 40m in the ConfigMap (config.yaml), but after 2 months this lua_shared_dict was full on all pods and the errors started to occur again.
   
   
![Image](https://github.com/user-attachments/assets/4072e860-bdba-4eb4-86b5-6ea01f72989f)
   
   ```yaml
   nginx_config:    # config for rendering the template to generate nginx.conf
     error_log: "/dev/stderr"
     error_log_level: "warn"    # warn,error
     worker_processes: "auto"
     enable_cpu_affinity: true
     worker_rlimit_nofile: 20480  # the number of files a worker process can open, should be larger than worker_connections
     event:
       worker_connections: 10620
     http:
       enable_access_log: true
       access_log: "/dev/stdout"
       access_log_format: '$remote_addr - $remote_user [$time_local] $http_host \"$request\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\"'
       access_log_format_escape: default
       keepalive_timeout: "60s"
       client_header_timeout: 60s     # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client
       client_body_timeout: 60s       # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client
       send_timeout: 10s              # timeout for transmitting a response to the client, then the connection is closed
       underscores_in_headers: "on"   # default enables the use of underscores in client request header fields
       real_ip_header: "X-Real-IP"    # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
       real_ip_from:                  # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
         - 127.0.0.1
         - 'unix:'
       lua_shared_dict:
         prometheus-metrics: 40m
   ```
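
The figures reported above allow a rough estimate of how fast the dictionary fills. The sketch below is only back-of-the-envelope arithmetic: it assumes that "2 months" means about 60 days, that the dictionary grows linearly, and that nginx's `40m` means 40 * 1024 * 1024 bytes. The helper `days_until_full` is a hypothetical name used for illustration and is not part of APISIX.

```python
MIB = 1024 * 1024

# Values taken from the report: prometheus-metrics: 40m, full after ~2 months.
dict_size_bytes = 40 * MIB
days_to_fill = 60  # assumption: "2 months" is treated as 60 days

# Average fill rate under the (unverified) assumption of linear growth.
bytes_per_day = dict_size_bytes / days_to_fill
mib_per_day = bytes_per_day / MIB


def days_until_full(size_mib: float, rate_mib_per_day: float = mib_per_day) -> float:
    """Estimate how long a dict of size_mib MiB lasts at the given fill rate."""
    return size_mib / rate_mib_per_day


print(f"average fill rate: {mib_per_day:.2f} MiB/day")
print(f"an 80m dict would last about {days_until_full(80):.0f} days")
```

If growth really is steady, enlarging the dictionary only postpones the overflow, so the more useful question is which metric keeps adding new label combinations.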
   
   **Current APISIX state**
   - Deployment via Helm chart: https://github.com/apache/apisix-helm-chart
   - Helm Chart version: 2.10.0
   - K8s pods: 3
   - Pod CPU limits: 15 (usage 4%)
   - Pod Memory limits: 60Gb (usage 35 GiB)
   - Total requests per second: 2500 - 3000
   - Active connections: 2000+
   - Upstreams: 100+
   - Routes: 120+
   - Consumers: 60+
   - Plugins: basic-auth and kafka-logger on all routes
   
   
   ### Expected Behavior
   
   _No response_
   
   ### Error Logs
   
   _No response_
   
   ### Steps to Reproduce
   
   1. Run APISIX with the default lua_shared_dict: prometheus-metrics size
   2. After 2-3 weeks prometheus-metrics overflows, apisix_nginx_metric_errors_total starts to grow, and all metrics stop displaying correctly
   3. Change lua_shared_dict: prometheus-metrics to 40m
   4. After 2-3 months the lua_shared_dict overflows again and we get a similar problem with displaying metrics
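
One way to narrow down the steps above is to count how many time series each metric contributes, since every distinct label combination occupies its own entry in the shared dict. The following is a minimal sketch that parses Prometheus text exposition output and counts sample lines per metric name. The `SAMPLE` text and its label values are hypothetical and serve only to make the example self-contained; in practice the input would be the scraped metrics output of one pod.

```python
from collections import Counter


def count_series(exposition_text: str) -> Counter:
    """Count sample lines (time series) per metric name in Prometheus text format."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (no labels).
        ends = [i for i in (line.find("{"), line.find(" ")) if i != -1]
        name = line[:min(ends)] if ends else line
        counts[name] += 1
    return counts


# Hypothetical sample; real output would contain many more label combinations.
SAMPLE = """\
# HELP apisix_http_status HTTP status codes per service in APISIX
# TYPE apisix_http_status counter
apisix_http_status{code="200",route="1",consumer="a"} 10
apisix_http_status{code="200",route="1",consumer="b"} 4
apisix_http_status{code="404",route="2",consumer="a"} 1
apisix_nginx_metric_errors_total 0
"""

series_counts = count_series(SAMPLE)
for name, n in series_counts.most_common():
    print(name, n)
```

Running this against snapshots taken a few days apart would show which metric names keep gaining series, which is usually the one responsible for filling the dictionary.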
   
   ### Environment
   
   - APISIX version (run `apisix version`): 3.10.0
   - Operating system (run `uname -a`): Linux apisix-69cfdc5fbf-m7k27 5.14.0-362.13.1.el9_3.x86_64 SMP PREEMPT_DYNAMIC Fri Nov 24 01:57:57 EST 2023 x86_64 GNU/Linux
   - OpenResty / Nginx version (run `openresty -V` or `nginx -V`): openresty/1.25.3.2
   - etcd version, if relevant (run `curl http://127.0.0.1:9090/v1/server_info`): 3.5.0
   - APISIX Dashboard version, if relevant: 3.0.0
   - Plugin runner version, for issues related to plugin runners:
   - LuaRocks version, for installation issues (run `luarocks --version`):


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
