[GitHub] [apisix] wklken opened a new issue, #10000: bug: high cpu usage after prometheus.lua report 'no memory'

via GitHub Wed, 09 Aug 2023 20:18:10 -0700


wklken opened a new issue, #10000:
URL: https://github.com/apache/apisix/issues/10000


   ### Current Behavior
   
   ```yaml
     apisixConfig:
       luaSharedDict:
         .......
         prometheus_metrics: 200m
   ```
   
   We set the `prometheus_metrics` shared dict to `200m`. After running in
   production (1 deployment, 8 pods) for about 7 days, the memory of each pod
   was exhausted, one pod after another.
   
   We looked at each pod that had been killed; each one was killed because
   its CPU usage hit the resource limit.
   
   ------
   
   from the grafana dashboard and the error log
   
   <img width="1631" alt="image" src="https://github.com/apache/apisix/assets/2002216/ac48a52a-7a82-4480-86ac-66866bcfe3a6">
   
   
   When the metrics started losing data, the error `no memory` appeared in
   the error log (search keyword: `prometheus`).
   
   <img width="600" alt="image" src="https://github.com/apache/apisix/assets/2002216/5add8f83-2b83-4fd6-b6d1-b2d2421eda7d">
   
   ```
   [error] 76#76: *62508500 [lua] prometheus.lua:920: log_error(): Error while 
setting 'etcd_modify_indexes{key="x_etcd_index"}' to '98387': 'no memory', 
client: 1.1.1.1, server: , request: "GET /metrics HTTP/1.1", host: 
"0.0.0.0:6008"
   ```
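
To see which metrics are affected and how often, the error log can be scanned for these entries. The following is a minimal sketch, not part of APISIX: the regular expression is modeled only on the single log line quoted above, and the helper name `count_no_memory` is hypothetical. Other versions of `prometheus.lua` may format the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this report (assumption:
# other versions may word the message differently).
NO_MEMORY_RE = re.compile(
    r"prometheus\.lua:\d+: log_error\(\): Error while setting "
    r"'(?P<metric>[^{']+)(?:\{[^']*\})?' to '[^']*': 'no memory'"
)


def count_no_memory(lines):
    """Count 'no memory' errors per metric name (labels are ignored)."""
    counts = Counter()
    for line in lines:
        match = NO_MEMORY_RE.search(line)
        if match:
            counts[match.group("metric")] += 1
    return counts
```

Passing an open error log file (or any iterable of log lines) to `count_no_memory` returns a `Counter` keyed by metric name, which can help show whether the failures are concentrated on a few metrics.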
   
   After a few hours, the APISIX container hits the CPU limit and is
   restarted.
   
   <img width="544" alt="image" src="https://github.com/apache/apisix/assets/2002216/796ef877-7a7f-41cc-950b-97605e17e56e">
   
   ------
   
   Before the container hit the CPU limit, it had been reporting `no memory`
   for a few hours.
   
   We have many other environments. When an environment shows restarts, we
   redeploy APISIX, and there are no restarts until prometheus.lua reports
   `no memory`.
   
   ### Expected Behavior
   
   No high CPU usage, even when prometheus.lua reports 'no memory'.
   
   ### Error Logs
   
   [error] 76#76: *62508500 [lua] prometheus.lua:920: log_error(): Error while 
setting 'etcd_modify_indexes{key="x_etcd_index"}' to '98387': 'no memory', 
client: 1.1.1.1, server: , request: "GET /metrics HTTP/1.1", host: 
"0.0.0.0:6008"
   
   ### Steps to Reproduce
   
   1. Set `prometheus_metrics` to a limited size.
   2. Deploy it with a CPU resource limit.
   3. Add `/metrics` as a Prometheus target, with the scrape interval set to 1 second.
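
If no Prometheus server is at hand, step 3 can be approximated with a small polling loop. This is only a sketch under stated assumptions: `METRICS_URL` is a guess based on the host and path shown in the error log, and `fetch_metrics` and `scrape_loop` are hypothetical helper names, not part of APISIX or Prometheus. The loop sleeps a fixed interval between requests, so the effective period is the interval plus the request time.

```python
import time
import urllib.request

# Assumed address: port and path are taken from the error log in this report;
# adjust them to match the actual deployment.
METRICS_URL = "http://127.0.0.1:6008/metrics"


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Fetch one scrape and return (elapsed_seconds, body_size_bytes)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
    return time.monotonic() - start, len(body)


def scrape_loop(fetch, interval=1.0, count=10, sleep=time.sleep):
    """Call fetch() `count` times, sleeping `interval` seconds in between."""
    results = []
    for i in range(count):
        results.append(fetch())
        if i < count - 1:
            sleep(interval)
    return results
```

Calling `scrape_loop(fetch_metrics, interval=1.0, count=60)` would poll the endpoint for about a minute. Watching the elapsed time and body size per scrape may show whether the endpoint slows down once `no memory` errors start to appear.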
   
   
   ### Environment
   
   - APISIX version (run `apisix version`): 3.2.0
   - Operating system (run `uname -a`): 
   - OpenResty / Nginx version (run `openresty -V` or `nginx -V`): 
   - etcd version, if relevant (run `curl 
http://127.0.0.1:9090/v1/server_info`): 3.5.4
   - APISIX Dashboard version, if relevant:
   - Plugin runner version, for issues related to plugin runners:
   - LuaRocks version, for installation issues (run `luarocks --version`):
   


