monkeyDluffy6017 opened a new issue, #9015:
URL: https://github.com/apache/apisix/issues/9015

   ### Current Behavior
   
   A reload or a service discovery update replaces the upstream object, and the 
health checker is rebuilt when the next request comes in.
   
![image](https://user-images.githubusercontent.com/9354193/223091150-078c9401-e367-4034-8751-cda7b3beb0f8.png)
   
https://github.com/apache/apisix/blob/69df734902782f6e12386dc505a40a5e64524154/apisix/upstream.lua#L102
   
   With many concurrent requests and only a few upstreams, the following race 
can occur.
   Requests a, b, and c all access the same upstream. Because healthcheck.new 
calls ngx.sleep and therefore yields, all three may reach position 1. Request a 
continues and successfully creates the checker. Request b then continues to 
position 2; since it is handling the same upstream, it calls 
cancel_clean_handler, which sets the corresponding clean function to nil, and 
proceeds to position 3, where add_target calls ngx.sleep and yields again. 
Request c now runs, and when it reaches position 2, healthcheck_parent.checker 
is not nil, so it calls cancel_clean_handler as well.
   
![image](https://user-images.githubusercontent.com/9354193/223091442-46433f76-2ce4-4fe5-85e8-7ba237c7a201.png)
   
   At this point request c returns 500: the corresponding clean function has 
already been set to nil by request b, so calling it raises an error.
   
![image](https://user-images.githubusercontent.com/9354193/223091578-29e63301-315f-40cc-a097-c24e88c3cf92.png)
   
https://github.com/apache/apisix/blob/1acee1b687e17ade5452cdf78ad7379c3841f2b9/apisix/core/config_util.lua#L92
   
   The checker created at position 1 can no longer be released, and it has 
registered a timer that keeps performing JSON decoding.
   
![image](https://user-images.githubusercontent.com/9354193/223091745-ddc1baed-5dd1-42a8-b0d0-c66244c5655c.png)
   
https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L217
   
   At high QPS, thousands of checkers that cannot be freed are created, causing 
abnormal CPU and memory usage.
   
![image](https://user-images.githubusercontent.com/9354193/223091879-af39a8d3-93c5-4e31-ae6b-4a90aeae92a7.png)
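
The interleaving described above can be reproduced in a small model. The sketch below is not APISIX code: it is a Python asyncio simulation in which `asyncio.sleep(0)` stands in for the yields caused by `ngx.sleep`, and `Parent`, `live`, `yield_n`, and the yield counts are hypothetical names and values chosen only to force the a, b, c ordering from the description. `cancel_clean_handler` is reduced to "remove the handler, then call it", which is an assumption based on the error log below.

```python
import asyncio


class Parent:
    """Hypothetical stand-in for healthcheck_parent."""

    def __init__(self):
        self.checker = None
        self.checker_idx = None
        self.clean_handlers = {}
        self.next_idx = 1


live = set()  # checkers that were created and not released


def add_clean_handler(parent, f):
    idx = parent.next_idx
    parent.next_idx += 1
    parent.clean_handlers[idx] = f
    return idx


def cancel_clean_handler(parent, idx):
    # Remove the handler, then fire it. If another request already
    # removed it, f is None and the call raises (the "nil value" error).
    f = parent.clean_handlers.pop(idx, None)
    f()


async def yield_n(n):
    # Models a function that yields n times (as ngx.sleep does).
    for _ in range(n):
        await asyncio.sleep(0)


async def create_checker(parent, name, new_yields):
    await yield_n(new_yields)          # position 1: healthcheck.new yields
    checker = name + "-checker"
    live.add(checker)
    if parent.checker is not None:     # position 2
        cancel_clean_handler(parent, parent.checker_idx)
    await yield_n(1)                   # position 3: add_target yields
    parent.checker = checker
    parent.checker_idx = add_clean_handler(
        parent, lambda: live.discard(checker))
    return checker


async def main():
    parent = Parent()
    # Request a finishes first; b and c are still inside position 1.
    results = await asyncio.gather(
        create_checker(parent, "a", 1),
        create_checker(parent, "b", 3),
        create_checker(parent, "c", 3),
        return_exceptions=True,
    )
    return parent, results


parent, results = asyncio.run(main())
failed = [type(r).__name__ for r in results if isinstance(r, Exception)]
leaked = sorted(live - {parent.checker})
print(failed, leaked)
```

In this model request b releases the checker created by a, request c fails when it calls the already removed handler, and the checker c created at position 1 stays in `live` with no clean handler attached. Every further request that loses the race in the same way would add one more such checker.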
   
   
   
   ### Expected Behavior
   
   CPU and memory usage remain normal after a reload or a service discovery update.
   
   ### Error Logs
   
   ```
   /usr/local/apisix/apisix/core/config_util.lua:79: attempt to call local 'f' 
(a nil value)
   config_util.lua:73: cancel_clean_handler(): item.clean_handlers is nil when 
cancel_clean_handler
   ```
   
   ### Steps to Reproduce
   
   1. One upstream with dozens of nodes
   2. High concurrency (4000+ qps)
   3. Active health check
   4. Reload
   
   ### Environment
   
   - APISIX version (run `apisix version`): 2.13.1
   - Operating system (run `uname -a`): centos 7.6
   - OpenResty / Nginx version (run `openresty -V` or `nginx -V`): 1.19.3.1
   - etcd version, if relevant (run `curl 
http://127.0.0.1:9090/v1/server_info`):
   - APISIX Dashboard version, if relevant:
   - Plugin runner version, for issues related to plugin runners:
   - LuaRocks version, for installation issues (run `luarocks --version`):
   

