start1943 commented on issue #10422: URL: https://github.com/apache/apisix/issues/10422#issuecomment-2290781008
> > We also have a similar issue. So far, the pattern seems to be triggered, with a very small probability, when updating services in large quantities. The apisix-ingress-controller correctly updates the pod IPs in etcd; checking the upstream's IP list with `curl -XGET http://127.0.0.1:80/apisix/admin/upstream/xxx` shows the correct result, but a small number of requests are still sent by APISIX to offline IPs until it is restarted, which restores normal operation. Does APISIX have any in-memory caching mechanism? This problem has a serious impact.
>
> Hi @start1943 ,
>
> While the problem occurs,
>
> * What about the CPU/Memory usage and connection stats? Any obvious exceptions?
> * How long does it usually last before restarting APISIX?
> * What about the `retries & retry_timeout` config of the upstream. With default?
> * And the quantity of nodes of the upstream?

- CPU, memory and QPS are all fine. The problem does not occur when the APISIX node is under high CPU or memory load, but when a large number of deployment updates in a short period trigger pod IP changes.
- Reloading APISIX is very fast. A reload triggers synchronization of the upstream node IPs from etcd, so the 504 errors caused by hitting the abnormal offline nodes recover as soon as APISIX is reloaded.
- The `retries & retry_timeout` config is the default.
- There are around 100 upstreams in total, each with 2-30 nodes.
- The problem appears to be that the APISIX node does not update an in-memory cache properly. It is an occasional issue that seems to require a large number of pod updates to trigger; see the diagnostic sketch below.
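As a minimal diagnostic sketch (not something from this issue), one way to check whether the stale data lives in APISIX's memory rather than in etcd is to compare what the Admin API reports (etcd-backed) with what the running data plane reports via the Control API on the affected node. The upstream id, ports and admin key below are placeholders: the Admin API listens on 9180 in APISIX 3.x (9080 or a custom port in older setups), and this assumes the Control API is enabled, which by default listens on `127.0.0.1:9090`.

```sh
#!/bin/sh
# Placeholders -- adjust to your deployment.
UPSTREAM_ID="xxx"
ADMIN_KEY="your-admin-key"

echo "== Admin API (etcd-backed) view of the upstream =="
curl -s -H "X-API-KEY: ${ADMIN_KEY}" \
  "http://127.0.0.1:9180/apisix/admin/upstreams/${UPSTREAM_ID}"

echo
echo "== Control API (in-memory) view on the data plane =="
curl -s "http://127.0.0.1:9090/v1/upstreams/${UPSTREAM_ID}"

# If the node lists returned by the two calls diverge while the 504s are happening,
# that points at a stale in-memory copy on the data plane rather than stale data in etcd.
```

Running both commands on the affected APISIX node while the 504s are occurring (and before any reload) would help confirm or rule out the in-memory cache theory.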
