zhoujiexiong commented on issue #10422:
URL: https://github.com/apache/apisix/issues/10422#issuecomment-2290813455

   > > > We have a similar issue. So far, it seems to be triggered with very
small probability when a large number of services are updated at once. The
APISIX ingress controller correctly updates the pod IPs in etcd, and checking
the upstream's IP list with
`curl -XGET http://127.0.0.1:80/apisix/admin/upstream/xxx` shows the correct
nodes. Even so, APISIX keeps sending a small number of requests to offline IPs
until it is restarted. Does APISIX have any in-memory caching mechanism? This
problem has a serious impact.
   > > 
   > > 
   > > Hi @start1943,
   > > When the problem occurs:
   > > 
   > > * How do CPU/memory usage and connection stats look? Any obvious
anomalies?
   > > * How long does it usually last before you restart APISIX?
   > > * What are the `retries` and `retry_timeout` settings of the upstream?
Are they left at the defaults?
   > > * How many nodes does the upstream have?
   > 
   > * CPU, memory, and QPS are all fine. The problem does not occur when the
APISIX node has high CPU or memory usage; it occurs when a large number of
deployment updates in a short period changes the pod IPs.
   > * Reloading APISIX is very fast. A reload triggers resynchronization of
the upstream node IPs from etcd, so the 504 errors caused by hitting offline
nodes go away.
   > * The `retries` and `retry_timeout` settings are at their defaults.
   > * There are about 100 upstreams in total, each with 2 to 30 nodes.
   > * The problem appears to be that the APISIX node is not updating an
in-memory cache properly. It is intermittent and may only be reproducible when
testing with a large number of pod updates.
   
   Hi @start1943,
   Could you explicitly set the upstream `retry_timeout` and `timeout`
parameters in your environment, based on the connection characteristics of
your business traffic?
   Then see whether the problem lasts for a shorter time after this change and
recovers sooner without a reload.
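
For illustration, here is a minimal sketch of what setting these values explicitly could look like through the Admin API. The field names (`retries`, `retry_timeout`, and `timeout` with `connect`/`send`/`read`, in seconds) follow my understanding of the APISIX upstream schema. The Admin address, upstream ID, API key, and the numeric values are placeholders and assumptions, not recommendations, and the plural `upstreams` path is assumed as well. The script only builds and prints the request; it does not send anything.

```python
import json
import urllib.request


def build_upstream_patch(retries, retry_timeout, connect, send, read):
    """Return a partial upstream body with explicit retry/timeout values (seconds)."""
    return {
        "retries": retries,
        "retry_timeout": retry_timeout,
        "timeout": {"connect": connect, "send": send, "read": read},
    }


def build_patch_request(admin_base, upstream_id, api_key, body):
    """Build (but do not send) a PATCH request against the Admin API."""
    # Assumed path layout; adjust to match your APISIX version and deployment.
    url = f"{admin_base}/apisix/admin/upstreams/{upstream_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
    )


if __name__ == "__main__":
    # Placeholder values: tune them to how quickly a dead pod should be given up on.
    body = build_upstream_patch(retries=2, retry_timeout=5, connect=2, send=5, read=5)
    req = build_patch_request(
        "http://127.0.0.1:9180",  # hypothetical Admin API address
        "my-upstream-id",         # hypothetical upstream ID
        "my-admin-key",           # hypothetical admin key
        body,
    )
    print(req.get_method(), req.full_url)
    print(json.dumps(body, indent=2))
    # To actually apply the change: urllib.request.urlopen(req)
```

With short connect timeouts and a bounded `retry_timeout`, a request that lands on an offline IP should fail over to another node sooner rather than waiting for a gateway timeout, which makes it easier to see whether the stale node list clears on its own.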

