jinjianming opened a new issue, #9946:
URL: https://github.com/apache/apisix/issues/9946

   ### Description
   
   I would like a recommended approach for tuning etcd, together with a 
Prometheus alerting strategy, to keep APISIX stable.
         
   Just yesterday our production APISIX went down, and I believe the following 
two points should be taken seriously:
   
   1. First, etcd reached its default 2 GB storage limit, so upstream data could 
no longer be changed and the gateway became unavailable;
   
   - I think the 
[limit](https://etcd.io/docs/v3.3/dev-guide/limit/) and [auto-compaction](https://etcd.io/docs/v3.5/op-guide/maintenance/#auto-compaction) 
settings should be tuned, but I don't know what the best values are;
   
   - For the Helm chart, see this issue for the configuration that addresses the 
first problem (https://github.com/bitnami/charts/issues/8516);
   
   - If the limit has already been exceeded, you also need to add the parameters 
and manually clear the alarm (https://github.com/bitnami/charts/issues/18073);
   
   2. For monitoring, I used 
`apisix_etcd_reachable{job="Produce-ApiSix"} == 0`, but the alert never fired: 
when APISIX hangs, the metric is missing (null) rather than 0. There is no 
documented best practice for alerting on this metric.
   
   - The second problem can be solved with this expression: 
`absent(apisix_etcd_reachable{job="Produce-ApiSix"}) == 1`
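
   For the first point, here is a minimal sketch of the etcd side, assuming 
etcd is started with a YAML configuration file. The keys mirror the 
`--quota-backend-bytes` and `--auto-compaction-*` flags from the etcd docs 
linked above; the values are illustrative assumptions, not tested 
recommendations for any particular cluster.

   ```yaml
   # Illustrative values only; size the quota for your own data volume.
   # Raise the backend quota above the 2 GB default (here: 4 GiB).
   quota-backend-bytes: 4294967296
   # Compact old key revisions automatically so history does not fill the quota.
   auto-compaction-mode: periodic
   # In periodic mode, a bare number is interpreted as hours of history to keep.
   auto-compaction-retention: "1"
   ```

   Compaction alone does not shrink the database file. If the quota has already 
been exceeded, the etcd maintenance guide describes compacting, running 
`etcdctl defrag`, and then `etcdctl alarm disarm` before writes are accepted 
again.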
   
   
![image](https://github.com/apache/apisix/assets/57084209/93fffc8d-5ea4-4047-afae-3644183bdd10)
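
   For the second point, a Prometheus alerting rule could cover both cases 
(the metric reporting 0 and the metric being missing). This is only a sketch: 
the group name, alert name, `for` duration, and severity label are 
hypothetical, and only the two expressions come from the description above.

   ```yaml
   groups:
     - name: apisix-etcd              # hypothetical group name
       rules:
         - alert: ApisixEtcdUnreachable   # hypothetical alert name
           # Fire when APISIX reports etcd as unreachable (== 0), or when the
           # series is missing entirely, e.g. because APISIX itself is down.
           expr: |
             apisix_etcd_reachable{job="Produce-ApiSix"} == 0
             or
             absent(apisix_etcd_reachable{job="Produce-ApiSix"})
           for: 1m                    # assumed debounce window
           labels:
             severity: critical       # assumed label
           annotations:
             summary: "APISIX cannot reach etcd, or the metric is missing"
   ```

   Combining the two expressions with `or` keeps a single alert, so a missing 
series is treated the same as an explicit 0.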
   
   In summary, I have worked around both problems with temporary fixes, and I 
hope for a long-term, stable solution.
   @moonming @tao12345666333
   
   

