[GitHub] [apisix] ryan4yin opened a new issue, #7934: help request: The increase of QPS leads to 504 timeout errors and etcd timeout errors

GitBox Fri, 16 Sep 2022 05:25:30 -0700


ryan4yin opened a new issue, #7934:
URL: https://github.com/apache/apisix/issues/7934


   ### Description
   
   After increasing the QPS rate o APISIX,  some new error messages were 
observed：
   
   - `(upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT 
increment (1/3) for 'xxx.test.com(192.168.3.233:8106)'`
   - `config_etcd.lua:568: no healthy etcd endpoint available, next retry after 
4s`
   
   and when I decrease the QPS rate to lower than 500, the problem just 
disappeared.
   
   I've increased my upstream `keepalive_pools`, `worker_rlimit_nofile` 
`worker_connections` and restart all the APISIX pods, but the problem still 
exists.
   
   related issues:
   
   - https://github.com/apache/apisix/issues/7162
   
   the prometheus metrics:
   <img width="1611" alt="image" 
src="https://user-images.githubusercontent.com/22363274/190634978-2f66629e-8cb4-417d-9eb6-ef8d4772893a.png";>
   <img width="1618" alt="image" 
src="https://user-images.githubusercontent.com/22363274/190635134-47105fe3-5bd9-4aa8-8d42-748e4b60c86d.png";>
   
   
   the error logs:
   
   ```
   2022/09/16 11:35:26 [warn] 44#44: *1868877 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.94:8106)' while 
logging request, client: 110.227.0.30, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:26 [warn] 44#44: *1879978 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (1/3) for 'xxx.test.com(192.168.3.158:8106)' while 
logging request, client: 45.7.214.7, server: _, request: "POST /xxx HTTP/1.1", 
upstream: "http://192.168.3.158:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:26 [error] 45#45: *1868931 upstream timed out (110: 
Operation timed out) while sending to client, client: 201.42.52.64, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.158:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:26 [warn] 45#45: *1868931 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.158:8106)' while 
logging request, client: 201.42.52.64, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.158:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:29 [warn] 46#46: *1765790 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:35:29 [warn] 46#46: *1765789 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.7:8106)', context: 
ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.69:8106)', context: 
ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.5:8106)', context: 
ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.188:8106)', 
context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.184:8106)', 
context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.254:8106)', 
context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:32 [warn] 44#44: *1822405 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:35:33 [warn] 46#46: *1765785 [lua] v3.lua:692: 
request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. 
Retrying, context: ngx.timer
   2022/09/16 11:35:33 [warn] 46#46: *1766229 [lua] health_check.lua:105: 
report_failure(): update endpoint: 
http://apisix-etcd.test.svc.cluster.local:2379 to unhealthy, context: ngx.timer
   2022/09/16 11:35:33 [warn] 46#46: *1766229 [lua] v3.lua:692: 
request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. 
Retrying, context: ngx.timer
   2022/09/16 11:35:34 [error] 46#46: *1766229 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   2022/09/16 11:35:34 [error] 45#45: *1889867 upstream timed out (110: 
Operation timed out) while sending to client, client: 181.39.123.98, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx";, host: 
"xxx.test.com"
   2022/09/16 11:35:34 [warn] 45#45: *1889867 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.5:8106)' while 
logging request, client: 181.39.123.98, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:34 [error] 45#45: *1889641 upstream timed out (110: 
Operation timed out) while sending to client, client: 37.236.170.22, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx";, host: 
"xxx.test.com"
   2022/09/16 11:35:34 [warn] 45#45: *1889641 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (2/3) for 'xxx.test.com(192.168.3.5:8106)' while 
logging request, client: 37.236.170.22, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:34 [error] 45#45: *1853394 upstream timed out (110: 
Operation timed out) while sending to client, client: 181.124.183.134, server: 
_, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:34 [warn] 45#45: *1853394 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (3/3) for 'xxx.test.com(192.168.3.5:8106)' while 
logging request, client: 181.124.183.134, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:34 [error] 45#45: *1882612 upstream timed out (110: 
Operation timed out) while sending to client, client: 187.24.133.50, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.153:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:34 [warn] 45#45: *1882612 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.153:8106)' while 
logging request, client: 187.24.133.50, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.153:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:35 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.184:8106)', 
context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:35 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.188:8106)', 
context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:35 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.5:8106)', context: 
ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:35:36 [error] 46#46: *1806529 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   2022/09/16 11:35:36 [error] 46#46: *1766229 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 4s, context: ngx.timer
   2022/09/16 11:35:36 [warn] 45#45: *1764312 [lua] v3.lua:692: 
request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. 
Retrying, context: ngx.timer
   2022/09/16 11:35:36 [warn] 45#45: *1715338 [lua] v3.lua:692: 
request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. 
Retrying, context: ngx.timer
   2022/09/16 11:35:36 [warn] 45#45: *1794012 [lua] v3.lua:692: 
request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. 
Retrying, context: ngx.timer
   2022/09/16 11:35:36 [warn] 45#45: *1764646 [lua] v3.lua:692: 
request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. 
Retrying, context: ngx.timer
   2022/09/16 11:35:36 [warn] 46#46: *1765792 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:35:37 [error] 45#45: *1764312 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   2022/09/16 11:35:37 [error] 45#45: *1715338 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   2022/09/16 11:35:37 [error] 45#45: *1794012 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   2022/09/16 11:35:37 [error] 45#45: *1764646 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.7:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.102:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.158:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.94:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.150:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:35:37 [warn] 45#45: *1887001 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (2/3) for 'xxx.test.com(192.168.3.94:8106)' while 
logging request, client: 31.4.158.18, server: _, request: "POST /xxx HTTP/1.1", 
upstream: "http://192.168.3.94:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:37 [error] 46#46: *1765792 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   2022/09/16 11:35:38 [error] 46#46: *1806529 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 4s, context: ngx.timer
   2022/09/16 11:35:38 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.69:8106)', 
context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:38 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.5:8106)', 
context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:38 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.254:8106)', 
context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080
   2022/09/16 11:35:38 [error] 45#45: *1888945 upstream timed out (110: 
Operation timed out) while sending to client, client: 181.176.116.242, server: 
_, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:38 [warn] 45#45: *1888945 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while 
logging request, client: 181.176.116.242, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:38 [warn] 45#45: *1893023 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (3/3) for 'xxx.test.com(192.168.3.188:8106)' while 
logging request, client: 87.201.177.125, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:38 [error] 44#44: *1892211 upstream timed out (110: 
Operation timed out) while sending to client, client: 200.68.170.52, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:38 [warn] 44#44: *1892211 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while 
logging request, client: 200.68.170.52, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:38 [error] 44#44: *1868489 upstream timed out (110: 
Operation timed out) while sending to client, client: 200.192.102.128, server: 
_, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:38 [warn] 44#44: *1868489 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while 
logging request, client: 200.192.102.128, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:38 [error] 44#44: *1892267 upstream timed out (110: 
Operation timed out) while sending to client, client: 36.71.142.253, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:38 [warn] 44#44: *1892267 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while 
logging request, client: 36.71.142.253, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:38 [error] 44#44: *1892418 upstream timed out (110: 
Operation timed out) while sending to client, client: 103.183.35.18, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:38 [warn] 44#44: *1892418 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (2/3) for 'xxx.test.com(192.168.3.188:8106)' while 
logging request, client: 103.183.35.18, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:38 [error] 44#44: *1892467 upstream timed out (110: 
Operation timed out) while sending to client, client: 106.51.2.239, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:38 [warn] 44#44: *1892467 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while 
logging request, client: 106.51.2.239, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:39 [error] 45#45: *1764312 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 4s, context: ngx.timer
   2022/09/16 11:35:39 [error] 45#45: *1715338 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 4s, context: ngx.timer
   2022/09/16 11:35:39 [error] 45#45: *1794012 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 4s, context: ngx.timer
   2022/09/16 11:35:39 [error] 45#45: *1764646 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 4s, context: ngx.timer
   2022/09/16 11:35:39 [warn] 45#45: *1721285 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:35:39 [error] 46#46: *1765792 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 4s, context: ngx.timer
   2022/09/16 11:35:39 [error] 44#44: *1892860 upstream timed out (110: 
Operation timed out) while sending to client, client: 102.78.21.173, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.184:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:39 [warn] 44#44: *1892860 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.184:8106)' while 
logging request, client: 102.78.21.173, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.184:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:39 [error] 45#45: *1834318 upstream timed out (110: 
Operation timed out) while sending to client, client: 49.35.250.193, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.141:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:35:39 [warn] 45#45: *1834318 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.141:8106)' while 
logging request, client: 49.35.250.193, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.141:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:35:40 [error] 46#46: *1766229 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 8s, context: ngx.timer
   2022/09/16 11:35:40 [error] 45#45: *1721285 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   2022/09/16 11:35:40 [error] 46#46: *1765794 [lua] config_etcd.lua:568: no 
healthy etcd endpoint available, next retry after 2s, context: ngx.timer
   ...skip many logs...
   2022/09/16 11:36:34 [error] 45#45: *1917917 upstream timed out (110: 
Operation timed out) while sending to client, client: 102.51.1.98, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.150:8106//xxx";, 
host: "xxx.test.com"
   2022/09/16 11:36:34 [warn] 45#45: *1917917 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.150:8106)' while 
logging request, client: 102.51.1.98, server: _, request: "POST /xxx HTTP/1.1", 
upstream: "http://192.168.3.150:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:36:36 [warn] 44#44: *1917486 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.150:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:36 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.150:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.141:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.69:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.5:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.153:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.94:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:39 [warn] 45#45: *1887395 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (2/3) for 'xxx.test.com(192.168.3.69:8106)' while 
logging request, client: 187.25.53.91, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.69:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:36:39 [warn] 44#44: *1903779 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
healthy SUCCESS increment (2/3) for 'xxx.test.com(192.168.3.94:8106)' while 
logging request, client: 200.68.170.67, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:36:39 [error] 45#45: *1913636 upstream timed out (110: 
Operation timed out) while sending to client, client: 37.236.60.18, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx";, host: 
"xxx.test.com"
   2022/09/16 11:36:39 [warn] 45#45: *1913636 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.94:8106)' while 
logging request, client: 37.236.60.18, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:36:43 [warn] 45#45: *1803785 [lua] v3.lua:692: 
request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. 
Retrying, context: ngx.timer
   2022/09/16 11:36:43 [warn] 45#45: *1715338 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:36:43 [warn] 45#45: *1721285 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:36:43 [warn] 45#45: *1794012 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.235:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.141:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.102:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.5:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.94:8106)', 
context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080
   2022/09/16 11:36:44 [error] 44#44: *1892860 upstream timed out (110: 
Operation timed out) while sending to client, client: 102.78.21.173, server: _, 
request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx";, host: 
"xxx.test.com"
   2022/09/16 11:36:44 [warn] 44#44: *1892860 [lua] healthcheck.lua:1107: 
log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) 
unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.94:8106)' while 
logging request, client: 102.78.21.173, server: _, request: "POST /xxx 
HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx";, host: "xxx.test.com"
   2022/09/16 11:36:47 [warn] 46#46: *1765794 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:36:50 [warn] 45#45: *1764312 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:36:58 [warn] 45#45: *1764646 [lua] config_etcd.lua:563: 
reconnected to etcd, context: ngx.timer
   2022/09/16 11:37:05 [warn] 44#44: *1784320 [lua] v3.lua:692: 
request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. 
Retrying, context: ngx.timer
   ```
   
   
   ### Environment
   
   
   the kubernetes deployment manifest I used:
   
   ```yaml
   ---
   apiVersion: policy/v1beta1
   kind: PodDisruptionBudget
   metadata:
     name: apisix-edge
     namespace: test
     labels:
       helm.sh/chart: apisix-0.11.0
       app.kubernetes.io/name: apisix
       app.kubernetes.io/instance: apisix-edge
       app.kubernetes.io/version: "2.15"
       app.kubernetes.io/managed-by: Helm
   spec:
     minAvailable: 90%
     selector:
       matchLabels:
         app.kubernetes.io/name: apisix
         app.kubernetes.io/instance: apisix-edge
   ---
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: apisix-edge
     namespace: test
   data:
     config.yaml: |-
       apisix:
         node_listen: 8080             # APISIX listening port
   
       nginx_config:                     # config for render the template to 
genarate nginx.conf
         error_log: "/dev/stderr"
         error_log_level: "warn"         # warn,error
         worker_processes: "auto"
         enable_cpu_affinity: true
   
         worker_rlimit_nofile: 300000     # the number of files a worker 
process can open, should be larger than worker_connections
         event:
           worker_connections: 63000
   
         worker_shutdown_timeout: 240s   # timeout for a graceful shutdown of 
worker processes
         max_pending_timers: 16384       # increase it if you see "too many 
pending timers" error
         max_running_timers: 4096        # increase it if you see 
"lua_max_running_timers are not enough" error
   
         http:
           enable_access_log: false
           keepalive_timeout: 60s         # timeout during which a keep-alive 
client connection will stay open on the server side.
           client_header_timeout: 60s     # timeout for reading client request 
header, then 408 (Request Time-out) error is returned to the client
           client_body_timeout: 60s       # timeout for reading client request 
body, then 408 (Request Time-out) error is returned to the client
           send_timeout: 30s              # timeout for transmitting a response 
to the client. then the connection is closed
           underscores_in_headers: "on"   # default enables the use of 
underscores in client request header fields
           real_ip_header: "X-Real-IP"    # 
http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header
           real_ip_from:                  # 
http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from
             - 127.0.0.1
             - 'unix:'
   
       etcd:
         host:                                 # it's possible to define 
multiple etcd hosts addresses of the same etcd cluster.
           - "http://apisix-etcd.test.svc.cluster.local:2379";             # 
multiple etcd address
         prefix: "/apisix-edge-gateway"     # apisix configurations prefix
         timeout: 30   # 30 seconds
         user: "root"
         password: "xxx"
       plugins:                          # plugin list
         - kafka-logger
         - prometheus
         - proxy-rewrite
         - proxy-mirror
         - gzip
         - real-ip
       stream_plugins:
         - limit-conn
       
       plugin_attr:
         prometheus:
           export_addr:
             ip: 0.0.0.0
             port: 9091
           export_uri: /apisix/prometheus/metrics
           metric_prefix: apisix_
         
         # 下调 proxy-mirror 的超时时间，避免请求等待镜像流量的响应，造成大量请求超时！
         proxy-mirror:
           timeout:                          # proxy timeout in mirrored 
sub-request
             connect: 200ms
             read: 200ms
             send: 200ms
   ---
   apiVersion: v1
   kind: Service
   metadata:
     name: apisix-edge-gateway
     namespace: test
     labels:
       helm.sh/chart: apisix-0.11.0
       app.kubernetes.io/name: apisix
       app.kubernetes.io/instance: apisix-edge
       app.kubernetes.io/version: "2.15"
       app.kubernetes.io/managed-by: Helm
       app.kubernetes.io/service: apisix-gateway
   spec:
     type: NodePort
     externalTrafficPolicy: Cluster
     ports:
     - name: apisix-gateway
       port: 80
       targetPort: 8080
       protocol: TCP
     selector:
       app.kubernetes.io/name: apisix
       app.kubernetes.io/instance: apisix-edge
   ---
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: apisix-edge
     namespace: test
     labels:
       helm.sh/chart: apisix-0.11.0
       app.kubernetes.io/name: apisix
       app.kubernetes.io/instance: apisix-edge
       app.kubernetes.io/version: "2.15"
       app.kubernetes.io/managed-by: Helm
   spec:
     replicas: 3
     strategy:
       type: RollingUpdate
       rollingUpdate:
         maxSurge: 1
         maxUnavailable: 0
     selector:
       matchLabels:
         app.kubernetes.io/name: apisix
         app.kubernetes.io/instance: apisix-edge
     template:
       metadata:
         annotations:
           prometheus.io/path: /apisix/prometheus/metrics
           prometheus.io/port: "9091"
           prometheus.io/scrape: "true"
         labels:
           app.kubernetes.io/name: apisix
           app.kubernetes.io/instance: apisix-edge
       spec:
         terminationGracePeriodSeconds: 360
         securityContext:
           null
         containers:
           - name: apisix
             securityContext:
               null
             image: "apache/apisix:2.15-alpine"
             imagePullPolicy: IfNotPresent
             ports:
               - name: http
                 containerPort: 8080
                 protocol: TCP
               - name: tls
                 containerPort: 9443
                 protocol: TCP
               - name: admin
                 containerPort: 9180
                 protocol: TCP
               - name: prometheus
                 containerPort: 9091
                 protocol: TCP
             readinessProbe:
               failureThreshold: 6
               initialDelaySeconds: 10
               periodSeconds: 10
               successThreshold: 1
               tcpSocket:
                 port: 8080
               timeoutSeconds: 1
             lifecycle:
               preStop:
                 exec:
                   command:
                   - /bin/sh
                   - -c
                   - "sleep 300"
             volumeMounts:
               - mountPath: /usr/local/apisix/conf/config.yaml
                 name: apisix-config
                 subPath: config.yaml
             resources:
               limits:
                 cpu: 1900m
                 memory: 3000Mi
               requests:
                 cpu: 1600m
                 memory: 2600Mi
         volumes:
           - configMap:
               name: apisix-edge
             name: apisix-config
         nodeSelector:
           eks.amazonaws.com/nodegroup: test-edge-gateway
         affinity:
           nodeAffinity:
             requiredDuringSchedulingIgnoredDuringExecution:
               nodeSelectorTerms:
               - matchExpressions:
                 - key: kubernetes.io/arch
                   operator: In
                   values:
                   - amd64
           podAntiAffinity:
             preferredDuringSchedulingIgnoredDuringExecution:
             - podAffinityTerm:
                 labelSelector:
                   matchExpressions:
                   - key: app.kubernetes.io/name
                     operator: In
                     values:
                     - apisix
                 topologyKey: kubernetes.io/hostname
               weight: 100
         tolerations:
           - effect: NoSchedule
             key: platform.xxx/test-edge-gateway
             operator: Exists
   ```
   
   the APISIX upstream config:
   
   ```json
   {
       "id": "xxx-extractor",
       "desc": "xxx-extractor",
       "scheme": "http",
       "pass_host": "pass",
       "type":"roundrobin",
       "keepalive_pool": {
           "size": 1000,
           "idle_timeout": 75,
           "requests": 1000
       },
       "nodes": [
           { "host": "192.168.3.235", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.233", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.141", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.129", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.7", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.102", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.69", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.5", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.153", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.184", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.158", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.94", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.254", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.188", "port": 8080, "weight": 0, "priority": 0},
           { "host": "192.168.3.150", "port": 8080, "weight": 0, "priority": 0}
       ],
       "retries": 0,
       "timeout": {
           "connect": 3,
           "send": 3,
           "read": 3
       },
       "checks": {
           "active": {
               "timeout": 3,
               "http_path": "/",
               "healthy": {
                   "interval": 5,
                   "successes": 2
               },
               "unhealthy": {
                   "interval": 5,
                   "http_failures": 3
               },
               "req_headers": ["User-Agent: curl/7.29.0"]
           },
           "passive": {
               "healthy": {
                   "http_statuses": [200, 201],
                   "successes": 3
               },
               "unhealthy": {
                   "http_statuses": [500, 502, 503, 504],
                   "http_failures": 3,
                   "tcp_failures": 3
               }
           }
       }
   }
   ```
   
   and the APISIX route configs:
   
   ```json
   {
     "id": "xxx-ingress",
     "uri": "/*",
     "hosts": [
       "xxx.test.com"
     ],
     "methods": [
       "PUT",
       "GET",
       "POST",
       "HEAD"
     ],
     "plugins": {
       "prometheus": {}
     },
     "upstream_id": "xxx-extractor"
   }
   ```
   
   running in Kubernetes, using helm chart `apisix/apisix` with version 
`0.11.0`.
   
   - APISIX version (run `apisix version`): 
`docker.io/apache/apisix:2.15.0-alpine`
   - Operating system (run `uname -a`): `Linux apisix-edge-765f88c49f-8tslz 
5.4.190-107.353.amzn2.x86_64 #1 SMP Wed Apr 27 21:16:35 UTC 2022 x86_64 Linux`
   - etcd version: `docker.io/bitnami/etcd:3.5.4-debian-11-r22`
   - grafana dashbaord I use: 
<https://grafana.com/grafana/dashboards/11719-apache-apisix/>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [apisix] ryan4yin opened a new issue, #7934: help request: The increase of QPS leads to 504 timeout errors and etcd timeout errors

Reply via email to