ryan4yin opened a new issue, #7934: URL: https://github.com/apache/apisix/issues/7934
### Description After increasing the QPS rate o APISIX, some new error messages were observed: - `(upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.233:8106)'` - `config_etcd.lua:568: no healthy etcd endpoint available, next retry after 4s` and when I decrease the QPS rate to lower than 500, the problem just disappeared. I've increased my upstream `keepalive_pools`, `worker_rlimit_nofile` `worker_connections` and restart all the APISIX pods, but the problem still exists. related issues: - https://github.com/apache/apisix/issues/7162 the prometheus metrics: <img width="1611" alt="image" src="https://user-images.githubusercontent.com/22363274/190634978-2f66629e-8cb4-417d-9eb6-ef8d4772893a.png"> <img width="1618" alt="image" src="https://user-images.githubusercontent.com/22363274/190635134-47105fe3-5bd9-4aa8-8d42-748e4b60c86d.png"> the error logs: ``` 2022/09/16 11:35:26 [warn] 44#44: *1868877 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.94:8106)' while logging request, client: 110.227.0.30, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:26 [warn] 44#44: *1879978 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (1/3) for 'xxx.test.com(192.168.3.158:8106)' while logging request, client: 45.7.214.7, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.158:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:26 [error] 45#45: *1868931 upstream timed out (110: Operation timed out) while sending to client, client: 201.42.52.64, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.158:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:26 [warn] 45#45: *1868931 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.158:8106)' while logging request, client: 201.42.52.64, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.158:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:29 [warn] 46#46: *1765790 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:35:29 [warn] 46#46: *1765789 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.7:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.69:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.5:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.188:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.184:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:30 [warn] 45#45: *1887896 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.254:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:32 [warn] 44#44: *1822405 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:35:33 [warn] 46#46: *1765785 [lua] v3.lua:692: request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. Retrying, context: ngx.timer 2022/09/16 11:35:33 [warn] 46#46: *1766229 [lua] health_check.lua:105: report_failure(): update endpoint: http://apisix-etcd.test.svc.cluster.local:2379 to unhealthy, context: ngx.timer 2022/09/16 11:35:33 [warn] 46#46: *1766229 [lua] v3.lua:692: request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. Retrying, context: ngx.timer 2022/09/16 11:35:34 [error] 46#46: *1766229 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer 2022/09/16 11:35:34 [error] 45#45: *1889867 upstream timed out (110: Operation timed out) while sending to client, client: 181.39.123.98, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:34 [warn] 45#45: *1889867 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.5:8106)' while logging request, client: 181.39.123.98, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:34 [error] 45#45: *1889641 upstream timed out (110: Operation timed out) while sending to client, client: 37.236.170.22, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:34 [warn] 45#45: *1889641 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (2/3) for 'xxx.test.com(192.168.3.5:8106)' while logging request, client: 37.236.170.22, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:34 [error] 45#45: *1853394 upstream timed out (110: Operation timed out) while sending to client, client: 181.124.183.134, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:34 [warn] 45#45: *1853394 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (3/3) for 'xxx.test.com(192.168.3.5:8106)' while logging request, client: 181.124.183.134, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.5:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:34 [error] 45#45: *1882612 upstream timed out (110: Operation timed out) while sending to client, client: 187.24.133.50, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.153:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:34 [warn] 45#45: *1882612 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.153:8106)' while logging request, client: 187.24.133.50, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.153:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:35 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.184:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:35 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.188:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:35 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (2/2) for 'xxx.test.com(192.168.3.5:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:35:36 [error] 46#46: *1806529 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer 2022/09/16 11:35:36 [error] 46#46: *1766229 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 4s, context: ngx.timer 2022/09/16 11:35:36 [warn] 45#45: *1764312 [lua] v3.lua:692: request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. Retrying, context: ngx.timer 2022/09/16 11:35:36 [warn] 45#45: *1715338 [lua] v3.lua:692: request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. Retrying, context: ngx.timer 2022/09/16 11:35:36 [warn] 45#45: *1794012 [lua] v3.lua:692: request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. Retrying, context: ngx.timer 2022/09/16 11:35:36 [warn] 45#45: *1764646 [lua] v3.lua:692: request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. Retrying, context: ngx.timer 2022/09/16 11:35:36 [warn] 46#46: *1765792 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:35:37 [error] 45#45: *1764312 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer 2022/09/16 11:35:37 [error] 45#45: *1715338 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer 2022/09/16 11:35:37 [error] 45#45: *1794012 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer 2022/09/16 11:35:37 [error] 45#45: *1764646 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer 2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.7:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.102:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.158:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.94:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:35:37 [warn] 44#44: *1891665 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.150:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:35:37 [warn] 45#45: *1887001 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (2/3) for 'xxx.test.com(192.168.3.94:8106)' while logging request, client: 31.4.158.18, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:37 [error] 46#46: *1765792 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer 2022/09/16 11:35:38 [error] 46#46: *1806529 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 4s, context: ngx.timer 2022/09/16 11:35:38 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.69:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:38 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.5:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:38 [warn] 45#45: *1892161 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.254:8106)', context: ngx.timer, client: 45.177.253.231, server: 0.0.0.0:8080 2022/09/16 11:35:38 [error] 45#45: *1888945 upstream timed out (110: Operation timed out) while sending to client, client: 181.176.116.242, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [warn] 45#45: *1888945 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while logging request, client: 181.176.116.242, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [warn] 45#45: *1893023 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (3/3) for 'xxx.test.com(192.168.3.188:8106)' while logging request, client: 87.201.177.125, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [error] 44#44: *1892211 upstream timed out (110: Operation timed out) while sending to client, client: 200.68.170.52, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [warn] 44#44: *1892211 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while logging request, client: 200.68.170.52, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [error] 44#44: *1868489 upstream timed out (110: Operation timed out) while sending to client, client: 200.192.102.128, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [warn] 44#44: *1868489 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while logging request, client: 200.192.102.128, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [error] 44#44: *1892267 upstream timed out (110: Operation timed out) while sending to client, client: 36.71.142.253, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [warn] 44#44: *1892267 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while logging request, client: 36.71.142.253, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [error] 44#44: *1892418 upstream timed out (110: Operation timed out) while sending to client, client: 103.183.35.18, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [warn] 44#44: *1892418 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (2/3) for 'xxx.test.com(192.168.3.188:8106)' while logging request, client: 103.183.35.18, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [error] 44#44: *1892467 upstream timed out (110: Operation timed out) while sending to client, client: 106.51.2.239, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:38 [warn] 44#44: *1892467 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.188:8106)' while logging request, client: 106.51.2.239, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.188:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:39 [error] 45#45: *1764312 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 4s, context: ngx.timer 2022/09/16 11:35:39 [error] 45#45: *1715338 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 4s, context: ngx.timer 2022/09/16 11:35:39 [error] 45#45: *1794012 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 4s, context: ngx.timer 2022/09/16 11:35:39 [error] 45#45: *1764646 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 4s, context: ngx.timer 2022/09/16 11:35:39 [warn] 45#45: *1721285 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:35:39 [error] 46#46: *1765792 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 4s, context: ngx.timer 2022/09/16 11:35:39 [error] 44#44: *1892860 upstream timed out (110: Operation timed out) while sending to client, client: 102.78.21.173, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.184:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:39 [warn] 44#44: *1892860 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.184:8106)' while logging request, client: 102.78.21.173, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.184:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:39 [error] 45#45: *1834318 upstream timed out (110: Operation timed out) while sending to client, client: 49.35.250.193, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.141:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:39 [warn] 45#45: *1834318 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.141:8106)' while logging request, client: 49.35.250.193, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.141:8106//xxx", host: "xxx.test.com" 2022/09/16 11:35:40 [error] 46#46: *1766229 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 8s, context: ngx.timer 2022/09/16 11:35:40 [error] 45#45: *1721285 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer 2022/09/16 11:35:40 [error] 46#46: *1765794 [lua] config_etcd.lua:568: no healthy etcd endpoint available, next retry after 2s, context: ngx.timer ...skip many logs... 2022/09/16 11:36:34 [error] 45#45: *1917917 upstream timed out (110: Operation timed out) while sending to client, client: 102.51.1.98, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.150:8106//xxx", host: "xxx.test.com" 2022/09/16 11:36:34 [warn] 45#45: *1917917 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.150:8106)' while logging request, client: 102.51.1.98, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.150:8106//xxx", host: "xxx.test.com" 2022/09/16 11:36:36 [warn] 44#44: *1917486 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.150:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:36 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (1/2) for 'xxx.test.com(192.168.3.150:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.141:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.69:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.5:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.153:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:39 [warn] 44#44: *1919283 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.94:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:39 [warn] 45#45: *1887395 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (2/3) for 'xxx.test.com(192.168.3.69:8106)' while logging request, client: 187.25.53.91, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.69:8106//xxx", host: "xxx.test.com" 2022/09/16 11:36:39 [warn] 44#44: *1903779 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) healthy SUCCESS increment (2/3) for 'xxx.test.com(192.168.3.94:8106)' while logging request, client: 200.68.170.67, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx", host: "xxx.test.com" 2022/09/16 11:36:39 [error] 45#45: *1913636 upstream timed out (110: Operation timed out) while sending to client, client: 37.236.60.18, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx", host: "xxx.test.com" 2022/09/16 11:36:39 [warn] 45#45: *1913636 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.94:8106)' while logging request, client: 37.236.60.18, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx", host: "xxx.test.com" 2022/09/16 11:36:43 [warn] 45#45: *1803785 [lua] v3.lua:692: request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. Retrying, context: ngx.timer 2022/09/16 11:36:43 [warn] 45#45: *1715338 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:36:43 [warn] 45#45: *1721285 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:36:43 [warn] 45#45: *1794012 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.235:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.141:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.102:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.5:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:44 [warn] 44#44: *1921133 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy TIMEOUT increment (1/3) for 'xxx.test.com(192.168.3.94:8106)', context: ngx.timer, client: 105.99.84.230, server: 0.0.0.0:8080 2022/09/16 11:36:44 [error] 44#44: *1892860 upstream timed out (110: Operation timed out) while sending to client, client: 102.78.21.173, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx", host: "xxx.test.com" 2022/09/16 11:36:44 [warn] 44#44: *1892860 [lua] healthcheck.lua:1107: log(): [healthcheck] (upstream#/apisix-edge-gateway/upstreams/xxx-extractor) unhealthy HTTP increment (1/3) for 'xxx.test.com(192.168.3.94:8106)' while logging request, client: 102.78.21.173, server: _, request: "POST /xxx HTTP/1.1", upstream: "http://192.168.3.94:8106//xxx", host: "xxx.test.com" 2022/09/16 11:36:47 [warn] 46#46: *1765794 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:36:50 [warn] 45#45: *1764312 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:36:58 [warn] 45#45: *1764646 [lua] config_etcd.lua:563: reconnected to etcd, context: ngx.timer 2022/09/16 11:37:05 [warn] 44#44: *1784320 [lua] v3.lua:692: request_chunk(): http://apisix-etcd.test.svc.cluster.local:2379: timeout. Retrying, context: ngx.timer ``` ### Environment the kubernetes deployment manifest I used: ```yaml --- apiVersion: policy/v1beta1 kind: PodDisruptionBudget metadata: name: apisix-edge namespace: test labels: helm.sh/chart: apisix-0.11.0 app.kubernetes.io/name: apisix app.kubernetes.io/instance: apisix-edge app.kubernetes.io/version: "2.15" app.kubernetes.io/managed-by: Helm spec: minAvailable: 90% selector: matchLabels: app.kubernetes.io/name: apisix app.kubernetes.io/instance: apisix-edge --- apiVersion: v1 kind: ConfigMap metadata: name: apisix-edge namespace: test data: config.yaml: |- apisix: node_listen: 8080 # APISIX listening port nginx_config: # config for render the template to genarate nginx.conf error_log: "/dev/stderr" error_log_level: "warn" # warn,error worker_processes: "auto" enable_cpu_affinity: true worker_rlimit_nofile: 300000 # the number of files a worker process can open, should be larger than worker_connections event: worker_connections: 63000 worker_shutdown_timeout: 240s # timeout for a graceful shutdown of worker processes max_pending_timers: 16384 # increase it if you see "too many pending timers" error max_running_timers: 4096 # increase it if you see "lua_max_running_timers are not enough" error http: enable_access_log: false keepalive_timeout: 60s # timeout during which a keep-alive client connection will stay open on the server side. client_header_timeout: 60s # timeout for reading client request header, then 408 (Request Time-out) error is returned to the client client_body_timeout: 60s # timeout for reading client request body, then 408 (Request Time-out) error is returned to the client send_timeout: 30s # timeout for transmitting a response to the client. then the connection is closed underscores_in_headers: "on" # default enables the use of underscores in client request header fields real_ip_header: "X-Real-IP" # http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_header real_ip_from: # http://nginx.org/en/docs/http/ngx_http_realip_module.html#set_real_ip_from - 127.0.0.1 - 'unix:' etcd: host: # it's possible to define multiple etcd hosts addresses of the same etcd cluster. - "http://apisix-etcd.test.svc.cluster.local:2379" # multiple etcd address prefix: "/apisix-edge-gateway" # apisix configurations prefix timeout: 30 # 30 seconds user: "root" password: "xxx" plugins: # plugin list - kafka-logger - prometheus - proxy-rewrite - proxy-mirror - gzip - real-ip stream_plugins: - limit-conn plugin_attr: prometheus: export_addr: ip: 0.0.0.0 port: 9091 export_uri: /apisix/prometheus/metrics metric_prefix: apisix_ # 下调 proxy-mirror 的超时时间,避免请求等待镜像流量的响应,造成大量请求超时! proxy-mirror: timeout: # proxy timeout in mirrored sub-request connect: 200ms read: 200ms send: 200ms --- apiVersion: v1 kind: Service metadata: name: apisix-edge-gateway namespace: test labels: helm.sh/chart: apisix-0.11.0 app.kubernetes.io/name: apisix app.kubernetes.io/instance: apisix-edge app.kubernetes.io/version: "2.15" app.kubernetes.io/managed-by: Helm app.kubernetes.io/service: apisix-gateway spec: type: NodePort externalTrafficPolicy: Cluster ports: - name: apisix-gateway port: 80 targetPort: 8080 protocol: TCP selector: app.kubernetes.io/name: apisix app.kubernetes.io/instance: apisix-edge --- apiVersion: apps/v1 kind: Deployment metadata: name: apisix-edge namespace: test labels: helm.sh/chart: apisix-0.11.0 app.kubernetes.io/name: apisix app.kubernetes.io/instance: apisix-edge app.kubernetes.io/version: "2.15" app.kubernetes.io/managed-by: Helm spec: replicas: 3 strategy: type: RollingUpdate rollingUpdate: maxSurge: 1 maxUnavailable: 0 selector: matchLabels: app.kubernetes.io/name: apisix app.kubernetes.io/instance: apisix-edge template: metadata: annotations: prometheus.io/path: /apisix/prometheus/metrics prometheus.io/port: "9091" prometheus.io/scrape: "true" labels: app.kubernetes.io/name: apisix app.kubernetes.io/instance: apisix-edge spec: terminationGracePeriodSeconds: 360 securityContext: null containers: - name: apisix securityContext: null image: "apache/apisix:2.15-alpine" imagePullPolicy: IfNotPresent ports: - name: http containerPort: 8080 protocol: TCP - name: tls containerPort: 9443 protocol: TCP - name: admin containerPort: 9180 protocol: TCP - name: prometheus containerPort: 9091 protocol: TCP readinessProbe: failureThreshold: 6 initialDelaySeconds: 10 periodSeconds: 10 successThreshold: 1 tcpSocket: port: 8080 timeoutSeconds: 1 lifecycle: preStop: exec: command: - /bin/sh - -c - "sleep 300" volumeMounts: - mountPath: /usr/local/apisix/conf/config.yaml name: apisix-config subPath: config.yaml resources: limits: cpu: 1900m memory: 3000Mi requests: cpu: 1600m memory: 2600Mi volumes: - configMap: name: apisix-edge name: apisix-config nodeSelector: eks.amazonaws.com/nodegroup: test-edge-gateway affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/arch operator: In values: - amd64 podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: labelSelector: matchExpressions: - key: app.kubernetes.io/name operator: In values: - apisix topologyKey: kubernetes.io/hostname weight: 100 tolerations: - effect: NoSchedule key: platform.xxx/test-edge-gateway operator: Exists ``` the APISIX upstream config: ```json { "id": "xxx-extractor", "desc": "xxx-extractor", "scheme": "http", "pass_host": "pass", "type":"roundrobin", "keepalive_pool": { "size": 1000, "idle_timeout": 75, "requests": 1000 }, "nodes": [ { "host": "192.168.3.235", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.233", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.141", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.129", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.7", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.102", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.69", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.5", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.153", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.184", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.158", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.94", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.254", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.188", "port": 8080, "weight": 0, "priority": 0}, { "host": "192.168.3.150", "port": 8080, "weight": 0, "priority": 0} ], "retries": 0, "timeout": { "connect": 3, "send": 3, "read": 3 }, "checks": { "active": { "timeout": 3, "http_path": "/", "healthy": { "interval": 5, "successes": 2 }, "unhealthy": { "interval": 5, "http_failures": 3 }, "req_headers": ["User-Agent: curl/7.29.0"] }, "passive": { "healthy": { "http_statuses": [200, 201], "successes": 3 }, "unhealthy": { "http_statuses": [500, 502, 503, 504], "http_failures": 3, "tcp_failures": 3 } } } } ``` and the APISIX route configs: ```json { "id": "xxx-ingress", "uri": "/*", "hosts": [ "xxx.test.com" ], "methods": [ "PUT", "GET", "POST", "HEAD" ], "plugins": { "prometheus": {} }, "upstream_id": "xxx-extractor" } ``` running in Kubernetes, using helm chart `apisix/apisix` with version `0.11.0`. - APISIX version (run `apisix version`): `docker.io/apache/apisix:2.15.0-alpine` - Operating system (run `uname -a`): `Linux apisix-edge-765f88c49f-8tslz 5.4.190-107.353.amzn2.x86_64 #1 SMP Wed Apr 27 21:16:35 UTC 2022 x86_64 Linux` - etcd version: `docker.io/bitnami/etcd:3.5.4-debian-11-r22` - grafana dashbaord I use: <https://grafana.com/grafana/dashboards/11719-apache-apisix/> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
