wklken commented on issue #10093: URL: https://github.com/apache/apisix/issues/10093#issuecomment-1724748805
@jiangfucheng Finally, it has been reproduced with the APISIX docker-compose setup. It took a lot of time, and I added some scripts to help. Here are the steps; can you please help investigate this? (This is the problem that is blocking our version from going to production.)

------

## Reproduce steps:

### 1. Use the docker-compose setup

1. `git clone https://github.com/apache/apisix-docker.git`
2. `vim apisix_conf/master/config.yaml`, change `apisix.router.http` and `nginx_config.worker_processes`:

   ```yaml
   apisix:
     node_listen: 9080              # APISIX listening port
     enable_ipv6: false
     router:
       http: radixtree_uri_with_parameter

   nginx_config:
     worker_processes: 4
   ```

3. `vim docker-compose-master.yaml`, change the `image` to `3.2.1-centos`:

   ```yaml
   services:
     apisix:
       image: "apache/apisix:3.2.1-centos"
   ```

4. `docker-compose -p docker-apisix -f docker-compose-master.yaml up` to start APISIX and etcd.

### 2. Enter the container and make changes to mock the DNS failure

> All in the container (except step 7)

1. `docker exec -it dockerapisix_apisix_1 /bin/bash`
2. Check that `conf/config.yaml` is right (4 worker processes) and that `apisix version` prints 3.2.1.
3. `vi apisix/utils/upstream.lua` and add code to mock the DNS failure: until the timestamp in the condition, 50% of the `httpbin.org` lookups fail with `101 empty records`.

   ```lua
   -- add at line 22, with the other local declarations
   local random = math.random

   -- in parse_domain_for_nodes, add the mock before core.resolver.parse_domain(host)
   local function parse_domain_for_nodes(nodes)
       local new_nodes = core.table.new(#nodes, 0)
       for _, node in ipairs(nodes) do
           local host = node.host
           if not ipmatcher.parse_ipv4(host) and
                   not ipmatcher.parse_ipv6(host) then
               -- mock: random(1, 10) is even 5 times out of 10, so until ngx_now()
               -- passes the timestamp, half of the lookups return early
               -- (an empty node list for this single-node upstream)
               if host == "httpbin.org" and ngx_now() <= 1695089345 then
                   if random(1, 10) % 2 == 0 then
                       core.log.error("101 empty records")
                       return new_nodes
                   end
               end

               local ip, err = core.resolver.parse_domain(host)
               -- ... (rest of the function unchanged)
   ```

4.
   Add a script `create.sh` to register the service and the route (**check that `API_KEY` is right**):

   ```bash
   #!/bin/bash
   # must match the admin key configured in conf/config.yaml
   API_KEY="edd1c9f034335f136f87ad84b625c8f1"
   ROUTE_ID="dns_route"
   SERVICE_ID="dns_service"

   curl http://127.0.0.1:9180/apisix/admin/services/${SERVICE_ID} -H "X-API-KEY: ${API_KEY}" -X PUT -d '
   {
       "upstream": {
           "nodes": [
               {
                   "host": "httpbin.org",
                   "port": 80,
                   "weight": 100,
                   "priority": 1
               }
           ],
           "type": "roundrobin",
           "scheme": "http",
           "pass_host": "node"
       }
   }'

   curl http://127.0.0.1:9180/apisix/admin/routes/${ROUTE_ID} -H "X-API-KEY: ${API_KEY}" -X PUT -d '
   {
       "uri": "/api/test/prod/dns22",
       "methods": ["GET"],
       "plugins": {
           "proxy-rewrite": {
               "method": "GET",
               "uri": "/get"
           }
       },
       "upstream": {
           "nodes": [
               {
                   "host": "httpbin.org",
                   "port": 80,
                   "weight": 100,
                   "priority": 1
               }
           ],
           "type": "roundrobin",
           "scheme": "http",
           "pass_host": "node"
       },
       "service_id": "dns_service",
       "status": 1
   }'
   ```

5. Add a script `update_lua.sh` that quickly moves the timestamp in the mock condition to 30 seconds from now and reloads APISIX (make it executable with `chmod +x update_lua.sh`, since it is run as `./update_lua.sh` below):

   ```bash
   #!/bin/bash
   now=$(date "+%s")
   echo "now is: ${now}"

   # end of the mocked DNS-failure window: 30 seconds from now
   A=$(date -d "+30 seconds" "+%s")
   echo "will change the condition to <= ${A}"

   # rewrite the timestamp in "ngx_now() <= <timestamp> then"
   sed -i -r "s/<= ([0-9]+) then/<= ${A} then/g" apisix/utils/upstream.lua
   echo $?
   echo "change done"

   # reload so that the workers pick up the modified upstream.lua
   apisix reload
   ```

6. Register the service and the route:

   ```bash
   bash -x create.sh
   ```

7. Check that the URL works (**not in the container**):

   ```
   curl -vv http://0.0.0.0:9080/api/test/prod/dns22
   ```

### 3. Add the check script

> Not in the container

1. Install `wrk`.
2. Add a script `start_and_check.sh` that runs the benchmark and then checks for 503:

   ```bash
   #!/bin/bash
   date
   url="http://0.0.0.0:9080/api/test/prod/dns22"

   echo "start bench wrk"
   wrk -c2 -t2 -d35s "${url}"
   date

   echo "sleep 5 s"
   sleep 5

   echo "check the status code 10 times"
   for ((i=1; i<=10; i++))
   do
       status_code=$(curl --write-out '%{http_code}' --silent --output /dev/null "$url")
       echo "status=$status_code"
       if [ "${status_code}" -eq "503" ]
       then
           echo "503 show"
           exit
       fi
   done
   ```

### 4. Reproduce

> Open two terminal windows to run steps 1 and 2 at (about) the same time.

1.
   In the container, run:

   ```
   $ ./update_lua.sh
   now is: 1695089267
   will change the condition to <= 1695089297
   0
   change done
   /usr/local/openresty//luajit/bin/luajit ./apisix/cli/apisix.lua reload
   WARNING: using fixed Admin API token has security risk.
   Please modify "admin_key" in conf/config.yaml .

   2023/09/19 02:07:47 [notice] 157#157: signal process started
   ```

2. Outside the container, run:

   ```
   $ bash start_and_check.sh
   Tue Sep 19 10:08:36 CST 2023
   start bench wrk
   Running 35s test @ http://0.0.0.0:9080/api/test/prod/dns22
     2 threads and 2 connections
     Thread Stats   Avg      Stdev     Max   +/- Stdev
       Latency    15.22ms   80.61ms 823.04ms   95.99%
       Req/Sec     6.91k   807.96     7.77k    95.97%
     461582 requests in 35.00s, 202.93MB read
     Non-2xx or 3xx responses: 461575
   Requests/sec:  13187.64
   Transfer/sec:      5.80MB
   Tue Sep 19 10:09:11 CST 2023
   sleep 5 s
   check the status code 10 times
   status=503
   503 show
   ```

3. If it prints `503 show`, the problem has been reproduced: the status checks run 40 seconds after the benchmark starts (35 s of `wrk` plus a 5 s sleep), after the 30-second mocked DNS-failure window from step 1 has closed, yet the route still returns 503. You can run `curl -vv http://0.0.0.0:9080/api/test/prod/dns22` a few times to confirm:

   ```
   curl -vv http://0.0.0.0:9080/api/test/prod/dns22
   * About to connect() to 0.0.0.0 port 9080 (#0)
   *   Trying 0.0.0.0...
   * Connected to 0.0.0.0 (0.0.0.0) port 9080 (#0)
   > GET /api/test/prod/dns22 HTTP/1.1
   > User-Agent: curl/7.29.0
   > Host: 0.0.0.0:9080
   > Accept: */*
   >
   < HTTP/1.1 503 Service Temporarily Unavailable
   < Date: Tue, 19 Sep 2023 02:21:58 GMT
   < Content-Type: text/html; charset=utf-8
   < Content-Length: 269
   < Connection: keep-alive
   < Server: APISIX/3.2.1
   <
   <html>
   <head><title>503 Service Temporarily Unavailable</title></head>
   <body>
   <center><h1>503 Service Temporarily Unavailable</h1></center>
   <hr><center>openresty</center>
   <p><em>Powered by <a href="https://apisix.apache.org/">APISIX</a>.</em></p></body>
   </html>
   * Connection #0 to host 0.0.0.0 left intact
   ```

4. If every check returns `200`, run steps 1 and 2 again. Each attempt takes about 50 seconds and the 503 does not show up every time, but within about 10 attempts it is very likely to appear.

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@apisix.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
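The last reproduction step says to repeat steps 1 and 2 until `503 show` appears, roughly 10 tries at about 50 seconds each. Below is a minimal sketch of that retry loop, run from the host; it is not one of the original scripts. It assumes the container is named `dockerapisix_apisix_1` (as in section 2), that `update_lua.sh` sits in `/usr/local/apisix` inside the container (assumed to be the image's working directory, where the relative path `apisix/utils/upstream.lua` resolves), and that `start_and_check.sh` is in the current directory. The function names `update_lua`, `start_and_check` and `reproduce_loop` are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper, not part of the original report: repeats steps 1 and 2
# of "4. Reproduce" until start_and_check.sh prints "503 show".
# Assumptions: container name from section 2, update_lua.sh stored in
# /usr/local/apisix inside the container, start_and_check.sh in the current
# directory on the host.

CONTAINER="${CONTAINER:-dockerapisix_apisix_1}"

# Step 1: move the mocked DNS-failure window to "now + 30 s" and reload APISIX.
update_lua() {
    docker exec "$CONTAINER" bash -c 'cd /usr/local/apisix && ./update_lua.sh'
}

# Step 2: run wrk for 35 s, sleep 5 s, then probe the route up to 10 times.
start_and_check() {
    bash start_and_check.sh
}

# Prints the attempt number and returns 0 once "503 show" is seen,
# returns 1 if no attempt reproduced the 503, and 2 if step 1 failed.
# Progress and script output go to stderr so stdout carries only the result.
reproduce_loop() {
    local max_attempts="${1:-10}"
    local attempt output
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        echo "attempt ${attempt}/${max_attempts}" >&2
        update_lua >&2 || return 2
        # start_and_check.sh exits 0 either way, so inspect its output instead
        output=$(start_and_check)
        echo "$output" >&2
        if grep -q "503 show" <<<"$output"; then
            echo "$attempt"
            return 0
        fi
    done
    return 1
}
```

For example, after `source repro_loop.sh` (a hypothetical file name), `reproduce_loop 10` prints the number of the attempt that hit the 503. The loop matches on the printed `503 show` rather than on the exit status because `start_and_check.sh` ends with status 0 whether or not it saw a 503.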