nanamikon opened a new issue, #10431: URL: https://github.com/apache/apisix/issues/10431
### Current Behavior  One of worker run into a dead end ### Expected Behavior Not run into a dead end ### Error Logs Lua Thread stack as follow ``` (gdb) lbt C:ngx_http_lua_ngx_sleep builtin#21 @/scripts//deps/share/lua/5.1/resty/worker/events.lua:353 @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:1156 @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:621 builtin#21 @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:505 @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:527 @/scripts/apisix/balancer.lua:235 @/scripts/apisix/balancer.lua:363 @/scripts/apisix/init.lua:830 =balancer_by_lua:2 (gdb) c Continuing. ^C Program received signal SIGINT, Interrupt. 0x00007f08bd849000 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1 (gdb) lbt C:ngx_http_lua_ngx_sleep builtin#21 @/scripts//deps/share/lua/5.1/resty/worker/events.lua:353 @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:1156 @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:621 builtin#21 @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:505 @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:527 @/scripts/apisix/balancer.lua:235 @/scripts/apisix/balancer.lua:363 @/scripts/apisix/init.lua:830 ``` Thread stack as follow ``` (gdb) c Continuing. Breakpoint 1, lj_str_new (L=0x405b6380, str=0x47b72010 "API disabled in the context of balancer_by_lua*-gz/upstreams/740001-cbca25b9-2f64-46b0-80f5-62f5cb408ea1:state:10.112.230.60:8080808080r4LYLNew!10.124.117.66direction=OUTdirection=IN475&clusters=&dire"..., lenx=47) at lj_str.c:351 351 in lj_str.c (gdb) bt #0 lj_str_new (L=0x405b6380, str=0x47b72010 "API disabled in the context of balancer_by_lua*-gz/upstreams/740001-cbca25b9-2f64-46b0-80f5-62f5cb408ea1:state:10.112.230.60:8080808080r4LYLNew!10.124.117.66direction=OUTdirection=IN475&clusters=&dire"..., lenx=47) at lj_str.c:351 #1 0x00007f08bfa2305a in lj_buf_str (sb=0x405b6460, L=0x405b6380) at lj_buf.h:195 #2 lj_strfmt_pushvf (L=L@entry=0x405b6380, fmt=<optimized out>, argp=argp@entry=0x7fff7f224028) at lj_strfmt.c:590 #3 0x00007f08bfa1966e in luaL_error (L=0x405b6380, fmt=<optimized out>) at lj_err.c:1097 #4 0x00007f08bfa14d73 in lj_BC_FUNCC () from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2 #5 0x00007f08bfa281ed in lua_pcall (L=L@entry=0x405b6380, nargs=nargs@entry=0, nresults=nresults@entry=1, errfunc=errfunc@entry=1) at lj_api.c:1145 #6 0x000000000052c413 in ngx_http_lua_balancer_by_chunk (L=0x405b6380, r=0x1e12400) at ../ngx_lua-0.10.21/src/ngx_http_lua_balancer.c:189 #7 0x000000000052cbe9 in ngx_http_lua_balancer_get_peer (pc=0x1cc2388, data=0x1d8a528) at ../ngx_lua-0.10.21/src/ngx_http_lua_balancer.c:456 #8 0x0000000000450c68 in ngx_event_connect_peer (pc=pc@entry=0x1cc2388) at src/event/ngx_event_connect.c:34 #9 0x000000000048581b in ngx_http_upstream_connect (r=0x1e12400, u=0x1cc2378) at src/http/ngx_http_upstream.c:1559 #10 0x0000000000481c58 in ngx_http_upstream_handler (ev=<optimized out>) at src/http/ngx_http_upstream.c:1310 #11 0x00000000004589db in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901 #12 0x000000000044f213 in ngx_process_events_and_timers (cycle=cycle@entry=0x16c6fb0) at src/event/ngx_event.c:257 #13 0x0000000000456bb2 in ngx_worker_process_cycle (cycle=0x16c6fb0, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:806 #14 0x000000000045543c in ngx_spawn_process (cycle=cycle@entry=0x16c6fb0, proc=proc@entry=0x456b40 <ngx_worker_process_cycle>, data=data@entry=0x3, name=name@entry=0x5df3cd "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199 #15 0x00000000004570ac in ngx_start_worker_processes (cycle=cycle@entry=0x16c6fb0, n=8, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:392 #16 0x00000000004578d4 in ngx_master_process_cycle (cycle=cycle@entry=0x16c6fb0) at src/os/unix/ngx_process_cycle.c:138 #17 0x000000000042e419 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:386 ``` ### Steps to Reproduce Refer to lib/resty/worker/events.lua (tag 1.0.0) https://github.com/Kong/lua-resty-worker-events/blob/1.0.0/lib/resty/worker/events.lua ``` lua local count = 0 local cache_data = {} local cache_err = {} -- in case an event id has been published, but we're fetching it before -- its data was posted and we have to wait, we don't want the next -- event to timeout before we get to it, so go and cache what's -- available, to minimize lost data while _last_event < event_id do count = count + 1 _last_event = _last_event + 1 --debug("fetching event", _last_event) cache_data[count], cache_err[count] = get_event_data(_last_event) end local expire = now() + _wait_max for idx = 1, count do local data = cache_data[idx] local err = cache_err[idx] while not data do if err then log(ERR, "worker-events: error fetching event data: ", err) break else -- just nil, so must wait for data to appear if now() >= expire then break end -- wait and retry -- if the `sleep` function is unavailable in the current openresty -- 'context' (eg. 'init_worker'), then the pcall fails. We're not -- checking the result, but will effectively be doing a busy-wait -- by looping until it hits the time-out, or the data is retrieved _busy_polling = true -- need to flag because `sleep` will yield control -- and another coroutine might re-enter pcall(sleep, _wait_interval) _busy_polling = nil data, err = get_event_data(_last_event - count + idx) end end ``` If no data found in the cache_data, it will pcall ngx sleep int th balancer_by_lua contenxt, which is not allow here. When pcall fail at once, now() will not update, and the worker run into a dead end I found lua-resty-worker-events [fixed this bug](https://github.com/Kong/lua-resty-worker-events/pull/43/commits/ac1e44a967ab3ffd3adc88c6ee2047010866ce27#diff-267831f003023d197ea175100f4f1caedfd9b7c960eb272d67d1c56ec3a7d41d), but the version of worker-events in the 2.15.3 and the lastest version of apisix is 1.0.0, so it is time to update the version of worker-events ? ### Environment - APISIX version (run `apisix version`): 2.15.3 - Operating system (run `uname -a`): ubuntu 16 - OpenResty / Nginx version (run `openresty -V` or `nginx -V`): - etcd version, if relevant (run `curl http://127.0.0.1:9090/v1/server_info`): - APISIX Dashboard version, if relevant: - Plugin runner version, for issues related to plugin runners: - LuaRocks version, for installation issues (run `luarocks --version`): -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
