nanamikon opened a new issue, #10431:
URL: https://github.com/apache/apisix/issues/10431

   ### Current Behavior
   
   
![image](https://github.com/apache/apisix/assets/2010632/c874911e-1d96-4265-bfec-8610cade13b5)
   One of the worker processes gets stuck in an infinite busy loop.
   
   
   
   
   ### Expected Behavior
   
   Worker processes should not get stuck in an infinite busy loop.
   
   ### Error Logs
   
   
   Lua thread stack as follows:
   ```
   (gdb) lbt
   C:ngx_http_lua_ngx_sleep
   builtin#21
   @/scripts//deps/share/lua/5.1/resty/worker/events.lua:353
   @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:1156
   @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:621
   builtin#21
   @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:505
   @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:527
   @/scripts/apisix/balancer.lua:235
   @/scripts/apisix/balancer.lua:363
   @/scripts/apisix/init.lua:830
   =balancer_by_lua:2
   (gdb) c
   Continuing.
   ^C
   Program received signal SIGINT, Interrupt.
   0x00007f08bd849000 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
   (gdb) lbt
   C:ngx_http_lua_ngx_sleep
   builtin#21
   @/scripts//deps/share/lua/5.1/resty/worker/events.lua:353
   @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:1156
   @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:621
   builtin#21
   @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:505
   @/scripts//deps/share/lua/5.1/resty/healthcheck.lua:527
   @/scripts/apisix/balancer.lua:235
   @/scripts/apisix/balancer.lua:363
   @/scripts/apisix/init.lua:830
   
   ```
   
   C thread stack as follows:
   
   ```
   (gdb) c
   Continuing.
   
   Breakpoint 1, lj_str_new (L=0x405b6380, 
       str=0x47b72010 "API disabled in the context of balancer_by_lua*-gz/upstreams/740001-cbca25b9-2f64-46b0-80f5-62f5cb408ea1:state:10.112.230.60:8080808080r4LYLNew!10.124.117.66direction=OUTdirection=IN475&clusters=&dire"..., lenx=47) at lj_str.c:351
   351  in lj_str.c
   (gdb) bt
   #0  lj_str_new (L=0x405b6380, 
       str=0x47b72010 "API disabled in the context of balancer_by_lua*-gz/upstreams/740001-cbca25b9-2f64-46b0-80f5-62f5cb408ea1:state:10.112.230.60:8080808080r4LYLNew!10.124.117.66direction=OUTdirection=IN475&clusters=&dire"..., lenx=47) at lj_str.c:351
   #1  0x00007f08bfa2305a in lj_buf_str (sb=0x405b6460, L=0x405b6380) at lj_buf.h:195
   #2  lj_strfmt_pushvf (L=L@entry=0x405b6380, fmt=<optimized out>, argp=argp@entry=0x7fff7f224028) at lj_strfmt.c:590
   #3  0x00007f08bfa1966e in luaL_error (L=0x405b6380, fmt=<optimized out>) at lj_err.c:1097
   #4  0x00007f08bfa14d73 in lj_BC_FUNCC () from /usr/local/openresty/luajit/lib/libluajit-5.1.so.2
   #5  0x00007f08bfa281ed in lua_pcall (L=L@entry=0x405b6380, nargs=nargs@entry=0, nresults=nresults@entry=1, errfunc=errfunc@entry=1) at lj_api.c:1145
   #6  0x000000000052c413 in ngx_http_lua_balancer_by_chunk (L=0x405b6380, r=0x1e12400) at ../ngx_lua-0.10.21/src/ngx_http_lua_balancer.c:189
   #7  0x000000000052cbe9 in ngx_http_lua_balancer_get_peer (pc=0x1cc2388, data=0x1d8a528) at ../ngx_lua-0.10.21/src/ngx_http_lua_balancer.c:456
   #8  0x0000000000450c68 in ngx_event_connect_peer (pc=pc@entry=0x1cc2388) at src/event/ngx_event_connect.c:34
   #9  0x000000000048581b in ngx_http_upstream_connect (r=0x1e12400, u=0x1cc2378) at src/http/ngx_http_upstream.c:1559
   #10 0x0000000000481c58 in ngx_http_upstream_handler (ev=<optimized out>) at src/http/ngx_http_upstream.c:1310
   #11 0x00000000004589db in ngx_epoll_process_events (cycle=<optimized out>, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
   #12 0x000000000044f213 in ngx_process_events_and_timers (cycle=cycle@entry=0x16c6fb0) at src/event/ngx_event.c:257
   #13 0x0000000000456bb2 in ngx_worker_process_cycle (cycle=0x16c6fb0, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:806
   #14 0x000000000045543c in ngx_spawn_process (cycle=cycle@entry=0x16c6fb0, proc=proc@entry=0x456b40 <ngx_worker_process_cycle>, data=data@entry=0x3, name=name@entry=0x5df3cd "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
   #15 0x00000000004570ac in ngx_start_worker_processes (cycle=cycle@entry=0x16c6fb0, n=8, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:392
   #16 0x00000000004578d4 in ngx_master_process_cycle (cycle=cycle@entry=0x16c6fb0) at src/os/unix/ngx_process_cycle.c:138
   #17 0x000000000042e419 in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:386
   ```
   
   
   ### Steps to Reproduce
   
   Refer to lib/resty/worker/events.lua (tag 1.0.0)
   
https://github.com/Kong/lua-resty-worker-events/blob/1.0.0/lib/resty/worker/events.lua
   
   ``` lua
     local count = 0
     local cache_data = {}
     local cache_err = {}
     -- in case an event id has been published, but we're fetching it before
     -- its data was posted and we have to wait, we don't want the next
     -- event to timeout before we get to it, so go and cache what's
     -- available, to minimize lost data
     while _last_event < event_id do
       count = count + 1
       _last_event = _last_event + 1
       --debug("fetching event", _last_event)
       cache_data[count], cache_err[count] = get_event_data(_last_event)
     end
     local expire = now() + _wait_max
     for idx = 1, count do
       local data = cache_data[idx]
       local err = cache_err[idx]
       while not data do
         if err then
           log(ERR, "worker-events: error fetching event data: ", err)
           break
         else
           -- just nil, so must wait for data to appear
           if now() >= expire then
             break
           end
           -- wait and retry
           -- if the `sleep` function is unavailable in the current openresty
           -- 'context' (eg. 'init_worker'), then the pcall fails. We're not
           -- checking the result, but will effectively be doing a busy-wait
           -- by looping until it hits the time-out, or the data is retrieved
        _busy_polling = true  -- need to flag because `sleep` will yield control
                              -- and another coroutine might re-enter
           pcall(sleep, _wait_interval)
           _busy_polling = nil
           data, err = get_event_data(_last_event - count + idx)
         end
       end
   ```
   
   If no data is found in `cache_data`, the loop pcalls `ngx.sleep` in the
   `balancer_by_lua` context, where that API is not allowed. The pcall therefore
   fails immediately without yielding, so `now()` (OpenResty's cached time) never
   advances, `now() >= expire` never becomes true, and the worker spins in an
   infinite busy loop.
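   
   The failure mode can be modeled outside OpenResty. Below is a minimal Python sketch (the `CachedClock` and `wait_for_data` names and all the numbers are invented for illustration; this is not APISIX or worker-events code) of why an ignored pcall failure plus a cached clock turns the timeout loop into an infinite busy-wait:
   
   ```python
   import itertools
   
   # This models two OpenResty behaviors:
   #   * ngx.now() returns a *cached* timestamp that is only refreshed when the
   #     current light thread yields (e.g. via a successful ngx.sleep), and
   #   * ngx.sleep raises "API disabled in the context of balancer_by_lua*"
   #     in that phase, so pcall returns immediately and no yield happens.
   
   class CachedClock:
       """A clock (abstract time units) that advances only when sleep yields."""
       def __init__(self):
           self.t = 0
   
       def now(self):
           return self.t  # cached value; never refreshed on its own
   
       def sleep(self, interval, allowed):
           if not allowed:
               raise RuntimeError("API disabled in the context of balancer_by_lua*")
           self.t += interval  # the yield refreshes the cached time
   
   def wait_for_data(clock, wait_max, wait_interval, sleep_allowed, max_iters=10_000):
       """Mirrors the worker-events retry loop. Returns the iteration count on
       a normal timeout, or None if the loop would spin forever (capped here)."""
       expire = clock.now() + wait_max
       for i in itertools.count(1):
           if clock.now() >= expire:
               return i                  # timed out as intended
           if i >= max_iters:
               return None               # clock frozen: busy loop, no timeout
           try:                          # pcall(sleep, _wait_interval)
               clock.sleep(wait_interval, sleep_allowed)
           except RuntimeError:
               pass                      # failure ignored, like the Lua code
   
   # Where sleep is allowed, the loop times out after wait_max/wait_interval steps:
   print(wait_for_data(CachedClock(), wait_max=5, wait_interval=1, sleep_allowed=True))   # → 6
   # In balancer_by_lua, sleep always fails, now() is frozen, and the loop never exits:
   print(wait_for_data(CachedClock(), wait_max=5, wait_interval=1, sleep_allowed=False))  # → None
   ```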
   
   I found that lua-resty-worker-events [fixed this
   bug](https://github.com/Kong/lua-resty-worker-events/pull/43/commits/ac1e44a967ab3ffd3adc88c6ee2047010866ce27#diff-267831f003023d197ea175100f4f1caedfd9b7c960eb272d67d1c56ec3a7d41d),
   but the worker-events version bundled with APISIX 2.15.3 (and with the latest
   APISIX release) is still 1.0.0. Is it time to update the bundled version of
   worker-events?
   
   
   
   ### Environment
   
   - APISIX version (run `apisix version`):   2.15.3
   - Operating system (run `uname -a`): ubuntu 16
   - OpenResty / Nginx version (run `openresty -V` or `nginx -V`):
   - etcd version, if relevant (run `curl http://127.0.0.1:9090/v1/server_info`):
   - APISIX Dashboard version, if relevant:
   - Plugin runner version, for issues related to plugin runners:
   - LuaRocks version, for installation issues (run `luarocks --version`):
   

