On Mon, Aug 27, 2018 at 02:29:42PM +0200, Frederic Lecaille wrote:
> On 08/27/2018 01:33 PM, Olivier Houchard wrote:
> > Hi Pieter,
> > 
> > On Sat, Aug 25, 2018 at 10:00:04PM +0200, PiBa-NL wrote:
> > > Hi List, Thierry, Olivier,
> > > 
> > > Using a lua-socket with connect_ssl and haproxy running with nbthread 3..
> > > results in haproxy hanging with 3 threads for me.
> > > 
> > > This while using both 1.9-7/30 version (with the 2 extra patches from
> > > Olivier avoiding 100% on a single thread.) and also a build of today's
> > > snapshot: HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
> > > 
> > > Below info is at the bottom of the mail:
> > > - haproxy -vv
> > > - gdb backtraces
> > > 
> > > This one is easy to reproduce after just a few calls to the lua function
> > > with the lua code i'm writing on a test-box.. So if a 'simple' config that
> > > makes a reproduction is desired i can likely come up with one.
> > > Same lua code with nbthread 1 seems to work properly.
> > > 
> > > Is below info (the stack traces) enough to come up with a fix? If not 
> > > lemme
> > > know and ill try and make a small reproduction of it.
> > > 
> > > 
> > > root@freebsd11:~ # haproxy -vv
> > > HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
> > > Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>
> > > 
> > > Build options :
> > >    TARGET  = freebsd
> > >    CPU     = generic
> > >    CC      = cc
> > >    CFLAGS  = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector
> > > -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement
> > > -fwrapv -fno-strict-overflow -Wno-address-of-packed-member
> > > -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS
> > >    OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1
> > > USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1
> > > 
> > > Default settings :
> > >    maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
> > > 
> > > Built with network namespace support.
> > > Built with zlib version : 1.2.11
> > > Running on zlib version : 1.2.11
> > > Compression algorithms supported : identity("identity"), 
> > > deflate("deflate"),
> > > raw-deflate("deflate"), gzip("gzip")
> > > Built with PCRE version : 8.40 2017-01-11
> > > Running on PCRE version : 8.40 2017-01-11
> > > PCRE library supports JIT : yes
> > > Built with multi-threading support.
> > > Encrypted password support via crypt(3): yes
> > > Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
> > > Built with Lua version : Lua 5.3.4
> > > Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
> > > Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
> > > OpenSSL library supports TLS extensions : yes
> > > OpenSSL library supports SNI : yes
> > > OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
> > > 
> > > Available polling systems :
> > >       kqueue : pref=300,  test result OK
> > >         poll : pref=200,  test result OK
> > >       select : pref=150,  test result OK
> > > Total: 3 (3 usable), will use kqueue.
> > > 
> > > Available multiplexer protocols :
> > > (protocols markes as <default> cannot be specified using 'proto' keyword)
> > >         <default> : mode=TCP|HTTP   side=FE|BE
> > >                h2 : mode=HTTP       side=FE
> > > 
> > > Available filters :
> > >          [TRACE] trace
> > >          [COMP] compression
> > >          [SPOE] spoe
> > > 
> > > root@freebsd11:~ # /usr/local/bin/gdb81 --pid 39649
> > > GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
> > > Copyright (C) 2018 Free Software Foundation, Inc.
> > > License GPLv3+: GNU GPL version 3 or later
> > > <http://gnu.org/licenses/gpl.html>
> > > This is free software: you are free to change and redistribute it.
> > > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > > and "show warranty" for details.
> > > This GDB was configured as "x86_64-portbld-freebsd11.1".
> > > Type "show configuration" for configuration details.
> > > For bug reporting instructions, please see:
> > > <http://www.gnu.org/software/gdb/bugs/>.
> > > Find the GDB manual and other documentation resources online at:
> > > <http://www.gnu.org/software/gdb/documentation/>.
> > > For help, type "help".
> > > Type "apropos word" to search for commands related to "word".
> > > Attaching to process 39649
> > > Reading symbols from /usr/local/sbin/haproxy...done.
> > > [New LWP 101651 of process 39649]
> > > [New LWP 101652 of process 39649]
> > > Reading symbols from /lib/libcrypt.so.5...(no debugging symbols
> > > found)...done.
> > > Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
> > > Reading symbols from /lib/libthr.so.3...(no debugging symbols 
> > > found)...done.
> > > Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols
> > > found)...done.
> > > Reading symbols from /lib/libcrypto.so.8...(no debugging symbols
> > > found)...done.
> > > Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging symbols
> > > found)...done.
> > > Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
> > > Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
> > > Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols
> > > found)...done.
> > > [Switching to LWP 101650 of process 39649]
> > > 0x0000000801e11e3a in _kevent () from /lib/libc.so.7
> > > (gdb) info thread
> > >    Id   Target Id         Frame
> > > * 1    LWP 101650 of process 39649 0x0000000801e11e3a in _kevent () from
> > > /lib/libc.so.7
> > >    2    LWP 101651 of process 39649 0x0000000000437b92 in __spin_lock
> > > (lbl=LUA_LOCK, l=0x8cf1d8 <hlua_global_lock>, func=0x62a781
> > > "hlua_ctx_resume",
> > >      file=0x62a328 "src/hlua.c", line=1070) at 
> > > include/common/hathreads.h:731
> > >    3    LWP 101652 of process 39649 0x000000080187a70c in ?? () from
> > > /usr/local/lib/liblua-5.3.so
> > > (gdb) bt full
> > > #0  0x0000000801e11e3a in _kevent () from /lib/libc.so.7
> > > No symbol table info available.
> > > #1  0x0000000800f15ca2 in ?? () from /lib/libthr.so.3
> > > No symbol table info available.
> > > #2  0x000000000041dc4a in _do_poll (p=0x8d0548 <cur_poller>, 
> > > exp=1923638009)
> > > at src/ev_kqueue.c:151
> > >          status = 37937304
> > >          count = 1923638009
> > >          fd = 200
> > >          delta_ms = 1000
> > >          timeout = {tv_sec = 1, tv_nsec = 0}
> > >          updt_idx = 0
> > >          changes = 0
> > >          old_fd = -1
> > > #3  0x000000000051a975 in run_poll_loop () at src/haproxy.c:2419
> > >          next = 1923638009
> > >          exp = 1923638009
> > > #4  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f0) at
> > > src/haproxy.c:2451
> > >          start_lock = {lock = 0, info = {owner = 0, waiters = 0,
> > > last_location = {function = 0x0, file = 0x0, line = 0}}}
> > >          ptif = 0x8c1980 <per_thread_init_list>
> > >          ptdf = 0x800f1841e <pthread_sigmask+46>
> > > #5  0x000000000051423e in main (argc=3, argv=0x7fffffffeb28) at
> > > src/haproxy.c:3053
> > >          tids = 0x8024849f0
> > >          threads = 0x802510ac0
> > >          i = 3
> > >          old_sig = {__bits = {1073741824, 0, 0, 0}}
> > >          blocked_sig = {__bits = {4227856759, 4294967295, 4294967295,
> > > 4294967295}}
> > >          err = 0
> > >          retry = 200
> > >          limit = {rlim_cur = 4033, rlim_max = 4033}
> > >          errmsg = 
> > > "\000\353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\063\207Z\306\357O5\347\240\211\214\000\000\000\000\000
> > >  
> > > \353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\300\352\377\377\377\177\000\000R\201\340\001\b\000\000\000\001\000\000"
> > >          pidfd = -1
> > > (gdb) thread 2
> > > [Switching to thread 2 (LWP 101651 of process 39649)]
> > > #0  0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8
> > > <hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328
> > > "src/hlua.c", line=1070)
> > >      at include/common/hathreads.h:731
> > > 731             __SPIN_LOCK(&l->lock);
> > > (gdb) bt full
> > > #0  0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8
> > > <hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328
> > > "src/hlua.c", line=1070)
> > >      at include/common/hathreads.h:731
> > >          __pl_l = 0x8cf1d8 <hlua_global_lock>
> > >          __pl_r = 4294967300
> > >          start_time = 7797095086642562
> > > #1  0x0000000000438a5e in hlua_ctx_resume (lua=0x8024f4500, 
> > > yield_allowed=1)
> > > at src/hlua.c:1070
> > >          ret = 9240032
> > >          msg = 0x456171 <eb32sc_next_with_parent+97> "H\203\370"
> > >          trace = 0x7fffdfffdcb0 ""
> > > #2  0x000000000044236d in hlua_process_task (task=0x80242d960,
> > > context=0x8024f4500, state=513) at src/hlua.c:5641
> > >          hlua = 0x8024f4500
> > >          status = HLUA_E_OK
> > > #3  0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384
> > >          t = 0x80242d960
> > >          state = 513
> > >          ctx = 0x8024f4500
> > >          process = 0x4422f0 <hlua_process_task>
> > >          t = 0x80242df00
> > >          max_processed = 200
> > > #4  0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386
> > >          next = 1923609561
> > >          exp = 1923608777
> > > #5  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f4) at
> > > src/haproxy.c:2451
> > >          start_lock = {lock = 0, info = {owner = 0, waiters = 0,
> > > last_location = {function = 0x0, file = 0x0, line = 0}}}
> > >          ptif = 0x8c1980 <per_thread_init_list>
> > >          ptdf = 0x800f177cc
> > > #6  0x0000000800f12bc5 in ?? () from /lib/libthr.so.3
> > > No symbol table info available.
> > > #7  0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
> > > (gdb) thread 3
> > > [Switching to thread 3 (LWP 101652 of process 39649)]
> > > #0  0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so
> > > (gdb) bt full
> > > #0  0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so
> > > No symbol table info available.
> > > #1  0x000000080187acd7 in ?? () from /usr/local/lib/liblua-5.3.so
> > > No symbol table info available.
> > > #2  0x000000080187b108 in ?? () from /usr/local/lib/liblua-5.3.so
> > > No symbol table info available.
> > > #3  0x0000000801873e30 in lua_gc () from /usr/local/lib/liblua-5.3.so
> > > No symbol table info available.
> > > #4  0x0000000000438e45 in hlua_ctx_resume (lua=0x8024f4f80, 
> > > yield_allowed=1)
> > > at src/hlua.c:1186
> > >          ret = 0
> > >          msg = 0x5a8a24 <b_alloc_margin+324> "\351\200"
> > >          trace = 0x7fffdfdfcc60 "\240\314\337\337\377\177"
> > > #5  0x000000000044887a in hlua_applet_http_fct (ctx=0x8024c5880) at
> > > src/hlua.c:6716
> > >          si = 0x802419540
> > >          strm = 0x802419200
> > >          res = 0x802419270
> > >          rule = 0x80242e360
> > >          px = 0x802512c00
> > >          hlua = 0x8024f4f80
> > >          blk1 = 0x7fffdfdfcca0 ""
> > >          len1 = 34397588737
> > >          blk2 = 0x802419278 ""
> > >          len2 = 34397590136
> > >          ret = 0
> > > #6  0x00000000005a78a7 in task_run_applet (t=0x80242efe0,
> > > context=0x8024c5880, state=16385) at src/applet.c:49
> > >          app = 0x8024c5880
> > >          si = 0x802419540
> > > #7  0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384
> > >          t = 0x80242efe0
> > >          state = 16385
> > >          ctx = 0x8024c5880
> > >          process = 0x5a77f0 <task_run_applet>
> > >          t = 0x80242efe0
> > >          max_processed = 200
> > > #8  0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386
> > >          next = 1923608730
> > >          exp = 1923608070
> > > #9  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f8) at
> > > src/haproxy.c:2451
> > >          start_lock = {lock = 0, info = {owner = 0, waiters = 0,
> > > last_location = {function = 0x0, file = 0x0, line = 0}}}
> > >          ptif = 0x8c1980 <per_thread_init_list>
> > >          ptdf = 0x800f177cc
> > > ---Type <return> to continue, or q <return> to quit---
> > > #10 0x0000000800f12bc5 in ?? () from /lib/libthr.so.3
> > > No symbol table info available.
> > > #11 0x0000000000000000 in ?? ()
> > > No symbol table info available.
> > > Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000
> > > (gdb)
> > > 
> > > 
> > 
> > I found one minor problem with the LUA code (patch attached) but I kinda
> > doubt this is your problem here.
> 
> Interesting patch, because I have noticed that reg-test/lua/b00001.vtc may
> sometimes fail (with a haproxy 100% CPU usage) in the TCP mode part (when
> using c2 client), both when *thread support is enabled or not.*
> 
> So I think your patch may have fixed the same issue I can try to reproduce.
> 
> Well, I have just tried to reproduce it running a loop of
> reg-test/lua/b00001.vtc script, and your patch does not solve it. Obviously,
> this does not mean your patch is not correct.
> 

Hmm this is interesting, but I'm not sure it's the same problem as Pieter', as
he specifically said it worked fine with only 1 thread.

> > It seems one thread is stuck in lua_gc() while holding the global LUA lock,
> > but I don't know enough about LUA to guess what is going on.
> 
> What is suspect is that the HTTP and TCP applet functions
> hlua_applet_(http|tcp)_fct() are called several times even when the applet
> is done, or when the streams are disconnected or closed:
> 
>  /* If the stream is disconnect or closed, ldo nothing. */
>     if (unlikely(si->state == SI_ST_DIS || si->state == SI_ST_CLO))
>         return;
> 
> this leads to call hlua_ctx_resume() several times from the same thread I
> guess.
> 

But if hlua_applet_(http|tcp)_fct() just returns, who calls
hlua_ctx_resume() ? :)

> This may be exposed running reg-tests/lua/b00001.vtc continuously:
> 
>     while HAPROXY_PROGRAM=~/src/haproxy/haproxy
> ~/src/varnish-cache-trunk/bin/varnishtest/varnishtest -t5 -lv
> reg-tests/lua/b00001.vtc; do echo
> --------------------------------------------------------------------; done
> 
> Each test must take about 120ms. So, if varnistest stops running this reg
> test during about 1s you will have to interrupt it to prevent it to kill
> haproxy, and to attach gdb to the haproxy processes.
> 
> 

Very interesting, time for me to setup varnishtest.


Regards,

Olivier

Reply via email to