On Mon, Aug 27, 2018 at 02:29:42PM +0200, Frederic Lecaille wrote: > On 08/27/2018 01:33 PM, Olivier Houchard wrote: > > Hi Pieter, > > > > On Sat, Aug 25, 2018 at 10:00:04PM +0200, PiBa-NL wrote: > > > Hi List, Thierry, Olivier, > > > > > > Using a lua-socket with connect_ssl and haproxy running with nbthread 3.. > > > results in haproxy hanging with 3 threads for me. > > > > > > This while using both 1.9-7/30 version (with the 2 extra patches from > > > Olivier avoiding 100% on a single thread.) and also a build of today's > > > snapshot: HA-Proxy version 1.9-dev1-e3faf02 2018/08/25 > > > > > > Below info is at the bottom of the mail: > > > - haproxy -vv > > > - gdb backtraces > > > > > > This one is easy to reproduce after just a few calls to the lua function > > > with the lua code i'm writing on a test-box.. So if a 'simple' config that > > > makes a reproduction is desired i can likely come up with one. > > > Same lua code with nbthread 1 seems to work properly. > > > > > > Is below info (the stack traces) enough to come up with a fix? If not > > > lemme > > > know and ill try and make a small reproduction of it. > > > > > > > > > root@freebsd11:~ # haproxy -vv > > > HA-Proxy version 1.9-dev1-e3faf02 2018/08/25 > > > Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org> > > > > > > Build options : > > > TARGET = freebsd > > > CPU = generic > > > CC = cc > > > CFLAGS = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector > > > -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement > > > -fwrapv -fno-strict-overflow -Wno-address-of-packed-member > > > -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS > > > OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1 > > > USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1 > > > > > > Default settings : > > > maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200 > > > > > > Built with network namespace support. > > > Built with zlib version : 1.2.11 > > > Running on zlib version : 1.2.11 > > > Compression algorithms supported : identity("identity"), > > > deflate("deflate"), > > > raw-deflate("deflate"), gzip("gzip") > > > Built with PCRE version : 8.40 2017-01-11 > > > Running on PCRE version : 8.40 2017-01-11 > > > PCRE library supports JIT : yes > > > Built with multi-threading support. > > > Encrypted password support via crypt(3): yes > > > Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY > > > Built with Lua version : Lua 5.3.4 > > > Built with OpenSSL version : OpenSSL 1.0.2k-freebsd 26 Jan 2017 > > > Running on OpenSSL version : OpenSSL 1.0.2k-freebsd 26 Jan 2017 > > > OpenSSL library supports TLS extensions : yes > > > OpenSSL library supports SNI : yes > > > OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2 > > > > > > Available polling systems : > > > kqueue : pref=300, test result OK > > > poll : pref=200, test result OK > > > select : pref=150, test result OK > > > Total: 3 (3 usable), will use kqueue. > > > > > > Available multiplexer protocols : > > > (protocols markes as <default> cannot be specified using 'proto' keyword) > > > <default> : mode=TCP|HTTP side=FE|BE > > > h2 : mode=HTTP side=FE > > > > > > Available filters : > > > [TRACE] trace > > > [COMP] compression > > > [SPOE] spoe > > > > > > root@freebsd11:~ # /usr/local/bin/gdb81 --pid 39649 > > > GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD] > > > Copyright (C) 2018 Free Software Foundation, Inc. > > > License GPLv3+: GNU GPL version 3 or later > > > <http://gnu.org/licenses/gpl.html> > > > This is free software: you are free to change and redistribute it. > > > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > > > and "show warranty" for details. > > > This GDB was configured as "x86_64-portbld-freebsd11.1". > > > Type "show configuration" for configuration details. > > > For bug reporting instructions, please see: > > > <http://www.gnu.org/software/gdb/bugs/>. > > > Find the GDB manual and other documentation resources online at: > > > <http://www.gnu.org/software/gdb/documentation/>. > > > For help, type "help". > > > Type "apropos word" to search for commands related to "word". > > > Attaching to process 39649 > > > Reading symbols from /usr/local/sbin/haproxy...done. > > > [New LWP 101651 of process 39649] > > > [New LWP 101652 of process 39649] > > > Reading symbols from /lib/libcrypt.so.5...(no debugging symbols > > > found)...done. > > > Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done. > > > Reading symbols from /lib/libthr.so.3...(no debugging symbols > > > found)...done. > > > Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols > > > found)...done. > > > Reading symbols from /lib/libcrypto.so.8...(no debugging symbols > > > found)...done. > > > Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging symbols > > > found)...done. > > > Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done. > > > Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done. > > > Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols > > > found)...done. > > > [Switching to LWP 101650 of process 39649] > > > 0x0000000801e11e3a in _kevent () from /lib/libc.so.7 > > > (gdb) info thread > > > Id Target Id Frame > > > * 1 LWP 101650 of process 39649 0x0000000801e11e3a in _kevent () from > > > /lib/libc.so.7 > > > 2 LWP 101651 of process 39649 0x0000000000437b92 in __spin_lock > > > (lbl=LUA_LOCK, l=0x8cf1d8 <hlua_global_lock>, func=0x62a781 > > > "hlua_ctx_resume", > > > file=0x62a328 "src/hlua.c", line=1070) at > > > include/common/hathreads.h:731 > > > 3 LWP 101652 of process 39649 0x000000080187a70c in ?? () from > > > /usr/local/lib/liblua-5.3.so > > > (gdb) bt full > > > #0 0x0000000801e11e3a in _kevent () from /lib/libc.so.7 > > > No symbol table info available. > > > #1 0x0000000800f15ca2 in ?? () from /lib/libthr.so.3 > > > No symbol table info available. > > > #2 0x000000000041dc4a in _do_poll (p=0x8d0548 <cur_poller>, > > > exp=1923638009) > > > at src/ev_kqueue.c:151 > > > status = 37937304 > > > count = 1923638009 > > > fd = 200 > > > delta_ms = 1000 > > > timeout = {tv_sec = 1, tv_nsec = 0} > > > updt_idx = 0 > > > changes = 0 > > > old_fd = -1 > > > #3 0x000000000051a975 in run_poll_loop () at src/haproxy.c:2419 > > > next = 1923638009 > > > exp = 1923638009 > > > #4 0x0000000000517672 in run_thread_poll_loop (data=0x8024849f0) at > > > src/haproxy.c:2451 > > > start_lock = {lock = 0, info = {owner = 0, waiters = 0, > > > last_location = {function = 0x0, file = 0x0, line = 0}}} > > > ptif = 0x8c1980 <per_thread_init_list> > > > ptdf = 0x800f1841e <pthread_sigmask+46> > > > #5 0x000000000051423e in main (argc=3, argv=0x7fffffffeb28) at > > > src/haproxy.c:3053 > > > tids = 0x8024849f0 > > > threads = 0x802510ac0 > > > i = 3 > > > old_sig = {__bits = {1073741824, 0, 0, 0}} > > > blocked_sig = {__bits = {4227856759, 4294967295, 4294967295, > > > 4294967295}} > > > err = 0 > > > retry = 200 > > > limit = {rlim_cur = 4033, rlim_max = 4033} > > > errmsg = > > > "\000\353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\063\207Z\306\357O5\347\240\211\214\000\000\000\000\000 > > > > > > \353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\300\352\377\377\377\177\000\000R\201\340\001\b\000\000\000\001\000\000" > > > pidfd = -1 > > > (gdb) thread 2 > > > [Switching to thread 2 (LWP 101651 of process 39649)] > > > #0 0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8 > > > <hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328 > > > "src/hlua.c", line=1070) > > > at include/common/hathreads.h:731 > > > 731 __SPIN_LOCK(&l->lock); > > > (gdb) bt full > > > #0 0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8 > > > <hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328 > > > "src/hlua.c", line=1070) > > > at include/common/hathreads.h:731 > > > __pl_l = 0x8cf1d8 <hlua_global_lock> > > > __pl_r = 4294967300 > > > start_time = 7797095086642562 > > > #1 0x0000000000438a5e in hlua_ctx_resume (lua=0x8024f4500, > > > yield_allowed=1) > > > at src/hlua.c:1070 > > > ret = 9240032 > > > msg = 0x456171 <eb32sc_next_with_parent+97> "H\203\370" > > > trace = 0x7fffdfffdcb0 "" > > > #2 0x000000000044236d in hlua_process_task (task=0x80242d960, > > > context=0x8024f4500, state=513) at src/hlua.c:5641 > > > hlua = 0x8024f4500 > > > status = HLUA_E_OK > > > #3 0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384 > > > t = 0x80242d960 > > > state = 513 > > > ctx = 0x8024f4500 > > > process = 0x4422f0 <hlua_process_task> > > > t = 0x80242df00 > > > max_processed = 200 > > > #4 0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386 > > > next = 1923609561 > > > exp = 1923608777 > > > #5 0x0000000000517672 in run_thread_poll_loop (data=0x8024849f4) at > > > src/haproxy.c:2451 > > > start_lock = {lock = 0, info = {owner = 0, waiters = 0, > > > last_location = {function = 0x0, file = 0x0, line = 0}}} > > > ptif = 0x8c1980 <per_thread_init_list> > > > ptdf = 0x800f177cc > > > #6 0x0000000800f12bc5 in ?? () from /lib/libthr.so.3 > > > No symbol table info available. > > > #7 0x0000000000000000 in ?? () > > > No symbol table info available. > > > Backtrace stopped: Cannot access memory at address 0x7fffdfffe000 > > > (gdb) thread 3 > > > [Switching to thread 3 (LWP 101652 of process 39649)] > > > #0 0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so > > > (gdb) bt full > > > #0 0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so > > > No symbol table info available. > > > #1 0x000000080187acd7 in ?? () from /usr/local/lib/liblua-5.3.so > > > No symbol table info available. > > > #2 0x000000080187b108 in ?? () from /usr/local/lib/liblua-5.3.so > > > No symbol table info available. > > > #3 0x0000000801873e30 in lua_gc () from /usr/local/lib/liblua-5.3.so > > > No symbol table info available. > > > #4 0x0000000000438e45 in hlua_ctx_resume (lua=0x8024f4f80, > > > yield_allowed=1) > > > at src/hlua.c:1186 > > > ret = 0 > > > msg = 0x5a8a24 <b_alloc_margin+324> "\351\200" > > > trace = 0x7fffdfdfcc60 "\240\314\337\337\377\177" > > > #5 0x000000000044887a in hlua_applet_http_fct (ctx=0x8024c5880) at > > > src/hlua.c:6716 > > > si = 0x802419540 > > > strm = 0x802419200 > > > res = 0x802419270 > > > rule = 0x80242e360 > > > px = 0x802512c00 > > > hlua = 0x8024f4f80 > > > blk1 = 0x7fffdfdfcca0 "" > > > len1 = 34397588737 > > > blk2 = 0x802419278 "" > > > len2 = 34397590136 > > > ret = 0 > > > #6 0x00000000005a78a7 in task_run_applet (t=0x80242efe0, > > > context=0x8024c5880, state=16385) at src/applet.c:49 > > > app = 0x8024c5880 > > > si = 0x802419540 > > > #7 0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384 > > > t = 0x80242efe0 > > > state = 16385 > > > ctx = 0x8024c5880 > > > process = 0x5a77f0 <task_run_applet> > > > t = 0x80242efe0 > > > max_processed = 200 > > > #8 0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386 > > > next = 1923608730 > > > exp = 1923608070 > > > #9 0x0000000000517672 in run_thread_poll_loop (data=0x8024849f8) at > > > src/haproxy.c:2451 > > > start_lock = {lock = 0, info = {owner = 0, waiters = 0, > > > last_location = {function = 0x0, file = 0x0, line = 0}}} > > > ptif = 0x8c1980 <per_thread_init_list> > > > ptdf = 0x800f177cc > > > ---Type <return> to continue, or q <return> to quit--- > > > #10 0x0000000800f12bc5 in ?? () from /lib/libthr.so.3 > > > No symbol table info available. > > > #11 0x0000000000000000 in ?? () > > > No symbol table info available. > > > Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000 > > > (gdb) > > > > > > > > > > I found one minor problem with the LUA code (patch attached) but I kinda > > doubt this is your problem here. > > Interesting patch, because I have noticed that reg-test/lua/b00001.vtc may > sometimes fail (with a haproxy 100% CPU usage) in the TCP mode part (when > using c2 client), both when *thread support is enabled or not.* > > So I think your patch may have fixed the same issue I can try to reproduce. > > Well, I have just tried to reproduce it running a loop of > reg-test/lua/b00001.vtc script, and your patch does not solve it. Obviously, > this does not mean your patch is not correct. >
Hmm this is interesting, but I'm not sure it's the same problem as Pieter', as he specifically said it worked fine with only 1 thread. > > It seems one thread is stuck in lua_gc() while holding the global LUA lock, > > but I don't know enough about LUA to guess what is going on. > > What is suspect is that the HTTP and TCP applet functions > hlua_applet_(http|tcp)_fct() are called several times even when the applet > is done, or when the streams are disconnected or closed: > > /* If the stream is disconnect or closed, ldo nothing. */ > if (unlikely(si->state == SI_ST_DIS || si->state == SI_ST_CLO)) > return; > > this leads to call hlua_ctx_resume() several times from the same thread I > guess. > But if hlua_applet_(http|tcp)_fct() just returns, who calls hlua_ctx_resume() ? :) > This may be exposed running reg-tests/lua/b00001.vtc continuously: > > while HAPROXY_PROGRAM=~/src/haproxy/haproxy > ~/src/varnish-cache-trunk/bin/varnishtest/varnishtest -t5 -lv > reg-tests/lua/b00001.vtc; do echo > --------------------------------------------------------------------; done > > Each test must take about 120ms. So, if varnistest stops running this reg > test during about 1s you will have to interrupt it to prevent it to kill > haproxy, and to attach gdb to the haproxy processes. > > Very interesting, time for me to setup varnishtest. Regards, Olivier