On 08/27/2018 01:33 PM, Olivier Houchard wrote:
Hi Pieter,

On Sat, Aug 25, 2018 at 10:00:04PM +0200, PiBa-NL wrote:
Hi List, Thierry, Olivier,

Using a lua-socket with connect_ssl and haproxy running with nbthread 3..
results in haproxy hanging with 3 threads for me.

This while using both 1.9-7/30 version (with the 2 extra patches from
Olivier avoiding 100% on a single thread.) and also a build of today's
snapshot: HA-Proxy version 1.9-dev1-e3faf02 2018/08/25

Below info is at the bottom of the mail:
- haproxy -vv
- gdb backtraces

This one is easy to reproduce after just a few calls to the lua function
with the lua code i'm writing on a test-box.. So if a 'simple' config that
makes a reproduction is desired i can likely come up with one.
Same lua code with nbthread 1 seems to work properly.

Is below info (the stack traces) enough to come up with a fix? If not lemme
know and ill try and make a small reproduction of it.


root@freebsd11:~ # haproxy -vv
HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>

Build options :
   TARGET  = freebsd
   CPU     = generic
   CC      = cc
   CFLAGS  = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector
-fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement
-fwrapv -fno-strict-overflow -Wno-address-of-packed-member
-Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS
   OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1
USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1

Default settings :
   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"),
raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
      kqueue : pref=300,  test result OK
        poll : pref=200,  test result OK
      select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available multiplexer protocols :
(protocols markes as <default> cannot be specified using 'proto' keyword)
        <default> : mode=TCP|HTTP   side=FE|BE
               h2 : mode=HTTP       side=FE

Available filters :
         [TRACE] trace
         [COMP] compression
         [SPOE] spoe

root@freebsd11:~ # /usr/local/bin/gdb81 --pid 39649
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 39649
Reading symbols from /usr/local/sbin/haproxy...done.
[New LWP 101651 of process 39649]
[New LWP 101652 of process 39649]
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols
found)...done.
Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols
found)...done.
Reading symbols from /lib/libcrypto.so.8...(no debugging symbols
found)...done.
Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging symbols
found)...done.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols
found)...done.
[Switching to LWP 101650 of process 39649]
0x0000000801e11e3a in _kevent () from /lib/libc.so.7
(gdb) info thread
   Id   Target Id         Frame
* 1    LWP 101650 of process 39649 0x0000000801e11e3a in _kevent () from
/lib/libc.so.7
   2    LWP 101651 of process 39649 0x0000000000437b92 in __spin_lock
(lbl=LUA_LOCK, l=0x8cf1d8 <hlua_global_lock>, func=0x62a781
"hlua_ctx_resume",
     file=0x62a328 "src/hlua.c", line=1070) at include/common/hathreads.h:731
   3    LWP 101652 of process 39649 0x000000080187a70c in ?? () from
/usr/local/lib/liblua-5.3.so
(gdb) bt full
#0  0x0000000801e11e3a in _kevent () from /lib/libc.so.7
No symbol table info available.
#1  0x0000000800f15ca2 in ?? () from /lib/libthr.so.3
No symbol table info available.
#2  0x000000000041dc4a in _do_poll (p=0x8d0548 <cur_poller>, exp=1923638009)
at src/ev_kqueue.c:151
         status = 37937304
         count = 1923638009
         fd = 200
         delta_ms = 1000
         timeout = {tv_sec = 1, tv_nsec = 0}
         updt_idx = 0
         changes = 0
         old_fd = -1
#3  0x000000000051a975 in run_poll_loop () at src/haproxy.c:2419
         next = 1923638009
         exp = 1923638009
#4  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f0) at
src/haproxy.c:2451
         start_lock = {lock = 0, info = {owner = 0, waiters = 0,
last_location = {function = 0x0, file = 0x0, line = 0}}}
         ptif = 0x8c1980 <per_thread_init_list>
         ptdf = 0x800f1841e <pthread_sigmask+46>
#5  0x000000000051423e in main (argc=3, argv=0x7fffffffeb28) at
src/haproxy.c:3053
         tids = 0x8024849f0
         threads = 0x802510ac0
         i = 3
         old_sig = {__bits = {1073741824, 0, 0, 0}}
         blocked_sig = {__bits = {4227856759, 4294967295, 4294967295,
4294967295}}
         err = 0
         retry = 200
         limit = {rlim_cur = 4033, rlim_max = 4033}
         errmsg = 
"\000\353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\063\207Z\306\357O5\347\240\211\214\000\000\000\000\000
 
\353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\300\352\377\377\377\177\000\000R\201\340\001\b\000\000\000\001\000\000"
         pidfd = -1
(gdb) thread 2
[Switching to thread 2 (LWP 101651 of process 39649)]
#0  0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8
<hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328
"src/hlua.c", line=1070)
     at include/common/hathreads.h:731
731             __SPIN_LOCK(&l->lock);
(gdb) bt full
#0  0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8
<hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328
"src/hlua.c", line=1070)
     at include/common/hathreads.h:731
         __pl_l = 0x8cf1d8 <hlua_global_lock>
         __pl_r = 4294967300
         start_time = 7797095086642562
#1  0x0000000000438a5e in hlua_ctx_resume (lua=0x8024f4500, yield_allowed=1)
at src/hlua.c:1070
         ret = 9240032
         msg = 0x456171 <eb32sc_next_with_parent+97> "H\203\370"
         trace = 0x7fffdfffdcb0 ""
#2  0x000000000044236d in hlua_process_task (task=0x80242d960,
context=0x8024f4500, state=513) at src/hlua.c:5641
         hlua = 0x8024f4500
         status = HLUA_E_OK
#3  0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384
         t = 0x80242d960
         state = 513
         ctx = 0x8024f4500
         process = 0x4422f0 <hlua_process_task>
         t = 0x80242df00
         max_processed = 200
#4  0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386
         next = 1923609561
         exp = 1923608777
#5  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f4) at
src/haproxy.c:2451
         start_lock = {lock = 0, info = {owner = 0, waiters = 0,
last_location = {function = 0x0, file = 0x0, line = 0}}}
         ptif = 0x8c1980 <per_thread_init_list>
         ptdf = 0x800f177cc
#6  0x0000000800f12bc5 in ?? () from /lib/libthr.so.3
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
(gdb) thread 3
[Switching to thread 3 (LWP 101652 of process 39649)]
#0  0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so
(gdb) bt full
#0  0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#1  0x000000080187acd7 in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#2  0x000000080187b108 in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#3  0x0000000801873e30 in lua_gc () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#4  0x0000000000438e45 in hlua_ctx_resume (lua=0x8024f4f80, yield_allowed=1)
at src/hlua.c:1186
         ret = 0
         msg = 0x5a8a24 <b_alloc_margin+324> "\351\200"
         trace = 0x7fffdfdfcc60 "\240\314\337\337\377\177"
#5  0x000000000044887a in hlua_applet_http_fct (ctx=0x8024c5880) at
src/hlua.c:6716
         si = 0x802419540
         strm = 0x802419200
         res = 0x802419270
         rule = 0x80242e360
         px = 0x802512c00
         hlua = 0x8024f4f80
         blk1 = 0x7fffdfdfcca0 ""
         len1 = 34397588737
         blk2 = 0x802419278 ""
         len2 = 34397590136
         ret = 0
#6  0x00000000005a78a7 in task_run_applet (t=0x80242efe0,
context=0x8024c5880, state=16385) at src/applet.c:49
         app = 0x8024c5880
         si = 0x802419540
#7  0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384
         t = 0x80242efe0
         state = 16385
         ctx = 0x8024c5880
         process = 0x5a77f0 <task_run_applet>
         t = 0x80242efe0
         max_processed = 200
#8  0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386
         next = 1923608730
         exp = 1923608070
#9  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f8) at
src/haproxy.c:2451
         start_lock = {lock = 0, info = {owner = 0, waiters = 0,
last_location = {function = 0x0, file = 0x0, line = 0}}}
         ptif = 0x8c1980 <per_thread_init_list>
         ptdf = 0x800f177cc
---Type <return> to continue, or q <return> to quit---
#10 0x0000000800f12bc5 in ?? () from /lib/libthr.so.3
No symbol table info available.
#11 0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000
(gdb)



I found one minor problem with the LUA code (patch attached) but I kinda
doubt this is your problem here.

Interesting patch, because I have noticed that reg-test/lua/b00001.vtc may sometimes fail (with a haproxy 100% CPU usage) in the TCP mode part (when using c2 client), both when *thread support is enabled or not.*

So I think your patch may have fixed the same issue I can try to reproduce.

Well, I have just tried to reproduce it running a loop of reg-test/lua/b00001.vtc script, and your patch does not solve it. Obviously, this does not mean your patch is not correct.

It seems one thread is stuck in lua_gc() while holding the global LUA lock,
but I don't know enough about LUA to guess what is going on.

What is suspect is that the HTTP and TCP applet functions hlua_applet_(http|tcp)_fct() are called several times even when the applet is done, or when the streams are disconnected or closed:

 /* If the stream is disconnect or closed, ldo nothing. */
    if (unlikely(si->state == SI_ST_DIS || si->state == SI_ST_CLO))
        return;

this leads to call hlua_ctx_resume() several times from the same thread I guess.

This may be exposed running reg-tests/lua/b00001.vtc continuously:

while HAPROXY_PROGRAM=~/src/haproxy/haproxy ~/src/varnish-cache-trunk/bin/varnishtest/varnishtest -t5 -lv reg-tests/lua/b00001.vtc; do echo --------------------------------------------------------------------; done

Each test must take about 120ms. So, if varnistest stops running this reg test during about 1s you will have to interrupt it to prevent it to kill haproxy, and to attach gdb to the haproxy processes.


As Fred said, it would be really nice to have a sample configuration to
reproduce the problem.

Regards,

Olivier



Reply via email to