On Mon, 16 Jul 2018 08:00:48 +0200 Willy Tarreau <w...@1wt.eu> wrote:
> Hi Thierry,
>
> On Fri, Jul 06, 2018 at 04:28:22PM +0200, Thierry Fournier wrote:
> > Hi list,
> >
> > I caught a double-free when I reload haproxy-1.8:
> >
> > writev(2, [{"*** Error in `", 14}, {"/opt/o3-haproxy/sbin/haproxy",
> > 28}, {"': ", 3}, {"double free or corruption (!prev)", 33}, {": 0x", 4},
> > {"000000001cec2ab0", 16}, {" ***\n", 5}], 7) = 103
> >
> > Decoded:
> >
> > *** Error in `/opt/o3-haproxy/sbin/haproxy': double free or corruption
> > (!prev): 0x000000001cec2ab0 ***
> >
> > Gdb says:
> >
> > #0 0x00007f4bac88b067 in __GI_raise (sig=sig@entry=6) at
> > ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> > #1 0x00007f4bac88c448 in __GI_abort () at abort.c:89
> > #2 0x00007f4bac8c91b4 in __libc_message (do_abort=do_abort@entry=1,
> > fmt=fmt@entry=0x7f4bac9be210 "*** Error in `%s': %s: 0x%s ***\n")
> > at ../sysdeps/posix/libc_fatal.c:175
> > #3 0x00007f4bac8ce98e in malloc_printerr (action=1,
> > str=0x7f4bac9be318 "double free or corruption (!prev)",
> > ptr=<optimized out>) at malloc.c:4996
> > #4 0x00007f4bac8cf696 in _int_free (av=<optimized out>, p=<optimized
> > out>, have_lock=0) at malloc.c:3840
> > #5 0x000000000042af56 in ssl_sock_destroy_bind_conf
> > (bind_conf=0x1d27e810) at src/ssl_sock.c:4819
> > #6 0x00000000004b1390 in deinit () at src/haproxy.c:2240
> > #7 0x000000000041b83c in main (argc=<optimized out>,
> > argv=0x7ffc22f6b4d8) at src/haproxy.c:3094
> >
> > I use the last 1.8.12 version.
>
> This one looks a bit strange. I looked at it a little bit and it corresponds
> to the line "free(bind_conf->keys_ref->tlskeys);". Unfortunately, there is no
> other line in the code appearing to perform a free on this element, and when
> passing through this code the key_ref is destroyed and properly nulled. I
> checked if it was possible for this element not to be allocated and I don't
> see how that could happen either.
> Thus I'm seeing only three possibilities :
>
> - this element was duplicated and appears at multiple places (multiple list
> elements) leading to a real double free
>
> - there is a memory corruption somewhere possibly resulting in this element
> being corrupted and not in fact victim of a double free
>
> - I can't read code and there is another free that I failed to detect.
>
> Are you able to trigger this on a trivial config ? Maybe it only happens
> when certain features you have in your config are enabled ?

Reproduced! Unfortunately, I can't reproduce it without systemd.

Check the tls-keys path: with a relative path, you must either set the
working directory in the systemd unit file or give the full path.

The bug seems to be linked to multiple bind lines. The following config
makes no sense, but it triggers the bug (in my original conf I use multiple
processes, and each bind line is associated with one process). Maybe each
bind line is duplicated in each process, the tls-keys entry is common to all
the lines, and the double free happens when the second bind tries to release
the memory.

I guess that systemd is not the cause of the crash, but if I start the
process with -Ws on the command line and send kill -USR2, the bug is not
triggered.

test.cfg:
---------

global

frontend frt
    bind *:443 ssl crt default.pem tls-ticket-keys tls-keys
    bind *:443 ssl crt default.pem tls-ticket-keys tls-keys

tls-keys
--------

WRGMXEZMeqZzeY7bJTLsfWvrlBKszxDuZ+2WlSP3YFOqUq4dbzBpH+8nvwforYej
b2dwxCxZsV02/8bmEv+q/QjMllu/4bOSCYFWn6CuTtwiQExG8SLYnwBMevOUjVpL
cOGgEy6YK4K3h8rS9jSEiu8xWjHP4iMT+IRhHkwYaKPmgwbmvARzvoPkMDnyw5gq

/lib/systemd/system/test.service:
---------------------------------

[Service]
LimitCORE=infinity
Environment="PIDFILE=/run/test.pid"
WorkingDirectory=/etc/o3-haproxy
ExecStart=/opt/o3-haproxy/sbin/haproxy -Ws -f test.cfg
ExecReload=/bin/kill -USR2 $MAINPID

Thierry

-- 
Thierry Fournier
Web Performance & Security Expert
m: +33 6 68 69 21 85 | e: thierry.fourn...@ozon.io
w: http://www.ozon.io/ | b: http://blog.ozon.io/