Hi Olivier,

On 29/07/2016 at 17:25, Olivier Doucet wrote:
Hello,

I'm trying to reload haproxy, but the new process failed to start with a
segmentation fault. Fortunately, the old process is still here so no
downtime occurs (just that the config is not reloaded).

I'm using version 1.6.5 (did not update due to the SSL bug, and as I'm
using CentOS + DH keys, I wanted to wait for the fix in 1.6.7). I'll try
to reproduce it with 1.6.7 on Monday (no update on Friday, that's the
rule!).

I think this is related to the usage of server-state-file.
I have at least one server DOWN in this file.

Emptying the server-state-file and reloading HAProxy makes it work
correctly.

Unfortunately, the config file is huge and cannot be posted here.

I couldn't spend a lot of time on this issue and failed to find a configuration file that triggers the segfault. But looking at the code, I tend to think that your configuration is using "slowstart" on some servers and that's where it crashes.

In srv_set_running() [server.c], we have:
    if (s->slowstart > 0) {
        task_schedule(s->warmup, tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
    }
but it looks like s->warmup is NULL. Maybe it should be initialized as it is done in the start_checks() function (in checks.c).
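
To make the suspected failure mode concrete, here is a minimal standalone sketch. It is not HAProxy code: the structures and the names fake_task, fake_server, warmup_delay_ms and schedule_warmup are hypothetical stand-ins. It only shows the same delay computation, MAX(1000, slowstart / 20), and a guard that refuses to schedule when the warmup task was never allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for HAProxy's task and server structures. */
struct fake_task {
    unsigned int expire;        /* absolute expiration date, in ms */
};

struct fake_server {
    int slowstart;              /* slowstart duration in ms, 0 if unused */
    struct fake_task *warmup;   /* NULL until the warmup task is created */
};

/* Same delay as in the quoted snippet: MAX(1000, slowstart / 20). */
int warmup_delay_ms(int slowstart)
{
    int step = slowstart / 20;
    return step > 1000 ? step : 1000;
}

/* Returns 1 if the warmup task was scheduled, 0 if slowstart is not used,
 * and -1 if slowstart is set but the task does not exist. Without the NULL
 * check, that last case would dereference a NULL pointer. */
int schedule_warmup(struct fake_server *s, unsigned int now_ms)
{
    assert(s != NULL);
    if (s->slowstart <= 0)
        return 0;
    if (s->warmup == NULL)
        return -1;
    s->warmup->expire = now_ms + (unsigned int)warmup_delay_ms(s->slowstart);
    return 1;
}
```

Whether the real fix should be such a guard or an earlier allocation of the task, as start_checks() does, is a design question this sketch does not settle.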


Unfortunately, while trying to find a configuration file that reproduces the issue, I found other crash cases :-(

Steps to reproduce :
1. Use the following configuration file :
global
    stats socket /tmp/haproxy.sock level admin
    server-state-file ./haproxy.state

defaults
    load-server-state-from-file global

backend test0
  server s0 127.0.0.1:80

2. run haproxy with it.
3. save the server state file :
$ echo "show servers state"|socat stdio /tmp/haproxy.sock > ./haproxy.state

4. stop haproxy and add the keyword "disabled" on the backend "test0" :
global
    stats socket /tmp/haproxy.sock level admin
    server-state-file ./haproxy.state

defaults
    load-server-state-from-file global

backend test0
  disabled
  server s0 127.0.0.1:80

5. start haproxy
=> It will die with SIGFPE
This happens because a disabled proxy is not initialized and variables used to divide weights are set to 0 :
0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
750         sv->eweight = (sv->uweight * w + px->lbprm.wmult - 1) / px->lbprm.wmult;
(gdb) bt
#0  0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
#1  0x000000000043fd97 in srv_update_state (params=0x7fffffffbbd0, version=1, srv=0x7b7bd0) at src/server.c:2141
#2  apply_server_state () at src/server.c:2473
#3  0x0000000000416b0e in init (argc=<optimized out>, argv=<optimized out>, argv@entry=0x7fffffffe128) at src/haproxy.c:839
#4  0x0000000000414989 in main (argc=<optimized out>, argv=0x7fffffffe128) at src/haproxy.c:1635
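
The division at server.c:750 can be illustrated in isolation. The following is a minimal sketch and not HAProxy code: fake_lbprm, fake_srv and recalc_eweight_guarded are hypothetical names, and the structures only keep the fields that appear in the faulting line. It reproduces the rounding-up division and skips it when wmult is 0, which is the value I assume a disabled (uninitialized) proxy ends up with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the structures involved. */
struct fake_lbprm {
    int wmult;      /* weight multiplier, 0 when the proxy was not initialized */
};

struct fake_srv {
    int uweight;    /* user-configured weight */
    int eweight;    /* effective weight, result of the computation */
};

/* Mirrors the expression from the backtrace:
 *   eweight = (uweight * w + wmult - 1) / wmult
 * Returns 0 on success, or -1 when wmult is 0, in which case the division
 * is skipped and eweight is left untouched instead of raising SIGFPE. */
int recalc_eweight_guarded(struct fake_srv *sv, const struct fake_lbprm *lb, int w)
{
    assert(sv != NULL && lb != NULL);
    if (lb->wmult == 0)
        return -1;
    sv->eweight = (sv->uweight * w + lb->wmult - 1) / lb->wmult;
    return 0;
}
```

An alternative would be to not apply the server state at all to servers that belong to a disabled proxy; the sketch only shows why the unguarded division traps.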




Main information :
 nbcores 7
 server-state-file /tmp/haproxy_server_state

HAProxy -vv output:
HA-Proxy version 1.6.5 2016/05/10
Copyright 2000-2016 Willy Tarreau <[email protected]>

Build options :
  TARGET  = linux2628
  CPU     = native
  CC      = gcc
  CFLAGS  = -O2 -march=native -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_OPENSSL=1 USE_PCRE=1 USE_TFO=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without compression support (neither USE_ZLIB nor USE_SLZ are set)
Compression algorithms supported : identity("identity")
Built with OpenSSL version : OpenSSL 1.0.2h  3 May 2016
Running on OpenSSL version : OpenSSL 1.0.2h  3 May 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.2 2007-06-19
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.




As I am using nbcores > 1, the server state file is built like this :
    > /tmp/haproxy_server_state
    for i in $(/bin/ls /var/run/haproxy-*.sock); do
        socat $i - <<< "show servers state" >> /tmp/haproxy_server_state
    done

gdb info :

Program terminated with signal 11, Segmentation fault.
#0  srv_set_running () at include/proto/task.h:244
#1  0x000000000042ee3a in apply_server_state () at src/server.c:2069
#2  0x0000000000405618 in init () at src/haproxy.c:839
#3  0x0000000000407049 in main () at src/haproxy.c:1635

bt full gives this :

#0  srv_set_running () at include/proto/task.h:244
        srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p =
0x8b90e8}, kw = 0x8b12a0}
        srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p =
0x8b12a8}, kw = 0x8b1250}
#1  0x000000000042ee3a in apply_server_state () at src/server.c:2069
        srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p =
0x8b90e8}, kw = 0x8b12a0}
        srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p =
0x8b12a8}, kw = 0x8b1250}
#2  0x0000000000405618 in init () at src/haproxy.c:839
        [...]
        trash = {
          str = 0x23b1910 "Server OBFUSCATED:80/OBFUSCATED is DOWN,
changed from server-state after a reload. 1 active and 0 backup servers
left. 0 sessions active, 0 requeued, 0 remaining in queue", size =
16384, len = 168}
        [...]

I can provide the coredump in private, along with my haproxy binary, if it
helps.



--
Cyril Bonté
