Hi,

2016-08-01 23:20 GMT+02:00 Cyril Bonté <[email protected]>:

> Hi again Olivier,
>
> (I'm re-adding the mailing list, which was dropped from your reply)


Damn, hit the wrong button... sorry!



>
>>
>>     Hi Olivier,
>>
>>     Le 29/07/2016 à 17:25, Olivier Doucet a écrit :
>>
>>         hello,
>>
>>         I'm trying to reload haproxy, but the new process fails to start
>>         with a segmentation fault. Fortunately, the old process is still
>>         running, so there is no downtime (it's just that the config is not
>>         reloaded).
>>
>>         I'm using version 1.6.5 (I did not update due to the SSL bug, and
>>         as I'm using CentOS + DH keys, I wanted to wait for the fix in
>>         1.6.7). I'll try to reproduce it with 1.6.7 on Monday (no updates
>>         on Friday, that's the rule!).
>>
>>         I think this is related to the usage of server-state-file.
>>         I have at least one server DOWN in this file.
>>
>>         Emptying the server-state file and reloading HAProxy makes it
>>         work correctly.
>>
>>         Unfortunately, the config file is very large and cannot be shared
>>         here.
>>
>>
>>     I couldn't spend a lot of time on this issue and failed to find a
>>     configuration file that triggers the segfault.
>>
>>
>> Me too. I tried to reproduce it today with a smaller config file, without
>> success.
>>
>>
>>
>>     But looking at the code, I tend to think that your configuration is
>>     using "slowstart" on some servers and that's where it crashes.
>>
>>
>> I can confirm that I have the following line in the defaults section:
>>
>>  default-server maxconn 2000 fall 3 rise 1 inter 2500ms  fastinter
>> 1000ms  downinter 5000ms slowstart 30s
>>
>
> Can you check your configuration and verify that you don't have duplicate
> server names in the same backends? (Or can you send me your state file,
> in private this time ;-) ).
>

If I remember correctly, it happened when I removed a server and added
another one. But I'm 100% sure that they had different names. I'm sure
because I always get warnings on reload, because the IDs did not match
(BTW, it would be great to find a way to set the id on a server to avoid
this).

I will get back to you tomorrow with a working example, I hope. But you
already found two ways to get a segfault, so not bad ;)

Olivier








>
> For example, this configuration will segfault:
> global
>   stats socket /tmp/haproxy.sock level admin
>   server-state-file ./haproxy.state
>
> defaults
>   load-server-state-from-file global
>   default-server maxconn 2000 fall 3 rise 1 inter 2500ms  fastinter
> 1000ms  downinter 5000ms slowstart 30s
>
> backend test0
>   server s0 127.0.0.1:81 check  # DOWN
>   server s0 127.0.0.1:80 check  # UP
>
> And the state file contains (I removed the headers):
> 2 test0 1 s0 127.0.0.1 0 0 1 1 5 8 2 0 6 0 0 0
> 2 test0 2 s0 127.0.0.1 2 0 1 1 5 6 3 3 6 0 0 0
>
>
> First, haproxy will set the server DOWN, then it will find a second line
> setting it to UP, which makes srv_set_running() enter the code producing
> the segfault.
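>
> Purely as an illustration and not a verified patch: assuming s->warmup is
> indeed the NULL pointer dereferenced in srv_set_running() (see the code
> quoted further down), a defensive check like the sketch below would avoid
> scheduling a task that was never allocated. The only addition over the
> code quoted below is the "s->warmup" test:
>
>     /* Hypothetical guard: only reschedule the warmup task if it was
>      * actually allocated; it appears to still be NULL when the server
>      * state file is applied at startup. */
>     if (s->slowstart > 0 && s->warmup) {
>             task_schedule(s->warmup, tick_add(now_ms,
>                           MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
>     }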
>
>
>> For the new bug you found, that's great.
>> Tomorrow I'll investigate more to find a way to reproduce my bug.
>>
>> Olivier
>>
>>
>>
>>
>>     in srv_set_running() [server.c], we have:
>>     if (s->slowstart > 0) {
>>             task_schedule(s->warmup, tick_add(now_ms,
>>                           MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
>>     }
>>     but it looks like s->warmup is NULL. Maybe it should be initialized
>>     as it is done in the start_checks() function (in checks.c).
>>
>>
>>     Unfortunately, while trying to find a configuration file that could
>>     reproduce the issue, I found other crash cases :-(
>>
>>     Steps to reproduce:
>>     1. Use the following configuration file:
>>     global
>>         stats socket /tmp/haproxy.sock level admin
>>         server-state-file ./haproxy.state
>>
>>     defaults
>>         load-server-state-from-file global
>>
>>     backend test0
>>       server s0 127.0.0.1:80
>>
>>     2. Run haproxy with it.
>>     3. Save the server state file:
>>     $ echo "show servers state" | socat stdio /tmp/haproxy.sock > ./haproxy.state
>>
>>     4. Stop haproxy and add the keyword "disabled" to the backend "test0":
>>     global
>>         stats socket /tmp/haproxy.sock level admin
>>         server-state-file ./haproxy.state
>>
>>     defaults
>>         load-server-state-from-file global
>>
>>     backend test0
>>       disabled
>>       server s0 127.0.0.1:80
>>
>>
>>     5. Start haproxy.
>>     => It will die with SIGFPE.
>>     This happens because a disabled proxy is not initialized, so the
>>     variables used to divide the weights are set to 0:
>>     0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
>>     750             sv->eweight = (sv->uweight * w + px->lbprm.wmult - 1) / px->lbprm.wmult;
>>     (gdb) bt
>>     #0  0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
>>     #1  0x000000000043fd97 in srv_update_state (params=0x7fffffffbbd0, version=1, srv=0x7b7bd0) at src/server.c:2141
>>     #2  apply_server_state () at src/server.c:2473
>>     #3  0x0000000000416b0e in init (argc=<optimized out>, argv=<optimized out>, argv@entry=0x7fffffffe128) at src/haproxy.c:839
>>     #4  0x0000000000414989 in main (argc=<optimized out>, argv=0x7fffffffe128) at src/haproxy.c:1635
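>>
>>     Purely as an illustration and not a verified patch: assuming
>>     px->lbprm.wmult is legitimately 0 here because the disabled proxy's
>>     LB parameters were never initialized, a guard like the sketch below
>>     around the division at src/server.c:750 would avoid the SIGFPE (the
>>     only addition over the quoted line is the wmult test):
>>
>>     /* Hypothetical guard: skip the effective-weight recalculation when
>>      * the proxy's LB parameters were never initialized (wmult == 0),
>>      * e.g. because the proxy is disabled. */
>>     if (px->lbprm.wmult)
>>             sv->eweight = (sv->uweight * w + px->lbprm.wmult - 1) / px->lbprm.wmult;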
>>
>>
>>
>>
>>         Main information:
>>          nbcores 7
>>          server-state-file /tmp/haproxy_server_state
>>
>>         HAProxy -vv output:
>>         HA-Proxy version 1.6.5 2016/05/10
>>         Copyright 2000-2016 Willy Tarreau <[email protected]>
>>
>>
>>
>>         Build options :
>>           TARGET  = linux2628
>>           CPU     = native
>>           CC      = gcc
>>           CFLAGS  = -O2 -march=native -g -fno-strict-aliasing
>>         -Wdeclaration-after-statement
>>           OPTIONS = USE_OPENSSL=1 USE_PCRE=1 USE_TFO=1
>>
>>         Default settings :
>>           maxconn = 2000, bufsize = 16384, maxrewrite = 1024,
>>         maxpollevents = 200
>>
>>         Encrypted password support via crypt(3): yes
>>         Built without compression support (neither USE_ZLIB nor USE_SLZ
>>         are set)
>>         Compression algorithms supported : identity("identity")
>>         Built with OpenSSL version : OpenSSL 1.0.2h  3 May 2016
>>         Running on OpenSSL version : OpenSSL 1.0.2h  3 May 2016
>>         OpenSSL library supports TLS extensions : yes
>>         OpenSSL library supports SNI : yes
>>         OpenSSL library supports prefer-server-ciphers : yes
>>         Built with PCRE version : 7.2 2007-06-19
>>         PCRE library supports JIT : no (USE_PCRE_JIT not set)
>>         Built without Lua support
>>         Built with transparent proxy support using: IP_TRANSPARENT
>>         IPV6_TRANSPARENT IP_FREEBIND
>>
>>         Available polling systems :
>>               epoll : pref=300,  test result OK
>>                poll : pref=200,  test result OK
>>              select : pref=150,  test result OK
>>         Total: 3 (3 usable), will use epoll.
>>
>>
>>
>>
>>         As I am using nbcores > 1, the server state file is built like this:
>>             > /tmp/haproxy_server_state
>>             for i in $(/bin/ls /var/run/haproxy-*.sock); do
>>                 socat $i - <<< "show servers state" >> /tmp/haproxy_server_state
>>             done
>>
>>         gdb info:
>>
>>         Program terminated with signal 11, Segmentation fault.
>>         #0  srv_set_running () at include/proto/task.h:244
>>         #1  0x000000000042ee3a in apply_server_state () at src/server.c:2069
>>         #2  0x0000000000405618 in init () at src/haproxy.c:839
>>         #3  0x0000000000407049 in main () at src/haproxy.c:1635
>>
>>         bt full gives this:
>>
>>         #0  srv_set_running () at include/proto/task.h:244
>>                 srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p = 0x8b90e8}, kw = 0x8b12a0}
>>                 srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p = 0x8b12a8}, kw = 0x8b1250}
>>         #1  0x000000000042ee3a in apply_server_state () at src/server.c:2069
>>                 srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p = 0x8b90e8}, kw = 0x8b12a0}
>>                 srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p = 0x8b12a8}, kw = 0x8b1250}
>>         #2  0x0000000000405618 in init () at src/haproxy.c:839
>>                 [...]
>>                 trash = {
>>                   str = 0x23b1910 "Server OBFUSCATED:80/OBFUSCATED is DOWN, changed from server-state after a reload. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue",
>>                   size = 16384, len = 168}
>>                 [...]
>>
>>         I can provide the coredump in private, along with my haproxy
>>         binary, if it helps.
>>
>>     --
>>     Cyril Bonté
>>
>>
>>
>>
>
> --
> Cyril Bonté
>
