Hi again Olivier,

(I re-add the mailing list, which was dropped from your reply)

On 01/08/2016 at 22:48, Olivier Doucet wrote:
Hello Cyril,



2016-08-01 22:36 GMT+02:00 Cyril Bonté <[email protected]>:

    Hi Olivier,

    On 29/07/2016 at 17:25, Olivier Doucet wrote:

        hello,

        I'm trying to reload haproxy, but the new process fails to start
        with a segmentation fault. Fortunately, the old process is still
        there, so no downtime occurs (just that the config is not
        reloaded).

        I'm using version 1.6.5 (I did not update due to the SSL bug,
        and as I'm using CentOS + DH keys, I wanted to wait for the fix
        in 1.6.7). I'll try to reproduce it with 1.6.7 on Monday (no
        update on Friday, that's the rule!).

        I think this is related to the usage of server-state-file.
        I have at least one server DOWN in this file.

        Emptying the server-state file and reloading HAProxy makes it
        work correctly.
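
A minimal sketch of the workaround described above: truncate the state file so the reloaded process has no stale entries to parse. The state-file path is the one given later in this report; the reload command itself is an assumption (a typical `-sf` reload) and is left commented out.

```shell
# Truncate the server-state file so the new process starts from a clean slate.
: > /tmp/haproxy_server_state

# Then reload haproxy (command is an assumption, adjust paths to your setup):
# haproxy -f /etc/haproxy/haproxy.cfg -sf "$(cat /var/run/haproxy.pid)"
```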

        Unfortunately, the config file is huge and cannot be shared
        here.


    I couldn't spend a lot of time on this issue and failed to find a
    configuration file that triggers the segfault.


Me too. I tried to reproduce it today with a smaller config file,
without success.



    But looking at the code, I tend to think that your configuration is
    using "slowstart" on some servers and that's where it crashes.


I confirm that I have the following line in the defaults section:

 default-server maxconn 2000 fall 3 rise 1 inter 2500ms fastinter 1000ms downinter 5000ms slowstart 30s

Can you check your configuration and verify that you don't have duplicate server names in the same backend? (Or can you send me your state file, in private this time ;-))

For example, this configuration will segfault:
global
  stats socket /tmp/haproxy.sock level admin
  server-state-file ./haproxy.state

defaults
  load-server-state-from-file global
  default-server maxconn 2000 fall 3 rise 1 inter 2500ms fastinter 1000ms downinter 5000ms slowstart 30s

backend test0
  server s0 127.0.0.1:81 check  # DOWN
  server s0 127.0.0.1:80 check  # UP

And the state file contains (headers removed):
2 test0 1 s0 127.0.0.1 0 0 1 1 5 8 2 0 6 0 0 0
2 test0 2 s0 127.0.0.1 2 0 1 1 5 6 3 3 6 0 0 0


First, haproxy will set the server DOWN, then it will find a second line setting it back UP, which makes srv_set_running() enter the code path producing the segfault.
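
To spot this pattern without reading the file by hand, here is a small sketch (just an awk one-liner, not HAProxy tooling) that flags duplicate backend/server name pairs in a saved state file. In the state-file format, field 2 is the backend name and field 4 the server name; the sample reuses the two lines above.

```shell
# Recreate the two state lines from above as a sample state file.
cat > /tmp/haproxy.state <<'EOF'
2 test0 1 s0 127.0.0.1 0 0 1 1 5 8 2 0 6 0 0 0
2 test0 2 s0 127.0.0.1 2 0 1 1 5 6 3 3 6 0 0 0
EOF

# Flag any backend/server name pair that appears more than once;
# comment lines starting with '#' are skipped.
awk '!/^#/ { if (seen[$2 "/" $4]++) print "duplicate: " $2 "/" $4 }' /tmp/haproxy.state
# prints: duplicate: test0/s0
```

Any pair reported here will make the reloaded process apply two conflicting states to the same server, which is exactly the scenario described above.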


Nice catch on the new bug you found.
Tomorrow I'll investigate further to find a way to reproduce mine.

Olivier




    in srv_set_running() [server.c], we have:

    if (s->slowstart > 0) {
            task_schedule(s->warmup,
                          tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
    }

    but it looks like s->warmup is NULL. Maybe it should be initialized
    as it is done in the start_checks() function (in checks.c).


    Unfortunately, while trying to find a configuration file that could
    reproduce the issue, I found other crash cases :-(

    Steps to reproduce:
    1. Use the following configuration file:
    global
        stats socket /tmp/haproxy.sock level admin
        server-state-file ./haproxy.state

    defaults
        load-server-state-from-file global

    backend test0
      server s0 127.0.0.1:80

    2. Run haproxy with it.
    3. Save the server state file:
    $ echo "show servers state" | socat stdio /tmp/haproxy.sock > ./haproxy.state

    4. Stop haproxy and add the keyword "disabled" to the backend "test0":
    global
        stats socket /tmp/haproxy.sock level admin
        server-state-file ./haproxy.state

    defaults
        load-server-state-from-file global

    backend test0
      disabled
      server s0 127.0.0.1:80

    5. Start haproxy.
    => It will die with SIGFPE.
    This happens because a disabled proxy is not initialized, so the
    variables used to divide weights are set to 0:
    0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
    750             sv->eweight = (sv->uweight * w + px->lbprm.wmult - 1) / px->lbprm.wmult;
    (gdb) bt
    #0  0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
    #1  0x000000000043fd97 in srv_update_state (params=0x7fffffffbbd0, version=1, srv=0x7b7bd0) at src/server.c:2141
    #2  apply_server_state () at src/server.c:2473
    #3  0x0000000000416b0e in init (argc=<optimized out>, argv=<optimized out>, argv@entry=0x7fffffffe128) at src/haproxy.c:839
    #4  0x0000000000414989 in main (argc=<optimized out>, argv=0x7fffffffe128) at src/haproxy.c:1635




        Main information:
         nbcores 7
         server-state-file /tmp/haproxy_server_state

        HAProxy -vv output:
        HA-Proxy version 1.6.5 2016/05/10
        Copyright 2000-2016 Willy Tarreau <[email protected]>


        Build options :
          TARGET  = linux2628
          CPU     = native
          CC      = gcc
          CFLAGS  = -O2 -march=native -g -fno-strict-aliasing
        -Wdeclaration-after-statement
          OPTIONS = USE_OPENSSL=1 USE_PCRE=1 USE_TFO=1

        Default settings :
          maxconn = 2000, bufsize = 16384, maxrewrite = 1024,
        maxpollevents = 200

        Encrypted password support via crypt(3): yes
        Built without compression support (neither USE_ZLIB nor USE_SLZ
        are set)
        Compression algorithms supported : identity("identity")
        Built with OpenSSL version : OpenSSL 1.0.2h  3 May 2016
        Running on OpenSSL version : OpenSSL 1.0.2h  3 May 2016
        OpenSSL library supports TLS extensions : yes
        OpenSSL library supports SNI : yes
        OpenSSL library supports prefer-server-ciphers : yes
        Built with PCRE version : 7.2 2007-06-19
        PCRE library supports JIT : no (USE_PCRE_JIT not set)
        Built without Lua support
        Built with transparent proxy support using: IP_TRANSPARENT
        IPV6_TRANSPARENT IP_FREEBIND

        Available polling systems :
              epoll : pref=300,  test result OK
               poll : pref=200,  test result OK
             select : pref=150,  test result OK
        Total: 3 (3 usable), will use epoll.




        As I am using nbcores > 1, the server state file is built like
        this:
            > /tmp/haproxy_server_state
            for i in /var/run/haproxy-*.sock; do
                socat $i - <<< "show servers state" >> /tmp/haproxy_server_state
            done
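
One detail worth checking in the loop above (an assumption on my side, not something confirmed in this thread): each "show servers state" dump starts with its own format-version line, so plain concatenation repeats that header in the aggregated file. A sketch that keeps only the first header, using plain files in place of live sockets so it can be run anywhere:

```shell
# merge_state: concatenate several "show servers state" dumps, keeping the
# version-header line (line 1) of the first dump only.
merge_state() {
    first=1
    for f in "$@"; do
        if [ "$first" -eq 1 ]; then
            cat "$f"
            first=0
        else
            tail -n +2 "$f"   # drop the duplicate version header
        fi
    done
}

# Simulated per-process dumps: a version line followed by one state line.
printf '1\n2 test0 1 s0 127.0.0.1 2 0 1 1 5 6 3 0 6 0 0 0\n' > /tmp/state_a
printf '1\n2 test1 1 s0 127.0.0.1 2 0 1 1 5 6 3 0 6 0 0 0\n' > /tmp/state_b

merge_state /tmp/state_a /tmp/state_b > /tmp/haproxy_server_state
# resulting file: one "1" header followed by both state lines
```

With live sockets, the per-socket `socat $i - <<< "show servers state"` output would simply be fed through the same function instead of appended directly.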

        gdb info:

        Program terminated with signal 11, Segmentation fault.
        #0  srv_set_running () at include/proto/task.h:244
        #1  0x000000000042ee3a in apply_server_state () at src/server.c:2069
        #2  0x0000000000405618 in init () at src/haproxy.c:839
        #3  0x0000000000407049 in main () at src/haproxy.c:1635

        bt full gives this:

        #0  srv_set_running () at include/proto/task.h:244
                srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p = 0x8b90e8}, kw = 0x8b12a0}
                srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p = 0x8b12a8}, kw = 0x8b1250}
        #1  0x000000000042ee3a in apply_server_state () at src/server.c:2069
                srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p = 0x8b90e8}, kw = 0x8b12a0}
                srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p = 0x8b12a8}, kw = 0x8b1250}
        #2  0x0000000000405618 in init () at src/haproxy.c:839
                [...]
                trash = {
                  str = 0x23b1910 "Server OBFUSCATED:80/OBFUSCATED is DOWN, changed from server-state after a reload. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue",
                  size = 16384, len = 168}
                [...]

        I can provide the coredump, along with my haproxy binary, in
        private if it helps.

    --
    Cyril Bonté





--
Cyril Bonté
