Hi,
2016-08-01 23:20 GMT+02:00 Cyril Bonté <[email protected]>:
> Hi again Olivier,
>
> (I re-added the mailing list, which you skipped in your reply)

Damn, hit the wrong button... sorry!

>
>>
>> Hi Olivier,
>>
>> On 29/07/2016 at 17:25, Olivier Doucet wrote:
>>
>> Hello,
>>
>> I'm trying to reload haproxy, but the new process failed to start with a
>> segmentation fault. Fortunately, the old process is still running, so no
>> downtime occurs (only the config is not reloaded).
>>
>> I'm using version 1.6.5 (I did not update because of the SSL bug, and as
>> I'm using CentOS + DH keys, I wanted to wait for the fix in 1.6.7). I'll
>> try to reproduce it with 1.6.7 on Monday (no updates on Friday, that's
>> the rule!).
>>
>> I think this is related to the use of server-state-file. I have at least
>> one server DOWN in this file.
>>
>> Emptying the server-state-file and reloading HAProxy makes it work
>> correctly.
>>
>> Unfortunately, the config file is very large and cannot be posted here.
>>
>>
>> I couldn't spend a lot of time on this issue and failed to find a
>> configuration file that triggers the segfault.
>>
>>
>> Me too. I tried to reproduce it today with a smaller config file, with
>> no success.
>>
>>
>> But looking at the code, I tend to think that your configuration is
>> using "slowstart" on some servers and that's where it crashes.
>>
>>
>> I can confirm that I have the following line in the defaults section:
>>
>>     default-server maxconn 2000 fall 3 rise 1 inter 2500ms fastinter 1000ms downinter 5000ms slowstart 30s
>>
>
> Can you check your configuration and verify that you don't have duplicate
> server names in the same backends? (Or can you provide me your state file,
> in private this time ;-) ).
>

If I remember correctly, it happens when I removed a server and added
another one. But I'm 100% sure that they had different names.
I'm sure because I always get warnings on reload because the IDs did not
match (BTW, it would be great to find a way to set the ID on servers to
avoid this).

Will get back to you tomorrow with a working example, I hope. But you
already found two ways to get a segfault, not so bad ;)

Olivier

>
> For example, this configuration will segfault:
>     global
>         stats socket /tmp/haproxy.sock level admin
>         server-state-file ./haproxy.state
>
>     defaults
>         load-server-state-from-file global
>         default-server maxconn 2000 fall 3 rise 1 inter 2500ms fastinter 1000ms downinter 5000ms slowstart 30s
>
>     backend test0
>         server s0 127.0.0.1:81 check # DOWN
>         server s0 127.0.0.1:80 check # UP
>
> And the state file contains (I removed the headers):
>     2 test0 1 s0 127.0.0.1 0 0 1 1 5 8 2 0 6 0 0 0
>     2 test0 2 s0 127.0.0.1 2 0 1 1 5 6 3 3 6 0 0 0
>
> First, haproxy will set the server DOWN, then it will find a second line
> setting it UP, which makes srv_set_running() enter the code that produces
> the segfault.
>
>
>> For the new bug you found, that's great.
>> Tomorrow I'll investigate more to find a way to reproduce my bug.
>>
>> Olivier
>>
>>
>> In srv_set_running() [server.c], we have:
>>     if (s->slowstart > 0) {
>>         task_schedule(s->warmup, tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
>>     }
>> but it looks like s->warmup is NULL. Maybe it should be initialized
>> as it is done in the start_checks() function (in checks.c).
>>
>>
>> Unfortunately, while trying to find a configuration file that could
>> reproduce the issue, I found other crash cases :-(
>>
>> Steps to reproduce:
>> 1. Use the following configuration file:
>>     global
>>         stats socket /tmp/haproxy.sock level admin
>>         server-state-file ./haproxy.state
>>
>>     defaults
>>         load-server-state-from-file global
>>
>>     backend test0
>>         server s0 127.0.0.1:80
>>
>> 2. Run haproxy with it.
>> 3. Save the server state file:
>>     $ echo "show servers state" | socat stdio /tmp/haproxy.sock > ./haproxy.state
>>
>> 4. Stop haproxy and add the keyword "disabled" to the backend "test0":
>>     global
>>         stats socket /tmp/haproxy.sock level admin
>>         server-state-file ./haproxy.state
>>
>>     defaults
>>         load-server-state-from-file global
>>
>>     backend test0
>>         disabled
>>         server s0 127.0.0.1:80
>>
>> 5. Start haproxy.
>>    => It will die with SIGFPE.
>>
>> This happens because a disabled proxy is not initialized, and the
>> variables used to divide weights are set to 0:
>>     0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
>>     750 sv->eweight = (sv->uweight * w + px->lbprm.wmult - 1) / px->lbprm.wmult;
>>     (gdb) bt
>>     #0 0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
>>     #1 0x000000000043fd97 in srv_update_state (params=0x7fffffffbbd0, version=1, srv=0x7b7bd0) at src/server.c:2141
>>     #2 apply_server_state () at src/server.c:2473
>>     #3 0x0000000000416b0e in init (argc=<optimized out>, argv=<optimized out>, argv@entry=0x7fffffffe128) at src/haproxy.c:839
>>     #4 0x0000000000414989 in main (argc=<optimized out>, argv=0x7fffffffe128) at src/haproxy.c:1635
>>
>>
>> Main information:
>>     nbcores 7
>>     server-state-file /tmp/haproxy_server_state
>>
>> HAProxy -vv output:
>>     HA-Proxy version 1.6.5 2016/05/10
>>     Copyright 2000-2016 Willy Tarreau <[email protected]>
>>
>>     Build options :
>>       TARGET = linux2628
>>       CPU = native
>>       CC = gcc
>>       CFLAGS = -O2 -march=native -g -fno-strict-aliasing -Wdeclaration-after-statement
>>       OPTIONS = USE_OPENSSL=1 USE_PCRE=1 USE_TFO=1
>>
>>     Default settings :
>>       maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>>
>>     Encrypted password support via crypt(3): yes
>>     Built without compression support (neither USE_ZLIB nor USE_SLZ are set)
>>     Compression algorithms supported : identity("identity")
>>     Built with OpenSSL version : OpenSSL 1.0.2h 3 May 2016
>>     Running on OpenSSL version : OpenSSL 1.0.2h 3 May 2016
>>     OpenSSL library supports TLS extensions : yes
>>     OpenSSL library supports SNI : yes
>>     OpenSSL library supports prefer-server-ciphers : yes
>>     Built with PCRE version : 7.2 2007-06-19
>>     PCRE library supports JIT : no (USE_PCRE_JIT not set)
>>     Built without Lua support
>>     Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
>>
>>     Available polling systems :
>>       epoll : pref=300, test result OK
>>       poll : pref=200, test result OK
>>       select : pref=150, test result OK
>>     Total: 3 (3 usable), will use epoll.
>>
>>
>> As I am using nbcores > 1, the server state file is built like this:
>>     > /tmp/haproxy_server_state
>>     for i in $(/bin/ls /var/run/haproxy-*.sock); do
>>         socat $i - <<< "show servers state" >> /tmp/haproxy_server_state
>>     done
>>
>> gdb info:
>>
>>     Program terminated with signal 11, Segmentation fault.
>>     #0 srv_set_running () at include/proto/task.h:244
>>     #1 0x000000000042ee3a in apply_server_state () at src/server.c:2069
>>     #2 0x0000000000405618 in init () at src/haproxy.c:839
>>     #3 0x0000000000407049 in main () at src/haproxy.c:1635
>>
>> bt full gives this:
>>
>>     #0 srv_set_running () at include/proto/task.h:244
>>             srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p = 0x8b90e8}, kw = 0x8b12a0}
>>             srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p = 0x8b12a8}, kw = 0x8b1250}
>>     #1 0x000000000042ee3a in apply_server_state () at src/server.c:2069
>>             srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p = 0x8b90e8}, kw = 0x8b12a0}
>>             srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p = 0x8b12a8}, kw = 0x8b1250}
>>     #2 0x0000000000405618 in init () at src/haproxy.c:839
>>     [...]
>>             trash = {
>>               str = 0x23b1910 "Server OBFUSCATED:80/OBFUSCATED is DOWN, changed from server-state after a reload. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue", size = 16384, len = 168}
>>     [...]
>>
>> I can provide the coredump in private, along with my haproxy binary, if
>> it helps.
>>
>> --
>> Cyril Bonté
>>
>
> --
> Cyril Bonté
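Cyril's reading of the first crash is that srv_set_running() reaches `task_schedule(s->warmup, ...)` while `s->warmup` is NULL. A minimal standalone sketch of a defensive check at that step follows. `struct task`, `struct server`, `max_u()` and `schedule_warmup()` are hypothetical simplified stand-ins, not HAProxy's definitions, and plain millisecond addition replaces tick_add()/MS_TO_TICKS(); only the delay expression, MAX(1000, slowstart / 20), comes from the quoted code. The thread itself leans toward initializing the warmup task as start_checks() does, so this guard illustrates the failure mode rather than proposing the fix.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for HAProxy's structures: only the
 * fields this sketch needs, not the real definitions. */
struct task {
    unsigned int expire;     /* date, in ms, at which the task should run */
};

struct server {
    unsigned int slowstart;  /* slowstart setting, 0 when disabled (unit not modeled) */
    struct task *warmup;     /* NULL when no warmup task was allocated */
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/* Hypothetical helper: schedules the warmup task only if slowstart is set
 * AND the task exists. Returns 1 if scheduled, 0 if skipped.
 * Plain ms addition stands in for tick_add()/MS_TO_TICKS(). */
static int schedule_warmup(struct server *s, unsigned int now_ms)
{
    if (s->slowstart == 0)
        return 0;            /* no slowstart: nothing to schedule */
    if (s->warmup == NULL)
        return 0;            /* the situation that crashes in the report */
    /* same delay expression as the quoted code: MAX(1000, slowstart / 20) */
    s->warmup->expire = now_ms + max_u(1000, s->slowstart / 20);
    return 1;
}
```

Because of the MAX(), the delay is never shorter than 1000 ms, and a server without a warmup task is skipped instead of handing a NULL task to the scheduler.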

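The SIGFPE in step 5 of Cyril's second reproduction is an integer division by zero: for a disabled backend the proxy is not initialized, `px->lbprm.wmult` stays at 0, and line 750 of src/server.c divides by it. The sketch below isolates that rounded-up division and refuses to divide when the multiplier is 0. `struct lbprm` (reduced to one field) and `recalc_eweight()` are hypothetical simplified stand-ins, not HAProxy's code; skipping disabled proxies earlier, in apply_server_state(), may well be the more appropriate fix, which the thread does not settle.

```c
/* Hypothetical, simplified stand-in: only the divisor involved in the crash. */
struct lbprm {
    unsigned int wmult;  /* weight multiplier; 0 for a proxy that was never initialized */
};

/* Hypothetical helper computing an effective weight with the same rounded-up
 * division as src/server.c:750. `w` is the weight scale computed by the
 * caller, passed in as-is. Returns 1 and stores the result in *eweight, or
 * returns 0 without touching *eweight when wmult is 0 (the division by zero
 * that raises SIGFPE). */
static int recalc_eweight(unsigned int uweight, unsigned int w,
                          const struct lbprm *lb, unsigned int *eweight)
{
    if (lb->wmult == 0)
        return 0;
    *eweight = (uweight * w + lb->wmult - 1) / lb->wmult;
    return 1;
}
```

For example, uweight = 2, w = 4 and wmult = 3 give (8 + 3 - 1) / 3 = 3, the ceiling of 8/3; with wmult = 0 the function returns 0 and leaves the previous effective weight untouched.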

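Cyril's question about duplicate server names can be checked mechanically on a state file. The sketch below looks for two entries with the same backend name and server name, which is exactly the pattern of his two `test0`/`s0` lines. It assumes the field order seen in those lines (be_id, be_name, srv_id, srv_name, then the remaining fields) and skips any line that does not start that way, such as the headers he removed; `parse_names()` and `find_duplicate_server()` are hypothetical helpers, not part of HAProxy. Since Olivier's state file is the concatenation of one dump per process, running such a check on it may also be informative, though whether that concatenation repeats server entries is an assumption here, not something established in the thread.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 64  /* buffer size; the "%63s" conversions below must stay at NAME_LEN - 1 */

/* Hypothetical helper: extracts the backend and server names from a state
 * line, assuming it starts with "be_id be_name srv_id srv_name". Returns 1
 * on success, 0 for lines without that shape (e.g. header lines). */
static int parse_names(const char *line, char be[NAME_LEN], char srv[NAME_LEN])
{
    return sscanf(line, "%*d %63s %*d %63s", be, srv) == 2;
}

/* Returns the index of the first line whose (backend, server) name pair
 * already appeared on an earlier line, or -1 if every pair is unique. */
static int find_duplicate_server(const char *const *lines, int n)
{
    char be_i[NAME_LEN], srv_i[NAME_LEN], be_j[NAME_LEN], srv_j[NAME_LEN];

    for (int i = 0; i < n; i++) {
        if (!parse_names(lines[i], be_i, srv_i))
            continue;
        for (int j = 0; j < i; j++) {
            if (parse_names(lines[j], be_j, srv_j) &&
                strcmp(be_i, be_j) == 0 && strcmp(srv_i, srv_j) == 0)
                return i;
        }
    }
    return -1;
}
```

The quadratic scan keeps the sketch short; a hash set keyed on "backend/server" would scale better for very large state files.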