Hi Olivier,
On 29/07/2016 at 17:25, Olivier Doucet wrote:
hello,
I'm trying to reload haproxy, but the new process failed to start with a
segmentation fault. Fortunately, the old process is still here so no
downtime occurs (just that the config is not reloaded).
I'm using version 1.6.5 (I did not update due to the SSL bug, and as I'm
using CentOS + DH keys, I wanted to wait for the fix in 1.6.7). I'll try
to reproduce it with 1.6.7 on Monday (no updates on Friday, that's the
rule!).
I think this is related to the usage of server-state-file.
I have at least one server DOWN in this file.
Emptying the server-state-file and reloading HAProxy makes it work
correctly.
Unfortunately, the config file is huge and cannot be posted here.
I couldn't spend a lot of time on this issue and failed to find a
configuration file that triggers the segfault.
But looking at the code, I tend to think that your configuration is
using "slowstart" on some servers and that's where it crashes.
In srv_set_running() [server.c], we have:
    if (s->slowstart > 0) {
        task_schedule(s->warmup, tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
    }
but it looks like s->warmup is NULL. Maybe it should be initialized as
it is done in the start_checks() function (in checks.c).
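
To make the suspected failure concrete, here is a minimal standalone sketch, not HAProxy code. The types demo_task and demo_server and the function schedule_warmup() are hypothetical stand-ins; only the interval formula (at least 1000 ms, otherwise slowstart / 20) comes from the snippet above. It shows one possible guard, skipping the scheduling when the warmup task has not been allocated yet, which may or may not be the right fix for the real code.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for HAProxy's task and server types. */
struct demo_task {
    int expire;      /* time (ms) at which the task should wake up */
    int scheduled;   /* set to 1 once the task has been queued */
};

struct demo_server {
    int slowstart;             /* slowstart duration in ms, 0 if unused */
    struct demo_task *warmup;  /* NULL until the checks have been started */
};

/* Same interval as in the quoted snippet: MAX(1000, slowstart / 20). */
static int warmup_interval_ms(int slowstart)
{
    int step = slowstart / 20;
    return step > 1000 ? step : 1000;
}

/* Returns 1 if the warmup task was scheduled, 0 if it was skipped. */
static int schedule_warmup(struct demo_server *s, int now_ms)
{
    if (s->slowstart <= 0)
        return 0;            /* slowstart not configured */
    if (s->warmup == NULL)
        return 0;            /* suspected crash case: task not allocated yet */
    s->warmup->expire = now_ms + warmup_interval_ms(s->slowstart);
    s->warmup->scheduled = 1;
    return 1;
}
```

Without the NULL test, the two assignments through s->warmup would dereference a null pointer, which matches the kind of segfault reported at startup.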
Unfortunately, while trying to find a configuration file that could
reproduce the issue, I found other crash cases :-(
Steps to reproduce :
1. Use the following configuration file :
global
    stats socket /tmp/haproxy.sock level admin
    server-state-file ./haproxy.state

defaults
    load-server-state-from-file global

backend test0
    server s0 127.0.0.1:80
2. Run haproxy with it.
3. Save the server state file :
$ echo "show servers state"|socat stdio /tmp/haproxy.sock > ./haproxy.state
4. Stop haproxy and add the keyword "disabled" to the backend "test0" :
global
    stats socket /tmp/haproxy.sock level admin
    server-state-file ./haproxy.state

defaults
    load-server-state-from-file global

backend test0
    disabled
    server s0 127.0.0.1:80
5. Start haproxy
=> It will die with SIGFPE
This happens because a disabled proxy is not initialized, so the variables
used to divide weights are still set to 0 :
0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
750 sv->eweight = (sv->uweight * w + px->lbprm.wmult - 1) / px->lbprm.wmult;
(gdb) bt
#0 0x000000000043c53a in server_recalc_eweight (sv=sv@entry=0x7b7bd0) at src/server.c:750
#1 0x000000000043fd97 in srv_update_state (params=0x7fffffffbbd0, version=1, srv=0x7b7bd0) at src/server.c:2141
#2 apply_server_state () at src/server.c:2473
#3 0x0000000000416b0e in init (argc=<optimized out>, argv=<optimized out>, argv@entry=0x7fffffffe128) at src/haproxy.c:839
#4 0x0000000000414989 in main (argc=<optimized out>, argv=0x7fffffffe128) at src/haproxy.c:1635
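
The division at line 750 can be isolated in a small standalone sketch. The struct names (lb_params, demo_proxy, weighted_server) and the function recalc_eweight() are hypothetical, reduced stand-ins; only the rounded-up division itself is taken from the gdb output above. The guard shown is one possible way to avoid the SIGFPE, under the assumption that a zero wmult means the proxy was never initialized.

```c
#include <stddef.h>

/* Hypothetical, reduced versions of the fields used at the crash site. */
struct lb_params { unsigned wmult; };
struct demo_proxy { int disabled; struct lb_params lbprm; };
struct weighted_server {
    struct demo_proxy *proxy;
    unsigned uweight;   /* user-configured weight */
    unsigned eweight;   /* effective weight */
};

/* Rounded-up division from the backtrace, guarded against wmult == 0.
 * w is whatever scaling factor the caller has already computed.
 * Returns 0 on success, -1 if the weight was left untouched. */
static int recalc_eweight(struct weighted_server *sv, unsigned w)
{
    struct demo_proxy *px = sv->proxy;

    if (px->lbprm.wmult == 0)
        return -1;   /* uninitialized (e.g. disabled) proxy: do not divide */
    sv->eweight = (sv->uweight * w + px->lbprm.wmult - 1) / px->lbprm.wmult;
    return 0;
}
```

With wmult left at 0, as for the disabled backend in step 4, the unguarded expression is an integer division by zero, which on x86 Linux is reported as SIGFPE.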
Main information :
nbcores 7
server-state-file /tmp/haproxy_server_state
HAProxy -vv output:
HA-Proxy version 1.6.5 2016/05/10
Copyright 2000-2016 Willy Tarreau <[email protected]>
Build options :
TARGET = linux2628
CPU = native
CC = gcc
CFLAGS = -O2 -march=native -g -fno-strict-aliasing -Wdeclaration-after-statement
OPTIONS = USE_OPENSSL=1 USE_PCRE=1 USE_TFO=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built without compression support (neither USE_ZLIB nor USE_SLZ are set)
Compression algorithms supported : identity("identity")
Built with OpenSSL version : OpenSSL 1.0.2h 3 May 2016
Running on OpenSSL version : OpenSSL 1.0.2h 3 May 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.2 2007-06-19
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
As I am using nbcores > 1, the server state file is built like this :
# Truncate the state file, then append the state reported by each process socket.
> /tmp/haproxy_server_state
for i in /var/run/haproxy-*.sock; do
    socat "$i" - <<< "show servers state" >> /tmp/haproxy_server_state
done
gdb info :
Program terminated with signal 11, Segmentation fault.
#0 srv_set_running () at include/proto/task.h:244
#1 0x000000000042ee3a in apply_server_state () at src/server.c:2069
#2 0x0000000000405618 in init () at src/haproxy.c:839
#3 0x0000000000407049 in main () at src/haproxy.c:1635
bt full gives this :
#0 srv_set_running () at include/proto/task.h:244
srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p =
0x8b90e8}, kw = 0x8b12a0}
srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p =
0x8b12a8}, kw = 0x8b1250}
#1 0x000000000042ee3a in apply_server_state () at src/server.c:2069
srv_keywords = {scope = 0x0, list = {n = 0x8b1258, p =
0x8b90e8}, kw = 0x8b12a0}
srv_kws = {scope = 0x6074f2 "ALL", list = {n = 0x8b4d68, p =
0x8b12a8}, kw = 0x8b1250}
#2 0x0000000000405618 in init () at src/haproxy.c:839
[...]
trash = {
str = 0x23b1910 "Server OBFUSCATED:80/OBFUSCATED is DOWN,
changed from server-state after a reload. 1 active and 0 backup servers
left. 0 sessions active, 0 requeued, 0 remaining in queue", size =
16384, len = 168}
[...]
I can provide the coredump privately, along with my haproxy binary, if it
helps.
--
Cyril Bonté