Re: haproxy bug: healthcheck not passing after port change when statefile is enabled

Baptiste Tue, 06 Nov 2018 14:54:01 -0800

Hi,

After debriefing internally, the fix will be much longer and may even
trigger a new server-state file format.
I'll keep you updated.


Baptiste


On Sun, Nov 4, 2018 at 7:11 PM Baptiste <[email protected]> wrote:

> Hi Sven,
>
> I reviewed the whole thing and I think the support of port in state file
> was added for SRV records, but also for the runtime api, which allows
> changing the port at runtime too.
> I'll come back to you shortly with a fix for this behavior, currently
> discussing with Willy/Fred about it.
> (it's more complicated than moving the code
> """
>     if (port_str)
>             srv->svc_port = port;
> """
> a couple of lines above).
>
> Baptiste
>
>
> On Tue, Oct 9, 2018 at 10:52 AM Sven Wiltink <[email protected]> wrote:
>
>> Hey Baptiste,
>>
>>
>> We noticed the SRV patch has been merged. That should mean that we can
>> now fix this issue as well. Would you be able to fix this, or should we
>> try to provide a patch?
>>
>>
>> Thanks again in advance,
>>
>> Sven
>> ------------------------------
>> *From:* Baptiste <[email protected]>
>> *Sent:* Thursday, 12 July 2018 14:52:24
>> *To:* Sven Wiltink
>> *CC:* [email protected]
>> *Subject:* Re: haproxy bug: healthcheck not passing after port change
>> when statefile is enabled
>>
>> Hi Sven,
>>
>> Thanks for the clarification.
>> It's a bit more complicated than it should be.
>> I think we may want to apply the port from the state file only if it
>> was changed at runtime (that is, by DNS SRV records).
>>
>> The current status: I have a pending patch that brings SRV record
>> information into the state file (work in progress, but nearly done).
>> Once it has been merged, we'll be able to fix this issue by applying the
>> port only when the server is managed by an SRV record.
>>
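As a rough illustration of the rule proposed above (apply the state-file port only when the server is managed by an SRV record), here is a minimal Python sketch. It is not HAProxy code: `port_after_reload`, `config_port`, `state_port` and `managed_by_srv` are hypothetical names chosen for this example. The November messages in this thread note that the runtime API can change the port as well, so this rule alone may not be enough.

```python
from typing import Optional


def port_after_reload(config_port: int,
                      state_port: Optional[int],
                      managed_by_srv: bool) -> int:
    """Pick the port a server should use after a reload (illustrative only).

    config_port    -- port written on the "server" line of the new config
    state_port     -- port found in the state file, or None if absent
    managed_by_srv -- True if the port was learned from a DNS SRV record
    """
    # Trust the state file only when the port was learned at runtime
    # through an SRV record; otherwise the configuration file wins.
    if managed_by_srv and state_port is not None:
        return state_port
    return config_port
```

With the scenario from this thread (config changed from 443 to 80, state file still holding 443, no SRV record), this rule would pick port 80, which is the behaviour Sven expects.
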
>> Baptiste
>>
>>
>> On Tue, Jul 3, 2018 at 3:41 PM, Sven Wiltink <[email protected]> wrote:
>>
>> Hey Baptiste,
>>
>>
>> Thank you for looking into it.
>>
>>
>> The bug is triggered by running haproxy with the following config:
>>
>>
>> global
>>     maxconn 32000
>>     tune.maxrewrite 2048
>>     user haproxy
>>     group haproxy
>>     daemon
>>     chroot /var/lib/haproxy
>>     nbproc 1
>>     maxcompcpuusage 85
>>     spread-checks 0
>>     stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
>>     server-state-file test
>>     server-state-base /var/run/haproxy/state
>>     master-worker no-exit-on-failure
>>
>> defaults
>>     load-server-state-from-file global
>>     log global
>>     timeout http-request 5s
>>     timeout connect      2s
>>     timeout client       300s
>>     timeout server       300s
>>     mode http
>>     option dontlog-normal
>>     option http-server-close
>>     option redispatch
>>     option log-health-checks
>>
>> listen stats
>>     bind :1936
>>     bind-process 1
>>     mode http
>>     stats enable
>>     stats uri /
>>     stats admin if TRUE
>>
>> listen banaan-443-ipv4
>>     bind :443
>>     mode tcp
>>     server banaan-vps 127.0.0.1:443 check inter 2000
>>
>>
>> - Start haproxy (it will run health checks against port 443).
>> - Change "server banaan-vps 127.0.0.1:443 check inter 2000" to
>> "server banaan-vps 127.0.0.1:80 check inter 2000".
>> - Save the state using /bin/sh -c "echo show servers state |
>> /usr/bin/socat /var/run/haproxy.sock - > /var/run/haproxy/state/test"
>> (this is normally done by the systemd file on reload, see the initial mail).
>> - Reload haproxy (it still runs health checks against port 443, while
>> port 80 was expected).
>>
>> If you delete the state file and reload haproxy, it starts health checks
>> against port 80 as expected.
>>
>> -Sven
>>
>>
>>
>>
>>
>>
>> ------------------------------
>> *From:* Baptiste <[email protected]>
>> *Sent:* Tuesday, 3 July 2018 11:38:14
>> *To:* Sven Wiltink
>> *CC:* [email protected]
>> *Subject:* Re: haproxy bug: healthcheck not passing after port change
>> when statefile is enabled
>>
>> Hi Sven,
>>
>> Thanks a lot for your feedback!
>> I'll check how we could handle this use case with the state file.
>>
>> Just to ensure I'm going to troubleshoot the right issue, could you
>> please summarize how you trigger this issue in a few simple steps?
>> I.e.:
>> - conf v1, server port is X
>> - generate server state (where port is X)
>> - update conf to v2, where port is Y
>> - reload HAProxy => X is applied, while you expect to get Y instead
>>
>> Baptiste
>>
>>
>>
>> On Mon, Jun 25, 2018 at 12:55 PM, Sven Wiltink <[email protected]>
>> wrote:
>>
>> Hello,
>>
>>
>> We've dug a little deeper and the issue seems to be caused by the port
>> value in the state file. When the target port of a server has changed
>> between reloads, the port specified in the state file takes precedence.
>> Running tcpdump shows the health checks being performed against the
>> old port. After stopping haproxy and removing the state file, the health
>> check is performed against the right port. When the state file is manually
>> edited to use a random port, the health checks are performed against that
>> port instead of the one specified in the config.
>>
>>
>> The code responsible for this is at line 2931 of src/server.c:
>> http://git.haproxy.org/?p=haproxy-1.8.git;a=blob;f=src/server.c;h=523289e3bda7ca6aa15575f1928f5298760cf582;hb=HEAD#l2931
>>
>> It comes from commit:
>> http://git.haproxy.org/?p=haproxy-1.8.git;a=commitdiff;h=3169471964fdc49963e63f68c1fd88686821a0c4
>>
>>
>> One solution would be to invalidate the state when the ports don't match.
>>
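The idea of invalidating state on a port mismatch can be sketched outside HAProxy as a filter over the saved dump, run before the reload. This is only an illustration, not the fix discussed by the maintainers. It assumes the dump has a `#` header line naming its columns and that those columns include `be_name`, `srv_name` and `srv_port`; check this against your own `show servers state` output. The function name `drop_port_mismatches` and the `configured` mapping are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def drop_port_mismatches(state_text: str,
                         configured: Dict[Tuple[str, str], int]) -> str:
    """Return the state dump without entries whose port differs from config.

    configured maps (backend name, server name) to the port in the new config.
    """
    columns: Optional[List[str]] = None
    kept: List[str] = []
    for line in state_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Header line: remember the column names (assumed format).
            columns = stripped.lstrip("#").split()
            kept.append(line)
            continue
        if not stripped or columns is None:
            # Version line or blank line: keep untouched.
            kept.append(line)
            continue
        fields = dict(zip(columns, stripped.split()))
        key = (fields.get("be_name"), fields.get("srv_name"))
        wanted = configured.get(key)
        saved = fields.get("srv_port")
        if wanted is not None and saved is not None and saved != str(wanted):
            # Stale entry: the config now names another port, so drop it.
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Dropping the whole entry also discards the saved up/down status for that server, which matches the observation above that the health check behaves correctly once the state file is removed.
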
>>
>> -Sven
>>
>>
>>
>> ------------------------------
>> *From:* Sven Wiltink
>> *Sent:* Tuesday, 12 June 2018 17:01:18
>> *To:* [email protected]
>> *Subject:* haproxy bug: healthcheck not passing after port change when
>> statefile is enabled
>>
>> Hello,
>>
>> There seems to be a bug in the loading of state files after a
>> configuration change. When the destination port of a server is changed,
>> the health checks never start passing if the server was down before the
>> reload. The bug was introduced after 1.7.9, as we cannot reproduce it on
>> machines running that version of haproxy. You can use the following steps
>> to reproduce the issue:
>>
>> - Start with a fresh Debian 9 install.
>> - Install socat.
>> - Install haproxy 1.8.9 from backports.
>>
>> Create a systemd drop-in file
>> /etc/systemd/system/haproxy.service.d/60-haproxy-server_state.conf
>> with the following contents:
>> [Service]
>> ExecStartPre=/bin/mkdir -p /var/run/haproxy/state
>> ExecReload=
>> ExecReload=/usr/sbin/haproxy -f ${CONFIG} -c -q $EXTRAOPTS
>> ExecReload=/bin/sh -c "echo show servers state | /usr/bin/socat /var/run/haproxy.sock - > /var/run/haproxy/state/test"
>> ExecReload=/bin/kill -USR2 $MAINPID
>>
>> create the following files:
>> /etc/haproxy/haproxy.cfg.disabled:
>> global
>>     maxconn 32000
>>     tune.maxrewrite 2048
>>     user haproxy
>>     group haproxy
>>     daemon
>>     chroot /var/lib/haproxy
>>     nbproc 1
>>     maxcompcpuusage 85
>>     spread-checks 0
>>     stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
>>     server-state-file test
>>     server-state-base /var/run/haproxy/state
>>     master-worker no-exit-on-failure
>>
>> defaults
>>     load-server-state-from-file global
>>     log global
>>     timeout http-request 5s
>>     timeout connect      2s
>>     timeout client       300s
>>     timeout server       300s
>>     mode http
>>     option dontlog-normal
>>     option http-server-close
>>     option redispatch
>>     option log-health-checks
>>
>> listen stats
>>     bind :1936
>>     bind-process 1
>>     mode http
>>     stats enable
>>     stats uri /
>>     stats admin if TRUE
>>
>> /etc/haproxy/haproxy.cfg.different-port:
>> global
>>     maxconn 32000
>>     tune.maxrewrite 2048
>>     user haproxy
>>     group haproxy
>>     daemon
>>     chroot /var/lib/haproxy
>>     nbproc 1
>>     maxcompcpuusage 85
>>     spread-checks 0
>>     stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
>>     server-state-file test
>>     server-state-base /var/run/haproxy/state
>>     master-worker no-exit-on-failure
>>
>> defaults
>>     load-server-state-from-file global
>>     log global
>>     timeout http-request 5s
>>     timeout connect      2s
>>     timeout client       300s
>>     timeout server       300s
>>     mode http
>>     option dontlog-normal
>>     option http-server-close
>>     option redispatch
>>     option log-health-checks
>>
>> listen stats
>>     bind :1936
>>     bind-process 1
>>     mode http
>>     stats enable
>>     stats uri /
>>     stats admin if TRUE
>>
>> listen banaan-443-ipv4
>>     bind :443
>>     mode tcp
>>     server banaan-vps 127.0.0.1:80 check inter 2000
>> listen banaan-80-ipv4
>>     bind :80
>>     mode tcp
>>     server banaan-vps 127.0.0.1:80 check inter 2000
>>
>> /etc/haproxy/haproxy.cfg.same-port:
>> global
>>     maxconn 32000
>>     tune.maxrewrite 2048
>>     user haproxy
>>     group haproxy
>>     daemon
>>     chroot /var/lib/haproxy
>>     nbproc 1
>>     maxcompcpuusage 85
>>     spread-checks 0
>>     stats socket /var/run/haproxy.sock mode 600 level admin process 1 user haproxy group haproxy
>>     server-state-file test
>>     server-state-base /var/run/haproxy/state
>>     master-worker no-exit-on-failure
>>
>> defaults
>>     load-server-state-from-file global
>>     log global
>>     timeout http-request 5s
>>     timeout connect      2s
>>     timeout client       300s
>>     timeout server       300s
>>     mode http
>>     option dontlog-normal
>>     option http-server-close
>>     option redispatch
>>     option log-health-checks
>>
>> listen stats
>>     bind :1936
>>     bind-process 1
>>     mode http
>>     stats enable
>>     stats uri /
>>     stats admin if TRUE
>>
>> listen banaan-443-ipv4
>>     bind :443
>>     mode tcp
>>     server banaan-vps 127.0.0.1:443 check inter 2000
>> listen banaan-80-ipv4
>>     bind :80
>>     mode tcp
>>     server banaan-vps 127.0.0.1:80 check inter 2000
>>
>>
>> - Start a netcat process to fake a web server: nc -klp 80
>> - cp haproxy.cfg.disabled to haproxy.cfg and start haproxy.
>> - cp haproxy.cfg.same-port to haproxy.cfg and reload haproxy. You will now
>> see that the server for banaan-443-ipv4 is marked as down, as expected
>> (nothing is running on port 443).
>> - Now cp haproxy.cfg.different-port to haproxy.cfg and reload haproxy
>> again. banaan-443-ipv4 will still be marked as down, although it uses the
>> same health check as the port 80 configuration: server banaan-vps
>> 127.0.0.1:80 check inter 2000
>>
>> If we now stop haproxy, delete the state file located at
>> /var/run/haproxy/state/test and start haproxy again, the server will be
>> marked as up.
>>
>> Thanks in advance,
>> Sven
>>
>>
>>
>>
>>
