On 4/24/2018 3:38 PM, Cyril Bonté wrote:
On 24/04/2018 at 23:07, Shawn Heisey wrote:
The configuration I had uses a backend with two servers, one of them
tagged as backup. This is the actual config that was active when I saw
the problem:
backend be-cdn-9000
    description Back end for the thumbs CDN
    cookie MSDSRVHA insert indirect nocache
    server planet 10.100.2.123:9000 weight 100 cookie planet track chk-cdn-9000/planet
    server hollywood 10.100.2.124:9000 weight 100 backup cookie hollywood track chk-cdn-9000/hollywood
Well, you don't provide any information about the tracked servers
chk-cdn-9000/planet and chk-cdn-9000/hollywood.
This is the tracking backend as it was at the time. It no longer exists
in the current config: I couldn't get the disable-on-404 setting to work
with a tracking backend, so the real backend does the health checks now.
backend chk-cdn-9000
    description A healthcheck backend for the thumbnail CDN.
    option httpchk GET /healthcheck
    server planet 10.100.2.123:9000 check inter 10s fastinter 3s rise 3 fall 2
    server hollywood 10.100.2.124:9000 check inter 10s fastinter 3s rise 3 fall 2
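For illustration only, folding the checks into the real backend would
look roughly like the sketch below. It is pieced together from the two
backends above and is not a copy of the running config. It assumes the
check parameters carry over unchanged and that disable-on-404 is enabled
with "http-check disable-on-404". The backup keyword is kept only to
mirror the original layout.

# Sketch only: assumed merge of be-cdn-9000 and chk-cdn-9000.
backend be-cdn-9000
    description Back end for the thumbs CDN
    cookie MSDSRVHA insert indirect nocache
    # Health checks run here directly, so no "track" is needed.
    option httpchk GET /healthcheck
    # A 404 on the check takes the server out of load balancing,
    # while persistent (cookie) traffic can still reach it.
    http-check disable-on-404
    server planet 10.100.2.123:9000 weight 100 cookie planet check inter 10s fastinter 3s rise 3 fall 2
    server hollywood 10.100.2.124:9000 weight 100 backup cookie hollywood check inter 10s fastinter 3s rise 3 fall 2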
Without any information about the two tracked servers, I'd say the
behaviour is expected. A backup server is promoted only if it is UP
itself. What is the state of chk-cdn-9000/hollywood during that time?
It looks like it's not UP yet.
Before beginning, both servers were up. The one named planet was
active, the one named hollywood was backup. I was watching the status
page closely the whole time.
I updated the software on hollywood and stopped the service on that
system. After waiting long enough for haproxy to notice the server
going down, I started it back up. After a short time, it went to the up
state (still backup). So at this point the state is identical to the
starting state.
Then I updated and stopped planet. As expected, haproxy noticed that
planet went down. But instead of promoting hollywood to active as soon
as planet was marked down, it waited an additional period (about ten
seconds, I think, though I did not time it precisely). During that
period, a curl client trying to connect to the load-balanced URL
received "no server available" messages. Once hollywood was promoted to
active, everything was good.
Because of the delay in promoting the backup server, I removed the
backup keyword from the backend, and requests are now load balanced
equally between both servers (in the absence of a cookie). But I do
have another haproxy setup where that is not an acceptable solution.
I'm hoping to figure out how to make a backup server transition
immediately to active as soon as the primary server is marked down. If
you need additional info, please let me know.
Thanks,
Shawn