Re: Backup server takes too long to go active

Shawn Heisey Tue, 24 Apr 2018 18:57:14 -0700

On 4/24/2018 3:38 PM, Cyril Bonté wrote:

On 24/04/2018 at 23:07, Shawn Heisey wrote:

The configuration I had is with a backend that has two servers, one of
them tagged as backup. This is the actual config that I had active when
I saw the problem:


backend be-cdn-9000
         description Back end for the thumbs CDN
         cookie MSDSRVHA insert indirect nocache
         server planet 10.100.2.123:9000 weight 100 cookie planet track chk-cdn-9000/planet
         server hollywood 10.100.2.124:9000 weight 100 backup cookie hollywood track chk-cdn-9000/hollywood

Well, you don't provide any information about the tracked servers chk-cdn-9000/planet and chk-cdn-9000/hollywood.

This is the tracking backend at the time. In the current config, this backend no longer exists. I couldn't get the disable-on-404 setting to work with a tracking back end, so the real backend does the health checks now.


backend chk-cdn-9000
  description A healthcheck backend for the thumbnail CDN.
  option httpchk GET /healthcheck
  server planet 10.100.2.123:9000 check inter 10s fastinter 3s rise 3 fall 2
  server hollywood 10.100.2.124:9000 check inter 10s fastinter 3s rise 3 fall 2

Without any information about the 2 tracked servers, I'd say the behaviour is expected. A backup server is promoted only if it is UP itself. What is the state of chk-cdn-9000/hollywood during that time? It looks like it's not UP yet.

Before beginning, both servers were up. The one named planet was active, the one named hollywood was backup. I was watching the status page closely the whole time.

I updated the software on hollywood and stopped the service on that system. After waiting long enough for haproxy to notice the server going down, I started it back up. After a short time, it went to the up state (still backup). So at this point the state is identical to the starting state.

Then I updated and stopped planet. Understandably, haproxy noticed that planet went down. But instead of promoting hollywood to an active state as soon as planet was marked down, it waited an additional time period (which I think was about ten seconds, but I did not time it precisely), and during that period, a curl client trying to connect to the load balanced URL was receiving "no server available" messages. Once hollywood was promoted to active, everything was good.

Because of the delay in promoting the backup server, I removed the backup keyword from the back end, and requests are now load balanced equally between both servers (in the absence of a cookie). But I do have another haproxy setup where that is not an acceptable solution.
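
For reference, a rough sketch of what the resulting backend looks like, with the health checks moved into the real backend and the backup keyword removed. This is reconstructed from the description above rather than pasted from the running config, so the check parameters (carried over from chk-cdn-9000) and the "http-check disable-on-404" line are assumptions:

backend be-cdn-9000
         description Back end for the thumbs CDN
         cookie MSDSRVHA insert indirect nocache
         # assumption: health check moved here from chk-cdn-9000
         option httpchk GET /healthcheck
         # assumption: directive form of the disable-on-404 setting
         http-check disable-on-404
         # no "backup" keyword and no "track", so both servers take traffic and are checked directly
         server planet 10.100.2.123:9000 weight 100 cookie planet check inter 10s fastinter 3s rise 3 fall 2
         server hollywood 10.100.2.124:9000 weight 100 cookie hollywood check inter 10s fastinter 3s rise 3 fall 2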

I'm hoping to figure out how to make a backup server transition immediately to active as soon as the primary server is marked down. If you need additional info, please let me know.


Thanks,
Shawn
