Backup server takes too long to go active

Shawn Heisey Tue, 24 Apr 2018 14:08:49 -0700

I sent this query to the list nine days ago and got no response.
Trying again.


======

Kernel info:

root@lb1:/etc/haproxy# uname -a
Linux lb1 3.13.0-52-generic #86-Ubuntu SMP Mon May 4 04:32:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux


Here's the haproxy version output:

root@lb1:/etc/haproxy# haproxy -vv
HA-Proxy version 1.5.12 2015/05/02
Copyright 2000-2015 Willy Tarreau <w...@1wt.eu>

Build options :
  TARGET  = linux2628
  CPU     = native
  CC      = gcc
  CFLAGS  = -O2 -march=native -g -fno-strict-aliasing
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE= USE_PCRE_JIT=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.2j  26 Sep 2016
Running on OpenSSL version : OpenSSL 1.0.2j  26 Sep 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.



The configuration is a backend with two servers, one of them tagged as
backup. This is the actual config that was active when I saw the
problem:

backend be-cdn-9000
        description Back end for the thumbs CDN
        cookie MSDSRVHA insert indirect nocache
        server planet 10.100.2.123:9000 weight 100 cookie planet track chk-cdn-9000/planet
        server hollywood 10.100.2.124:9000 weight 100 backup cookie hollywood track chk-cdn-9000/hollywood


My check interval is ten seconds and the check timeout is 9990 milliseconds.
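
For readers unfamiliar with "track", the sketch below shows roughly how a
tracked check backend with these timings could be written. It is an
illustration only: the real chk-cdn-9000 definition is not included in this
message, so the layout, the check port, and the use of "timeout check" and
"inter" here are assumptions. Only the backend name, the server names and
addresses, and the two timing values come from the text above.

```
# Hypothetical check backend; the actual chk-cdn-9000 is not shown here.
backend chk-cdn-9000
        # Check timeout of 9990 milliseconds, as stated above.
        timeout check 9990ms
        # "check" enables health checks; "inter 10s" is the ten-second interval.
        server planet 10.100.2.123:9000 check inter 10s
        server hollywood 10.100.2.124:9000 check inter 10s
```

The servers in be-cdn-9000 then inherit their up/down state from these
entries through "track chk-cdn-9000/<name>".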

==============

If I shut down the primary server, eventually haproxy notices this.  No
problem so far.

The problem is that about ten seconds pass between the time haproxy
notices the primary going down and the time that the backup server
changes to active.  During that time, clients get "no server available"
errors.

It is my expectation that as soon as the primary server goes down,
haproxy will promote the backup server(s) with the highest weight to
active and send requests there.

I know that I'm running an out-of-date version and can't expect any
changes in 1.5.x.  I have not yet tried this with a newer version of
haproxy.

As a workaround, I have removed the "backup" keyword.  For this backend
that's an acceptable solution, but I use the "backup" setup elsewhere,
in cases where I want traffic to go to only one server as long as that
server is up, and there I need the backup server to take over quickly
when the primary fails.
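
For clarity, the workaround amounts to something like the following. This is
a sketch derived from the backend shown earlier with only the "backup"
keyword dropped; it is not a copy of the running config.

```
backend be-cdn-9000
        description Back end for the thumbs CDN
        cookie MSDSRVHA insert indirect nocache
        server planet 10.100.2.123:9000 weight 100 cookie planet track chk-cdn-9000/planet
        # "backup" removed: hollywood is now a regular active server.
        server hollywood 10.100.2.124:9000 weight 100 cookie hollywood track chk-cdn-9000/hollywood
```

With both servers active, traffic is balanced across them, which is why this
does not suit backends that must send everything to a single server.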

Thanks,
Shawn
