Hi All (again :-) )

We are testing with - HAProxy version ss-20120222, released 2012/02/22

We have been testing persistence tables and maintenance mode with
non-stick backup servers - our configuration looks like -


global
        daemon
        stats socket /var/run/haproxy.stat mode 600 level admin
        pidfile /var/run/haproxy.pid
        maxconn 40000
        ulimit-n 81000
defaults
        mode http
        balance roundrobin
        timeout connect 4000
        timeout client 42000
        timeout server 43000
peers loadbalancer_replication
        peer lbmaster localhost:7778
        peer lbslave localhost:7778
listen VIP_Name
        bind 192.168.66.207:80
        mode tcp
        balance leastconn
        server backup 127.0.0.1:9081 backup non-stick
        stick-table type ip size 10240k expire 30m peers loadbalancer_replication
        stick on src
        option redispatch
        option abortonclose
        maxconn 40000
        server RIP_Name 192.168.66.50 weight 1 check port 80 inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
        server RIP_Name-1 192.168.66.51:80 weight 1 check inter 2000 rise 2 fall 3 minconn 0 maxconn 0 on-marked-down shutdown-sessions
listen stats :7777
        stats enable
        stats uri /
        option httpclose
        stats auth loadbalancer:loadbalancer
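
For reference, a config like the above can be syntax-checked before loading with haproxy's standard -c flag (the path below is an assumed location, adjust to wherever your config lives):

```shell
# Syntax-check the configuration without starting the proxy.
# /etc/haproxy/haproxy.cfg is an assumed path for illustration.
haproxy -c -f /etc/haproxy/haproxy.cfg
```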


We have noticed some strange behaviour when bringing servers back from
maintenance mode, where users get stuck on the fallback server.
Below is what we normally see.
Start with a clean haproxy (just restarted) with the above configuration -

In this example we have 2 users, each making one connection to the VIP,
so the persistence table looks like -

echo "show table VIP_Name" | socat unix-connect:/var/run/haproxy.stat stdio
# table: VIP_Name, type: ip, size:10485760, used:2
0x6e2f24: key=192.168.64.4 use=0 exp=1781566 server_id=2
0x6e2fd4: key=192.168.66.199 use=0 exp=1789690 server_id=3
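
(When there are more than a couple of entries, a quick way to see how many
clients are pinned to each server is to summarise the dump per server_id. A
small awk sketch, using the dump above as sample input - in practice you
would pipe the output of the socat command into the awk instead:)

```shell
# Count stick-table entries per server_id from a "show table" dump.
# The sample below is the table output shown above; in production,
# pipe from: echo "show table VIP_Name" | socat unix-connect:/var/run/haproxy.stat stdio
sample='# table: VIP_Name, type: ip, size:10485760, used:2
0x6e2f24: key=192.168.64.4 use=0 exp=1781566 server_id=2
0x6e2fd4: key=192.168.66.199 use=0 exp=1789690 server_id=3'
printf '%s\n' "$sample" \
  | awk -F'server_id=' '/server_id=/ {count[$2]++}
                        END {for (id in count) print "server_id=" id, count[id]}' \
  | sort
# Prints one line per server_id with its entry count:
# server_id=2 1
# server_id=3 1
```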

If I set one of the real servers into maintenance mode

echo "disable server VIP_Name/RIP_Name-1" | socat unix-connect:/var/run/haproxy.stat stdio

Then when the user tries to reconnect they get moved over to the other
real server, and their entry in the stick table reflects that -
echo "show table VIP_Name" | socat unix-connect:/var/run/haproxy.stat stdio
# table: VIP_Name, type: ip, size:10485760, used:2
0x6e2f24: key=192.168.64.4 use=0 exp=1790143 server_id=2
0x6e2fd4: key=192.168.66.199 use=0 exp=1793360 server_id=2

However, if both real servers are set into maintenance mode, one of the
clients gets stuck on the backup server until the real server
corresponding to their persistence entry is brought back online.

To reproduce -

Connect to the VIP with each client so your persistence table looks
like below -

echo "show table VIP_Name" | socat unix-connect:/var/run/haproxy.stat stdio
# table: VIP_Name, type: ip, size:10485760, used:2
0x6e2f24: key=192.168.64.4 use=0 exp=1788750 server_id=2
0x6e2fd4: key=192.168.66.199 use=0 exp=1792613 server_id=3

Then set both real servers in maintenance mode -

echo "disable server VIP_Name/RIP_Name-1" | socat unix-connect:/var/run/haproxy.stat stdio

echo "disable server VIP_Name/RIP_Name" | socat unix-connect:/var/run/haproxy.stat stdio

Connect again and both users see the backup server.

Now bring one of the real servers back online -
echo "enable server VIP_Name/RIP_Name-1" | socat unix-connect:/var/run/haproxy.stat stdio

The user that was connected to RIP_Name-1 can now reach the real
server again; however, the user that was connected to RIP_Name still
sees the fallback server.

The persistence table looks like -

echo "show table VIP_Name" | socat unix-connect:/var/run/haproxy.stat stdio
# table: VIP_Name, type: ip, size:10485760, used:2
0x6e2f24: key=192.168.64.4 use=0 exp=1675945 server_id=2
0x6e2fd4: key=192.168.66.199 use=0 exp=1678714 server_id=3

The only way to get them off the fallback server is to either clear
the persistence table or restart haproxy.
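
(For a single stuck client, a narrower workaround may be to remove just that
one entry over the admin socket, assuming this snapshot supports key-based
clearing with "clear table ... key"; the key below is the example client's
IP from the table dumps above:)

```shell
# Remove one client's entry from the stick table via the admin socket.
# Requires a build that supports "clear table <table> key <key>";
# 192.168.64.4 is the example client IP used above.
echo "clear table VIP_Name key 192.168.64.4" | socat unix-connect:/var/run/haproxy.stat stdio
```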

I would expect the user to be moved across to the online real server
as in the first example, but this appears not to be the case. Is this
by design?

Mark
