On 3/28/15 12:29 PM, Michael Bayer wrote:
Hi Baptiste -
I had tried the “nopurge” option, but not in conjunction with the two
“backup” servers. So yes, with “nopurge” on and two backup servers,
server 1’s entry stays permanently in the stick table even when it fails,
so as soon as it’s back up, requests seem to go back to node 1 more
aggressively. I’ll experiment more with this setting - thanks!
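For context, the backend is shaped roughly like this - the bind address matches the key logged further below, but the node addresses and the port are placeholders rather than my exact config:

```haproxy
frontend vip-db
    bind 192.168.1.200:3306
    mode tcp
    default_backend db-vms-galera

backend db-vms-galera
    mode tcp
    balance roundrobin
    # single-entry table keyed on the destination IP; per the above, with
    # "nopurge" set the entry appears to persist even across a node failure
    stick-table type ip size 1 nopurge
    stick on dst
    server rhos-node1 192.168.1.201:3306 check
    server rhos-node2 192.168.1.202:3306 check backup
    server rhos-node3 192.168.1.203:3306 check backup
```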
I would like to understand this better, though; it seems like a gray
area in HAProxy as to how the stick table interacts with non-backup
servers that are down and backup servers that are active. When
“nopurge” is not set, the proxy gets into a state where node3, a backup
node, is the one recorded in the stick table, and requests go there. But
then as node 1 comes back up, HAProxy seems to route connections
somewhat randomly to either node 1 (the non-backup server that’s up) or
node 3 (the backup server that nevertheless matches in the “stick”
table), specifically when it needs to handle two near-simultaneous
connection requests.
Running "show table db-vms-galera” in a loop shows node3 is
persistently in the table:
# table: db-vms-galera, type: ip, size:1, used:1
0x7fb5a6215e84: key=192.168.1.200 use=0 exp=0 server_id=3
# table: db-vms-galera, type: ip, size:1, used:1
0x7fb5a6215e84: key=192.168.1.200 use=0 exp=0 server_id=3
# table: db-vms-galera, type: ip, size:1, used:1
0x7fb5a6215e84: key=192.168.1.200 use=0 exp=0 server_id=3
… continues like this ...
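To make eyeballing that loop easier, here is a quick throwaway helper that pulls the key and server_id out of each entry line; just a sketch assuming the “show table” output format shown above (the parse_show_table name is mine):

```python
import re

# Matches entry lines of the form emitted by "show table <name>" on the
# HAProxy stats socket, e.g.:
#   0x7fb5a6215e84: key=192.168.1.200 use=0 exp=0 server_id=3
ENTRY_RE = re.compile(
    r"key=(?P<key>\S+)\s+use=(?P<use>\d+)\s+exp=(?P<exp>\d+)\s+server_id=(?P<sid>\d+)"
)

def parse_show_table(output):
    """Return (key, server_id) tuples for each entry line in the output."""
    entries = []
    for line in output.splitlines():
        m = ENTRY_RE.search(line)
        if m:
            entries.append((m.group("key"), int(m.group("sid"))))
    return entries

sample = """\
# table: db-vms-galera, type: ip, size:1, used:1
0x7fb5a6215e84: key=192.168.1.200 use=0 exp=0 server_id=3
"""
print(parse_show_table(sample))  # [('192.168.1.200', 3)]
```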
and the logs make it clear connections are going to either node. Note in
particular that it seems to occur when two requests come in at “exactly”
the same time (see 12:14:11.258 on 41795/41796 and 12:14:11.260 on
41797/41798), which looks a lot like a race condition:
Mar 28 12:14:16 localhost haproxy[30229]: 192.168.1.118:41795
[28/Mar/2015:12:14:11.258] vip-db db-vms-galera/rhos-node3 1/2/5250 182230 --
5/5/5/3/0 0/0
Mar 28 12:14:16 localhost haproxy[30229]: 192.168.1.118:41799
[28/Mar/2015:12:14:11.261] vip-db db-vms-galera/rhos-node3 1/0/5346 142414 --
4/4/4/2/0 0/0
Mar 28 12:14:17 localhost haproxy[30229]: 192.168.1.118:41797
[28/Mar/2015:12:14:11.260] vip-db db-vms-galera/rhos-node3 1/0/5861 142312 --
4/4/4/2/0 0/0
Mar 28 12:14:17 localhost haproxy[30229]: 192.168.1.118:41798
[28/Mar/2015:12:14:11.260] vip-db db-vms-galera/rhos-node1 1/0/6053 142301 --
4/4/4/1/0 0/0
Mar 28 12:14:17 localhost haproxy[30229]: 192.168.1.118:41796
[28/Mar/2015:12:14:11.258] vip-db db-vms-galera/rhos-node1 1/2/6317 142180 --
5/5/5/0/0 0/0
Mar 28 12:14:21 localhost haproxy[30229]: 192.168.1.118:41800
[28/Mar/2015:12:14:16.508] vip-db db-vms-galera/rhos-node3 1/0/5166 119038 --
5/5/5/5/0 0/0
Mar 28 12:14:21 localhost haproxy[30229]: 192.168.1.118:41801
[28/Mar/2015:12:14:16.608] vip-db db-vms-galera/rhos-node3 1/0/5270 158273 --
5/5/5/5/0 0/0
Mar 28 12:14:22 localhost haproxy[30229]: 192.168.1.118:41802
[28/Mar/2015:12:14:17.120] vip-db db-vms-galera/rhos-node3 1/0/5187 158359 --
5/5/4/4/0 0/0
Mar 28 12:14:23 localhost haproxy[30229]: 192.168.1.118:41804
[28/Mar/2015:12:14:17.562] vip-db db-vms-galera/rhos-node3 1/0/5694 158702 CD
4/4/4/4/0 0/0
I’m running only a single-process HAProxy (no nbproc setting), so the
appearance of a race is surprising here, as I understand HAProxy uses an
event-driven model for internal concurrency.
Can the rules and behaviors of HAProxy in this area be clarified?
hey all -
So I continue to observe that the "stick table" setting with "stick on
dst" does not seem to take effect in all cases: either when receiving
multiple connections very concurrently (unexpected), or when the target
server fails (expected, though the value does not seem to get replaced
atomically with the new server, so still unexpected). While I continue
to feel like I'm doing something wrong, misunderstanding something, or
that there is just something wrong in my environment that is not
reproducible elsewhere, I've posted the environment and the script I'm
using to get these results as a bug at Red Hat. If anyone is curious to
review what I'm doing, it's over at
https://bugzilla.redhat.com/show_bug.cgi?id=1211781 (if there is a
public bug tracker for HAProxy itself, I can post there as well).