Hi there -
We are evaluating different ways of using the “stick match” and “backup”
options in order to achieve certain behaviors for failover, involving a
Galera cluster. For reference, we’re picking and choosing from this blog
post:
http://blog.haproxy.com/2014/01/17/emulating-activepassing-application-clustering-with-haproxy/.
We’re basically looking to have all connections on exactly one server at all
times: whichever server is currently the active one should receive everyone,
and clients should be kicked off of, and blocked from, every other server in
the backend. This is because while Galera is a “multi-master” cluster, we’re
trying to have the application use only one node as “master” at a time, at
least for the moment.
It seems that using the “stick” table alone, as in:
backend db-vms-galera
    option httpchk
    stick-table type ip size 1
    stick on dst
    timeout server 90m
    server rhos-node1 rhel7-1:3306 check inter 1s port 9200 on-marked-down shutdown-sessions
    server rhos-node2 rhel7-2:3306 check inter 1s port 9200 on-marked-down shutdown-sessions
    server rhos-node3 rhel7-3:3306 check inter 1s port 9200 on-marked-down shutdown-sessions
is not enough. From my observations, running the “show table” socket command
and watching the logs, when a node goes down its entry remains in the stick
table for some period of time, and new connections, if they keep arriving
quickly, have no choice but to skip the stick match altogether and get
load-balanced normally (I can provide logs and samples that show this
happening).
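For reference, this is what I mean by the “show table” command; this is just
a sketch assuming a stats socket configured at /var/run/haproxy.sock (adjust
the path to wherever your “stats socket” line points):

    echo "show table db-vms-galera" | socat stdio /var/run/haproxy.sock

During the failure window, the entry for the dead node is still visible in
that output even though the server itself is marked DOWN.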
So we would then gather that the best configuration is this:
backend db-vms-galera
    option httpchk
    stick-table type ip size 1
    stick on dst
    timeout server 90m
    server rhos-node1 rhel7-1:3306 check inter 1s port 9200 on-marked-down shutdown-sessions
    server rhos-node2 rhel7-2:3306 check inter 1s port 9200 on-marked-down shutdown-sessions backup
    server rhos-node3 rhel7-3:3306 check inter 1s port 9200 on-marked-down shutdown-sessions backup
This works a lot better: when I kill node1, all connections move to node2
unambiguously. Then our Galera cluster, in the process of bringing node1 back
up, puts node2 into “read only” mode, which bounces all connections to node3.
At this point node1 comes back, and the stick table no longer appears to be
of any use: most connections continue to go to node3, and querying the stick
table shows that it’s set to node3, yet a handful of requests also go back to
node1. So again this is not a pure “active/passive” setup; multiple nodes get
hit at the same time. Could this behavior be clarified? And how can we use
the stick table as an absolute “gate” for all requests, so that no
connections fall through when a server goes down or comes back up?
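One thing we have considered, but not yet tested thoroughly, is pinning the
single table entry harder with “nopurge” on the stick-table line, as the blog
post above does, so the lone entry cannot be evicted to make room for a new
one; I’m not certain this changes anything for the server-state transitions
described here, so treat it as a guess:

    backend db-vms-galera
        option httpchk
        stick-table type ip size 1 nopurge
        stick on dst
        timeout server 90m
        server rhos-node1 rhel7-1:3306 check inter 1s port 9200 on-marked-down shutdown-sessions
        server rhos-node2 rhel7-2:3306 check inter 1s port 9200 on-marked-down shutdown-sessions backup
        server rhos-node3 rhel7-3:3306 check inter 1s port 9200 on-marked-down shutdown-sessions backup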
I had hopes for the unusual setup of just making all three servers a
“backup” server:
backend db-vms-galera
    option httpchk
    timeout server 90m
    server rhos-node1 rhel7-1:3306 check inter 1s port 9200 on-marked-down shutdown-sessions backup
    server rhos-node2 rhel7-2:3306 check inter 1s port 9200 on-marked-down shutdown-sessions backup
    server rhos-node3 rhel7-3:3306 check inter 1s port 9200 on-marked-down shutdown-sessions backup
The idea being that no server is “official,” so there’s nothing to “fail
back” towards. This sort of seemed to work, but it still wants to “fail back”
up the list, from node3 to node2 to node1. Of the three, this approach does
the best job of keeping all connections on one server, though not perfectly,
because it doesn’t bump off connections still talking to the server being
replaced.
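If there turns out to be no config-only answer, one workaround we could
script is forcibly kicking those lingering connections from the admin socket
whenever the active server changes, using the runtime “shutdown sessions
server” command (again a sketch, assuming an admin-level stats socket at
/var/run/haproxy.sock, and node2 as the server being vacated):

    echo "shutdown sessions server db-vms-galera/rhos-node2" | socat stdio /var/run/haproxy.sock

But we’d much rather have HAProxy enforce this itself than bolt on an
external watcher.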
Any insight on the usage of the stick table here would be appreciated!