On Mon, 16 Feb 2015, Barry Haycock wrote:

I am building a corosync/pacemaker/haproxy HA load balancer in Active/Active mode using ClusterIP. As this is built on RHEL 6.5, I am restricted to using PCS to configure the LB.

One of the requirements is to maintain TCP state so that TCP-based syslog audit is not lost during a failover.

I have two questions:

1) Is it possible, when using conntrackd to maintain TCP state, to have a seamless transition to the remaining LB should one of the servers be shut down? The work group in question cannot afford to lose any messages once the connection has commenced. Some machines will be using a reliable transmission method for syslog such as RELP, but others will be using raw TCP.

My testing shows that when sending a large number of raw TCP messages via a single connection, the syslog server will lose messages when one of the LBs is shut down or put into standby. The client machine will start ARPing for the MAC address assigned to the VIP until a connection is established with the remaining LB. This can cost us up to 3 seconds' worth of messages. In reality I don't expect such a large amount of traffic to be generated via a single connection. But the work group will not accept the solution if we lose any messages.

Will this be a matter of managing the expectations of the work group, i.e. that during failover, messages in transit will be lost when using raw TCP?

Keep in mind that syncing the session state takes time, so there will always be some window of time in which the state exists on one machine and not the other. If you are unlucky, the failover will happen in the small timeframe where the connection data is just out of date enough to cause grief. If enough state has been synced to keep the connection from being broken, you will not lose any data. But there is always going to be a window where a new connection is established, and data sent over it, but the backup box doesn't know that the connection exists.
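
To make the window concrete, here is a toy model in Python. It is only a sketch under assumed numbers: the function name, the fixed sync delay, and the timestamps are all made up for illustration, and it does not reflect how conntrackd actually schedules its updates.

```python
def unsynced_at_failover(established_at_ms, sync_delay_ms, failover_ms):
    """Return the connections the backup does not yet know about.

    Hypothetical model: a connection established at time t on the active
    box becomes visible on the backup at t + sync_delay_ms.
    """
    return [
        name
        for name, t in established_at_ms.items()
        if t <= failover_ms < t + sync_delay_ms
    ]


# Three connections, a 10 ms sync delay, failover at t = 1000 ms.
conns = {"a": 0, "b": 950, "c": 999}
print(unsynced_at_failover(conns, sync_delay_ms=10, failover_ms=1000))
# prints ['c']
```

Connections "a" and "b" were synced long before the failover. Only "c", opened 1 ms before it, falls into the window. Shrinking the delay shrinks the window but never removes it.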

So you do have to set expectations that when things go wrong, there may be a small hiccup. I would try to leave the statement general rather than trying to specify exactly what conditions could cause problems.

The only way to prevent this would be for the conntrack update between machines to happen synchronously (including getting the ack from the updated machine that it has saved the data), and that would cripple your throughput.

Also remember that there are other failure conditions that can cause you to lose messages. If the receiving software restarts, the messages that are in flight will be lost (with plain TCP, everything sent but not yet written to non-volatile media is lost; with RELP, anything received but not yet written is lost).

Really, the only way to not lose something is to have an application-level acknowledgement that's only sent after the data is safe on redundant non-volatile media.
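
As a rough illustration of that idea, here is a minimal Python sketch. The wire format (newline-delimited messages, a literal "ACK" reply) and the function names are assumptions made up for this example; this is not RELP or any real syslog protocol, and a single local fsync is not redundant storage. The point is only the ordering: write, fsync, then ack, with the sender keeping anything unacknowledged so it can resend.

```python
import os
import socket


def serve_one(conn, log_path):
    """Receive newline-delimited messages; ack each only after fsync."""
    with conn, conn.makefile("rb") as reader, open(log_path, "ab") as log:
        for line in reader:
            log.write(line)
            log.flush()
            os.fsync(log.fileno())  # data is on stable storage before the ack
            conn.sendall(b"ACK\n")


def send_reliably(conn, messages):
    """Send messages one at a time; return those that were never acked.

    Assumes messages contain no newline. The caller is expected to
    reconnect and resend whatever this returns.
    """
    with conn.makefile("rb") as reader:
        for i, msg in enumerate(messages):
            try:
                conn.sendall(msg.encode() + b"\n")
                if reader.readline() != b"ACK\n":
                    return messages[i:]
            except OSError:
                return messages[i:]
    return []
```

If the connection dies during a failover, send_reliably returns the unacknowledged tail and the sender can replay it over a new connection. The receiver may then see a duplicate of a message whose ack was lost, so this gives at-least-once delivery, not exactly-once.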

David Lang
