Hi,

On Mon, Feb 16, 2015 at 06:07:26AM -0800, David Lang wrote:
> On Mon, 16 Feb 2015, Barry Haycock wrote:
>
> > I am building a corosync/pacemaker/haproxy HA load balancer in
> > Active/Active mode using ClusterIP. As this is built on RHEL 6.5 I am
> > restricted to using PCS to configure the LB.
> >
> > One of the requirements is to maintain TCP state so that TCP based
> > syslog audit is not lost during a failover.
> >
> > I have two questions:
> >
> > 1) Is it possible, when using conntrackd to maintain TCP state, to
> > have a seamless transition to the remaining LB should one of the
> > servers be shut down? The work group in question cannot afford to
> > lose any messages once the connection has commenced. Some
> > machines will be using a reliable transmission method for syslog
> > such as RELP but others will be using raw TCP.
> >
> > My testing shows that when sending a large number of raw TCP
> > messages via a single connection, the syslog server will lose
> > messages when one of the LBs is shut down or put into standby. The
> > client machine will start ARPing for the MAC address assigned to the
> > VIP till a connection is established with the remaining LB. This can
> > lose us up to 3 seconds' worth of messages. In reality I don't
> > expect such a large amount of traffic to be generated via a single
> > connection. But the work group will not accept the solution if we
> > lose any messages.
> >
> > Will this be a matter of managing the expectations of the work
> > group, that during failover, messages in transit will be lost
> > when using raw TCP?
>
> Keep in mind that syncing the session state takes time, and so there
> will always be some window of time that the state exists on one
> machine and not the other. If you are unlucky enough, the failover
> will happen in the small timeframe where the connection data is just
> out of date enough to cause grief. If enough state has been synced
> to keep the connection from being broken, you will not lose any
> data. But there is always going to be a window where a new
> connection is established, and data sent over it, but the backup box
> doesn't know that the connection exists.
>
> So you do have to set expectations that when things go wrong, there
> may be a small hiccup. I would try to leave the statement general
> rather than trying to specify exactly what conditions could cause
> problems.
>
> The only way to prevent this would be for the conntrack update
> between machines to happen synchronously (including getting the ack
> from the updated machine that it has saved the data), and that would
> cripple your throughput.
>
> Also remember that there are other failure conditions that can cause
> you to lose messages. If the receiving software restarts, the
> messages that are in flight will be lost (with plain TCP, everything
> sent but not written to non-volatile media is lost; with RELP, things
> received but not written are lost).
>
> Really, the only way to not lose something is to have an
> application level acknowledgement that's only sent after the data is
> safe on redundant non-volatile media.
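
To make that last point concrete, here is a rough sketch in Python of a
receiver that acknowledges a message only after it has been fsync'ed to
disk. Everything in it is an assumption made for illustration: the
newline-delimited framing, the "ACK <n>" reply format and the function
names are invented, and this is not RELP or any real syslog protocol. It
also covers only a single local disk, so the "redundant" part (RAID,
replication to a second host) is left out.

```python
import os
import socket


def durable_append(log_path: str, record: bytes) -> None:
    """Append one record and force it to disk before returning."""
    with open(log_path, "ab") as f:
        f.write(record)
        f.flush()             # push Python's buffer down to the kernel
        os.fsync(f.fileno())  # ask the kernel to flush the file to the device


def serve_connection(conn: socket.socket, log_path: str) -> int:
    """Read newline-terminated messages and ack each one once it is on disk.

    Returns the number of messages acknowledged. The "ACK <n>" reply is a
    made-up format for this sketch only.
    """
    acked = 0
    buf = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed its side of the connection
            return acked
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            durable_append(log_path, line + b"\n")
            acked += 1
            # The ack goes out only after fsync has returned.
            conn.sendall(b"ACK %d\n" % acked)
```

The sender's half of the bargain would be to keep every message until
its ack arrives and to resend anything unacknowledged after
reconnecting, which means the receiver has to cope with duplicates. An
fsync per message is also slow, the same kind of throughput cost as the
synchronous conntrack update mentioned above.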

Not an easy problem to solve. I don't have any experience with it
personally, but wasn't ocf:heartbeat:portblock supposed to help a bit
in such cases?

Thanks,

Dejan

> David Lang
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems