On Mon, Oct 13, 2008 at 02:04:41PM +0200, Michael Schwartzkopff wrote:
> On Monday, 13 October 2008 13:38, Florian Haas wrote:
> > Hello,
> >
> > inspired by a discussion with the SerNet guys at Linux Kongress last
> > week, here's a thought I'd like to poll comments on.
> >
> (...)
> > Now I wonder if one could add functionality to the IPaddr2 RA to achieve
> > in essence the same thing. Suppose that triggered by an optional
> > resource parameter, IPaddr2 would invoke a mechanism similar to the one
> > employed by cutter (http://www.lowth.com/cutter/) after IP address
> > takeover. In conjunction with a TCP connection state replication utility
> > such as conntrackd (http://conntrack-tools.netfilter.org/), this should
> > enable the RA to actively cut off TCP connections to that IP address,
> > forcing a client reconnect.
> (...)
>
> Nice idea. What happens if conntrackd is activated on a cluster and a
> failover occurs? Shouldn't the application send a RST on its own? Has
> anybody tried this?
Forwarded connections will survive transparently.

When the failed cluster node was the _endpoint_ of TCP sessions, the client
will still receive a RST if it was actively communicating with the server at
the time. However, if the client was in a TCP state where it is waiting for a
response from the server, that response will now never come, and the client
will run into some timeout. Only once that timeout expires will the client
send a TCP keepalive or some other retry packet, and that is the point at
which the RST would normally be sent. For certain applications, that may be
too long.

To make the client notice and re-establish the connection as soon as
possible, the tickle ACK provokes a reaction on the client side, avoiding
these timeouts.

For Samba and Windows clients, using or not using these provocative tickle
ACKs on failover makes the difference between completely transparent
failover and a client-side I/O error, a "server does not respond" popup box,
and possibly a disconnected share.

For more "robust" implementations, the failover within the cluster is still,
in many cases, much faster than the time it takes a typical client that is
waiting for a server response to recognize that it needs to re-establish the
TCP session.

The basic point is: even if you manage to implement failover on the server
side that takes only a few seconds or even less than one, clients may still
need minutes to notice. Tickle ACKs provoke client-side action on the
sessions, leading to an immediate RST and re-establishment, and may reduce
the time it takes the client to recognize the failover.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
