On Mon, Oct 13, 2008 at 02:04:41PM +0200, Michael Schwartzkopff wrote:
> On Monday, 13 October 2008 13:38, Florian Haas wrote:
> > Hello,
> >
> > inspired by a discussion with the SerNet guys at Linux Kongress last
> > week, here's a thought I'd like to poll comments on.
> >
> (...)
> > Now I wonder if one could add functionality to the IPaddr2 RA to achieve
> > in essence the same thing. Suppose that triggered by an optional
> > resource parameter, IPaddr2 would invoke a mechanism similar to the one
> > employed by cutter (http://www.lowth.com/cutter/) after IP address
> > takeover. In conjunction with a TCP connection state replication utility
> > such as conntrackd (http://conntrack-tools.netfilter.org/), this should
> > enable the RA to actively cut off TCP connections to that IP address,
> > forcing a client reconnect.
> (...)
> 
> Nice idea. What happens if conntrackd is activated on a cluster and a
> failover occurs? Shouldn't the application send a RST on its own? Anybody
> tried this?

forwarded connections will survive transparently.

when the failed cluster node was itself the _endpoint_ of tcp sessions,
the client will still receive a RST if it was actively sending to the
server at the time of the failover.

however, if the client side was in some tcp state waiting for response
from the server, this response now will never come,
and the client will run into some timeout.

once that timeout expires, the client would send a tcp keepalive or
some other retry packet, and only then would the RST normally be sent.
that delay may be too long for certain applications.
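
to put rough numbers on that timeout: a minimal python sketch, assuming a
linux client that relies on tcp keepalive. the 7200/75/9 figures are the
usual linux sysctl defaults; the function names and the shorter values are
only illustrative, not taken from any particular client.

```python
import socket


def notice_delay(idle, interval, probes, peer_sends_rst=True):
    """Seconds an idle client needs to notice a dead session via keepalive.

    If the address was taken over, the first probe is answered with a RST,
    so the delay is roughly the idle time. If nothing answers at all, every
    probe has to time out first.
    """
    if peer_sends_rst:
        return idle
    return idle + interval * probes


def enable_fast_keepalive(sock, idle=10, interval=5, probes=3):
    """Turn on keepalive with short timers (per-socket options are Linux names)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)  # not defined on every platform
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return notice_delay(idle, interval, probes)
```

with the linux defaults, notice_delay(7200, 75, 9) comes out at two hours
even when the new node answers the first probe, which is why tuning the
client (where you can) or provoking it from the server side matters.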

to make the client notice and reestablish the connection asap, the
tickle ack provokes a reaction on the client side,
avoiding these timeouts.
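
as a rough illustration of what such a tickle ack could look like on the
wire: a minimal python sketch, assuming a tickle ack is simply a bare tcp
ACK segment sent from the taken-over service address to the client, with
sequence numbers that do not have to match the old session. the function
names, the window value and the example addresses are made up for
illustration; this is not the code of any particular implementation.

```python
import socket
import struct


def inet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum as used by IPv4/TCP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """IPv4 pseudo header that is covered by the TCP checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_len))


def build_tickle_ack(src_ip, src_port, dst_ip, dst_port, seq=0, ack=0):
    """Return a 20-byte TCP header with only the ACK flag set, no payload."""
    offset_flags = (5 << 12) | 0x10  # header length 5 words, ACK flag
    window = 1234                    # arbitrary non-zero window (assumption)
    header = struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                         offset_flags, window, 0, 0)
    csum = inet_checksum(pseudo_header(src_ip, dst_ip, len(header)) + header)
    # patch the checksum into bytes 16..17 of the header
    return header[:16] + struct.pack("!H", csum) + header[18:]


# example: service address 192.0.2.10:445, client 192.0.2.99:50000
segment = build_tickle_ack("192.0.2.10", 445, "192.0.2.99", 50000)
```

the idea, as far as i understand it, is that the client answers such an
unexpected ACK with an ACK carrying its real sequence numbers, and the new
node, having no matching socket, replies to that with a RST. actually
sending the segment needs a raw socket and root privileges, which this
sketch leaves out.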

for samba and windows clients, using or not using these provocative
tickle acks on failover makes the difference between
a completely transparent failover
and a client side IO error with a popup box about "server does not
respond", and possibly a disconnected share.

even for more "robust" client implementations, in many cases
the failover within the cluster completes much faster
than a typical client waiting for a server response takes
to recognize that it needs to re-establish the tcp session.

the basic point is, even if you are able to implement failover within a
few seconds or less on the server side, clients may still need minutes
to notice.

tickle acks provoke client side action on the sessions,
leading to an immediate RST and re-establishment, and so may reduce
the time it takes the client to recognize the failover.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
