[ha-clusters-discuss] TCPIP connection fail-over

Ashutosh Tripathi Wed, 25 Feb 2009 15:21:08 -0800

Hi Karel,

        It depends on whether your client (telnet, in this case) is connecting to
a so-called "Scalable Service" (one which runs on all nodes at the
same time and uses a SharedAddress as the HA IP address) or a failover service.


        For failover services, you would get a "connection reset by peer" after
the failover is complete.
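
        To illustrate what the reset means for a client, here is a minimal Python sketch of a client that notices the reset and simply reconnects to the same HA address once the service is back. The function name, retry count and delay are made up for the example; nothing here is a Sun Cluster API. It assumes a stateless request/reply exchange that is safe to resend. An interactive session such as telnet cannot be resumed this way, because the session state on the failed node is gone.

```python
import socket
import time


def send_with_reconnect(host, port, payload, retries=5, delay=1.0,
                        connect=socket.create_connection):
    """Send payload and return the reply, reconnecting if the peer resets.

    Hypothetical helper: the payload is resent from scratch on every
    attempt, so it must be safe to repeat (idempotent).
    """
    last_error = None
    for _ in range(retries):
        try:
            # A fresh TCP connection per attempt; the old one cannot be revived.
            with connect((host, port), timeout=5) as sock:
                sock.sendall(payload)
                return sock.recv(4096)
        except (ConnectionError, socket.timeout) as exc:
            # Covers "connection reset by peer" and "connection refused"
            # while the service is still being restarted on another node.
            last_error = exc
            time.sleep(delay)
    raise ConnectionError(f"gave up after {retries} attempts") from last_error
```

        The `connect` parameter exists only so the function can be exercised without a real server; in normal use the default `socket.create_connection` is what gets called.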

        For scalable services, the situation is trickier. It depends on
exactly which node you were connected to. Even though the HA IP address (the
so-called SharedAddress) might be on node1, your connection might have been
routed to node2. In that case, after node1 crashes and the IP address fails over
to node2, your connection will continue. If your connection was routed to node1,
there is no way to get it back. I believe you might get a connection reset or
simply hang, depending on the exact policy (Sticky/round-robin etc.) and the
number of ports you had in the service, but the bottom line is that your
connection wouldn't continue.
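
        The scalable-service case can be pictured with a toy Python model. This is only an illustration of the rule described above, not how Sun Cluster is implemented; the function and the client names are invented for the example. Each connection is tagged with the node that actually serves it, and only connections served by a surviving node carry on.

```python
def after_crash(shared_address_owner, connections, crashed_node, standby_node):
    """Toy model: return (new SharedAddress owner, surviving connections).

    connections maps a client name to the node its connection was routed to.
    """
    # The SharedAddress moves only if the node hosting it is the one that died.
    if shared_address_owner == crashed_node:
        new_owner = standby_node
    else:
        new_owner = shared_address_owner
    # Connections routed to the crashed node are lost for good.
    survivors = {client: node for client, node in connections.items()
                 if node != crashed_node}
    return new_owner, survivors


connections = {"client_a": "node1", "client_b": "node2"}
owner, alive = after_crash("node1", connections, "node1", "node2")
print(owner)  # node2
print(alive)  # {'client_b': 'node2'}
```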

        Some "fault tolerant" implementations actually checkpoint TCP/IP state
to remote nodes, even for failover services, in order to survive failures. But
for this to really work, one also needs to checkpoint the end-service state
(the telnetd daemon and the shell it launched in response to login, in your
case) to another node, so that the TCP/IP state and the end-application state
stay in sync. That type of checkpointing quickly gets messy because, unless you
modify the application itself, its "state" is a pretty rapidly changing thing.
This is not a practical way to get HA; you are really getting into the
domain of fault-tolerant computing. SC does not provide this type of
infrastructure by itself.

Probably more info than you asked for... you were just looking for
a YES/NO, right? :-)

HTH,
-ashu

Karel Gardas wrote:
> Hello,
> I'm planning to play a bit with Sun Cluster, but before I do, I'm most curious
> to know if it supports transparent TCP/IP connection fail-over. Let's say, what
> would happen to a client's telnet to the cluster where node1 is currently master,
> when node1 crashes and is taken over by node2? Will the telnet
> finish with `connection closed by remote peer' (or something like that) or
> will it survive?
> Thanks,
> Karel
> PS: telnet is just an example.
