Hi linux-ha list members,
I've hit a problem using an IPaddr2 resource with Linux HA 2.0.2. I'm using a 1.x style two-node cluster with a single shared IPaddr2 resource. The nodes are configured with unicast keepalives to each other. Backup serial keepalives are not possible in this configuration, so split brain occurs when network connectivity is disrupted. Our application has synchronization mechanisms to deal with split brain recovery.

Here's the series of events that makes the IP address managed by the IPaddr2 resource inaccessible:

1. Server 1 (active) and server 2 (standby) are paired. Server 1 owns the IPaddr2 resource and the IP address is accessible from beyond the router.

2. Pull the cables on server 1. Server 2 notices that server 1 appears dead. Server 2 goes active and sends gratuitous ARPs. The IPaddr2 resource is accessible and points to server 2. Server 1 stays active but notices that server 2 appears dead from its standpoint.

3. Pull the cables on server 2. Server 2 stays active, server 1 stays active. Both think the other server is dead (split brain). The IPaddr2 resource is not accessible.

4. Plug in server 1. Server 1 is connected again and still thinks it is active and server 2 is dead. From each server's standpoint, nothing has changed: each thinks the other is dead. Since server 1 never transitioned from standby to active, no gratuitous ARPs were sent and the router's ARP cache still points to server 2's MAC address. The IPaddr2 resource is inaccessible until the router's default 4-hour Cisco ARP cache timeout expires.

Is there another way to do this with Linux HA so that a shared IP address will be handled correctly in this scenario? We really need gratuitous ARPs to be sent when server 1 is plugged in again in step 4. Without gratuitous ARPs when the server is reconnected, the ARP cache on our router (a Cisco 3560) is not updated until its default 4-hour aging time expires.
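To make the desired behavior concrete, here is a rough sketch of a workaround outside of Linux HA itself: a small watcher that polls the link state and sends gratuitous ARPs when the cable comes back. This is only an illustration under assumptions, not something heartbeat provides. The interface name and address passed in are placeholders, the script relies on the iputils `arping` with its `-U` (unsolicited ARP) option being installed, and it assumes the node already holds the shared address when the link returns.

```python
import subprocess
import time
from pathlib import Path


def carrier_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports link carrier on iface."""
    try:
        text = Path(sysfs_root, iface, "carrier").read_text()
    except OSError:
        # Missing interface, or interface administratively down.
        return False
    return text.strip() == "1"


def link_restored(previous, current):
    """True only on a down -> up transition of the link."""
    return current and not previous


def garp_command(iface, ip, count=3):
    """Build an iputils arping call that sends unsolicited (gratuitous) ARPs."""
    return ["arping", "-U", "-I", iface, "-c", str(count), ip]


def watch(iface, ip, poll_seconds=1.0):
    """Poll the link and send gratuitous ARPs each time it comes back.

    Assumption: this node still holds `ip` when the link returns, as
    server 1 does in step 4 above.
    """
    previous = carrier_up(iface)
    while True:
        time.sleep(poll_seconds)
        current = carrier_up(iface)
        if link_restored(previous, current):
            subprocess.run(garp_command(iface, ip), check=False)
        previous = current
```

Running something like watch("eth0", "192.0.2.10") as root on each node would cover step 4, with the interface and address replaced by the real ones. Polling the carrier file is a simplification; it only detects a cable pulled on the server itself, not a failure further along the path.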
I tried using ping nodes pointed at our router, thinking that the gratuitous ARPs would be resent when network connectivity was reestablished, but that appears not to be the case.

Thanks in advance for any help and suggestions.

Regards,
Eric Blau
