On 2/10/12 4:53 PM, William Seligman wrote:
> I'm trying to set up an Active/Active cluster (yes, I hear the sounds of
> kittens dying). Versions:
>
> Scientific Linux 6.2
> pacemaker-1.1.6
> resource-agents-3.9.2
>
> I'm using cloned IPaddr2 resources:
>
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="129.236.252.13" cidr_netmask="32" \
>     op monitor interval="30s"
> primitive ClusterIPLocal ocf:heartbeat:IPaddr2 \
>     params ip="10.44.7.13" cidr_netmask="32" \
>     op monitor interval="31s"
> primitive ClusterIPSandbox ocf:heartbeat:IPaddr2 \
>     params ip="10.43.7.13" cidr_netmask="32" \
>     op monitor interval="32s"
> group ClusterIPGroup ClusterIP ClusterIPLocal ClusterIPSandbox
> clone ClusterIPClone ClusterIPGroup
>
> When both nodes of my two-node cluster are running, everything looks and
> functions OK. From "service iptables status" on node 1 (hypatia-tb):
>
> 5 CLUSTERIP all -- 0.0.0.0/0 10.43.7.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
>   local_node=1 hash_init=0
> 6 CLUSTERIP all -- 0.0.0.0/0 10.44.7.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
>   local_node=1 hash_init=0
> 7 CLUSTERIP all -- 0.0.0.0/0 129.236.252.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
>   local_node=1 hash_init=0
>
> On node 2 (orestes-tb):
>
> 5 CLUSTERIP all -- 0.0.0.0/0 10.43.7.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
>   local_node=2 hash_init=0
> 6 CLUSTERIP all -- 0.0.0.0/0 10.44.7.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
>   local_node=2 hash_init=0
> 7 CLUSTERIP all -- 0.0.0.0/0 129.236.252.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
>   local_node=2 hash_init=0
>
> If I do a simple test of ssh'ing into 129.236.252.13, I see that I alternately
> login into hypatia-tb and orestes-tb. All is good.
>
> Now take orestes-tb offline. The iptables rules on hypatia-tb are unchanged:
>
> 5 CLUSTERIP all -- 0.0.0.0/0 10.43.7.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5 total_nodes=2
>   local_node=1 hash_init=0
> 6 CLUSTERIP all -- 0.0.0.0/0 10.44.7.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09 total_nodes=2
>   local_node=1 hash_init=0
> 7 CLUSTERIP all -- 0.0.0.0/0 129.236.252.13 CLUSTERIP
>   hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79 total_nodes=2
>   local_node=1 hash_init=0
>
> If I attempt to ssh to 129.236.252.13, whether or not I get in seems to be
> machine-dependent. On one machine I get in, from another I get a time-out.
> Both machines show the same MAC address for 129.236.252.13:
>
> arp 129.236.252.13
> Address                  HWtype  HWaddress          Flags Mask  Iface
> hamilton-tb.nevis.colum  ether   B1:95:5A:B5:16:79  C           eth0
>
> Is this the way the cloned IPaddr2 resource is supposed to behave in the event
> of a node failure, or have I set things up incorrectly?

I spent some time looking over the IPaddr2 script. As far as I can tell, the
script has no mechanism for reconfiguring iptables when the number of running
clones changes.

I might be stupid -- er -- dedicated enough to make this change on my own, then
share the code with the appropriate group. The change seems to be relatively
simple. It would go in the monitor operation. In pseudo-code:

if ( <IPaddr2 resource is already started> ) then
    if ( OCF_RESKEY_CRM_meta_clone_max != <OCF_RESKEY_CRM_meta_clone_max last time>
         || OCF_RESKEY_CRM_meta_clone != <OCF_RESKEY_CRM_meta_clone last time> ) then
        ip_stop
        ip_start
    fi
fi

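A minimal shell sketch of the "last time" comparison might look like the
following. It assumes the previous values can be kept in a small state file.
STATE_DIR and the helper names (state_file, current_clone_layout,
record_clone_layout, clone_layout_changed) are hypothetical and not part of
IPaddr2. Whether the two OCF_RESKEY_CRM_meta_* values really change when a
clone goes away is exactly the open question, so treat this as an
illustration of the idea rather than a tested patch.

```shell
# Sketch only: STATE_DIR and all function names below are hypothetical.
STATE_DIR="${STATE_DIR:-${TMPDIR:-/tmp}}"

state_file() {
    # One state file per resource instance (e.g. "ClusterIP:0").
    echo "$STATE_DIR/IPaddr2-clone-state-${OCF_RESOURCE_INSTANCE:-default}"
}

current_clone_layout() {
    # Fall back to "one copy, instance 0" when the resource is not cloned.
    echo "${OCF_RESKEY_CRM_meta_clone_max:-1} ${OCF_RESKEY_CRM_meta_clone:-0}"
}

record_clone_layout() {
    # Remember the values seen during this call for the next call.
    current_clone_layout > "$(state_file)"
}

clone_layout_changed() {
    # True (0) only if a previous layout was recorded and it differs now.
    [ -f "$(state_file)" ] || return 1
    [ "$(cat "$(state_file)")" != "$(current_clone_layout)" ]
}
```

In the monitor operation this would be used roughly as "if
clone_layout_changed; then ip_stop; ip_start; fi", followed by
record_clone_layout, with the state file removed again in the stop operation.
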
If this would work, then I'd have two questions for the experts:

- Would the values of OCF_RESKEY_CRM_meta_clone_max and/or
  OCF_RESKEY_CRM_meta_clone change if the number of cloned copies of a resource
  changed?

- Is there some standard mechanism by which RA scripts can maintain persistent
  information between successive calls?

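I realize there's a flaw in the logic: restarting the resource risks breaking
an ongoing IP connection. But as it stands, IPaddr2 is a clonable resource but
not a highly-available one. If one of N cloned copies goes down, the surviving
nodes keep their total_nodes=N rules, so nobody answers for the missing node's
share of the hash and roughly one out of N new network connections to the IP
address will fail.
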
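The "one out of N" effect can be illustrated with a small shell sketch. It is
an illustration only: cksum stands in for the hash the kernel actually uses,
and bucket_for and is_served are hypothetical helper names. The point is just
that a connection is answered only when its bucket number matches the
local_node of a node that is still running.

```shell
# Illustration only: cksum is a stand-in hash, not the kernel's.
bucket_for() {
    # $1 = source IP, $2 = source port, $3 = total_nodes
    h=$(printf '%s:%s' "$1" "$2" | cksum | cut -d' ' -f1)
    echo $(( h % $3 + 1 ))
}

is_served() {
    # $1 = bucket; remaining arguments = local_node numbers still running
    bucket=$1
    shift
    for node in "$@"; do
        [ "$node" -eq "$bucket" ] && return 0
    done
    return 1
}
```

For example, with total_nodes=2 and only node 1 running, a loop over a few
source ports that calls bucket_for and then "is_served $bucket 1" will report
every connection landing in bucket 2 as unanswered, which would match the
time-outs seen from some client machines.
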
--
Bill Seligman | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://[email protected]
PO Box 137 |
Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/
