On Thu, Feb 16, 2012 at 11:14:37PM -0500, William Seligman wrote:
> On 2/16/12 8:13 PM, Andrew Beekhof wrote:
> > On Fri, Feb 17, 2012 at 5:05 AM, Dejan Muhamedagic <[email protected]> wrote:
> >> Hi,
> >>
> >> On Wed, Feb 15, 2012 at 04:24:15PM -0500, William Seligman wrote:
> >>> On 2/10/12 4:53 PM, William Seligman wrote:
> >>>> I'm trying to set up an Active/Active cluster (yes, I hear the
> >>>> sounds of kittens dying). Versions:
> >>>>
> >>>> Scientific Linux 6.2
> >>>> pacemaker-1.1.6
> >>>> resource-agents-3.9.2
> >>>>
> >>>> I'm using cloned IPaddr2 resources:
> >>>>
> >>>> primitive ClusterIP ocf:heartbeat:IPaddr2 \
> >>>>     params ip="129.236.252.13" cidr_netmask="32" \
> >>>>     op monitor interval="30s"
> >>>> primitive ClusterIPLocal ocf:heartbeat:IPaddr2 \
> >>>>     params ip="10.44.7.13" cidr_netmask="32" \
> >>>>     op monitor interval="31s"
> >>>> primitive ClusterIPSandbox ocf:heartbeat:IPaddr2 \
> >>>>     params ip="10.43.7.13" cidr_netmask="32" \
> >>>>     op monitor interval="32s"
> >>>> group ClusterIPGroup ClusterIP ClusterIPLocal ClusterIPSandbox
> >>>> clone ClusterIPClone ClusterIPGroup
> >>>>
> >>>> When both nodes of my two-node cluster are running, everything
> >>>> looks and functions OK.
> >>>> From "service iptables status" on node 1 (hypatia-tb):
> >>>>
> >>>> 5  CLUSTERIP  all  --  0.0.0.0/0  10.43.7.13      CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5
> >>>>    total_nodes=2 local_node=1 hash_init=0
> >>>> 6  CLUSTERIP  all  --  0.0.0.0/0  10.44.7.13      CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09
> >>>>    total_nodes=2 local_node=1 hash_init=0
> >>>> 7  CLUSTERIP  all  --  0.0.0.0/0  129.236.252.13  CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79
> >>>>    total_nodes=2 local_node=1 hash_init=0
> >>>>
> >>>> On node 2 (orestes-tb):
> >>>>
> >>>> 5  CLUSTERIP  all  --  0.0.0.0/0  10.43.7.13      CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5
> >>>>    total_nodes=2 local_node=2 hash_init=0
> >>>> 6  CLUSTERIP  all  --  0.0.0.0/0  10.44.7.13      CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09
> >>>>    total_nodes=2 local_node=2 hash_init=0
> >>>> 7  CLUSTERIP  all  --  0.0.0.0/0  129.236.252.13  CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79
> >>>>    total_nodes=2 local_node=2 hash_init=0
> >>>>
> >>>> If I do a simple test of ssh'ing into 129.236.252.13, I see that I
> >>>> alternately log into hypatia-tb and orestes-tb. All is good.
> >>>>
> >>>> Now take orestes-tb offline.
> >>>> The iptables rules on hypatia-tb are unchanged:
> >>>>
> >>>> 5  CLUSTERIP  all  --  0.0.0.0/0  10.43.7.13      CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=F1:87:E1:64:60:A5
> >>>>    total_nodes=2 local_node=1 hash_init=0
> >>>> 6  CLUSTERIP  all  --  0.0.0.0/0  10.44.7.13      CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=11:8F:23:B9:CA:09
> >>>>    total_nodes=2 local_node=1 hash_init=0
> >>>> 7  CLUSTERIP  all  --  0.0.0.0/0  129.236.252.13  CLUSTERIP
> >>>>    hashmode=sourceip-sourceport clustermac=B1:95:5A:B5:16:79
> >>>>    total_nodes=2 local_node=1 hash_init=0
> >>>>
> >>>> If I attempt to ssh to 129.236.252.13, whether or not I get in
> >>>> seems to be machine-dependent. On one machine I get in, from
> >>>> another I get a time-out. Both machines show the same MAC address
> >>>> for 129.236.252.13:
> >>>>
> >>>> arp 129.236.252.13
> >>>> Address                  HWtype  HWaddress          Flags Mask  Iface
> >>>> hamilton-tb.nevis.colum  ether   B1:95:5A:B5:16:79  C            eth0
> >>>>
> >>>> Is this the way the cloned IPaddr2 resource is supposed to behave
> >>>> in the event of a node failure, or have I set things up
> >>>> incorrectly?
> >>>
> >>> I spent some time looking over the IPaddr2 script. As far as I can
> >>> tell, the script has no mechanism for reconfiguring iptables in the
> >>> event of a change of state in the number of clones.
> >>>
> >>> I might be stupid -- er -- dedicated enough to make this change on
> >>> my own, then share the code with the appropriate group. The change
> >>> seems to be relatively simple. It would be in the monitor
> >>> operation. In pseudo-code:
> >>>
> >>> if ( <IPaddr2 resource is already started> ) then
> >>>   if ( OCF_RESKEY_CRM_meta_clone_max != OCF_RESKEY_CRM_meta_clone_max last time
> >>>     || OCF_RESKEY_CRM_meta_clone != OCF_RESKEY_CRM_meta_clone last time )
> >>>     ip_stop
> >>>     ip_start
> >>
> >> Just changing the iptables entries should suffice, right?
> >> Besides, doing stop/start in the monitor is sort of unexpected.
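The machine-dependent timeouts described above are what CLUSTERIP's source hashing would produce: each incoming connection is hashed to one of total_nodes buckets, and a node only answers for the buckets it owns, so clients hashed to the dead node's bucket simply go unanswered. A minimal sketch of the effect -- the function name is invented, and a plain octet-sum modulo stands in for the kernel's Jenkins hash, which this does not reproduce:

```shell
#!/bin/sh
# Illustrative sketch only: CLUSTERIP maps each source to a bucket in
# 1..total_nodes, and a node answers only for its own bucket(s). The
# kernel uses a Jenkins hash; a simple octet sum modulo total_nodes is
# used here purely to show the behavior, not the real algorithm.
bucket_for() {
    src_ip="$1"
    total_nodes="$2"
    # Sum the four octets as a stand-in for hashing the source address.
    sum=$(echo "$src_ip" | tr '.' ' ' | awk '{ print $1 + $2 + $3 + $4 }')
    echo $(( sum % total_nodes + 1 ))
}

# With total_nodes=2, different client addresses land in different
# buckets; any client whose bucket belongs to the offline node times out
# until someone claims that bucket.
bucket_for 10.44.7.1 2   # -> 1
bucket_for 10.44.7.2 2   # -> 2
```

This is why both test machines see the same multicast-style MAC for 129.236.252.13 yet only some of them get a reply: the MAC is shared, but the bucket ownership is not.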
> >> Another option is to add the missing node to one of the nodes which
> >> are still running (echo "+<n>" >> /proc/net/ipt_CLUSTERIP/<ip>).
> >> But any of that would be extremely tricky to implement properly (if
> >> not impossible).
> >>
> >>>   fi
> >>> fi
> >>>
> >>> If this would work, then I'd have two questions for the experts:
> >>>
> >>> - Would the values of OCF_RESKEY_CRM_meta_clone_max and/or
> >>>   OCF_RESKEY_CRM_meta_clone change if the number of cloned copies
> >>>   of a resource changed?
> >>
> >> OCF_RESKEY_CRM_meta_clone_max definitely not.
> >> OCF_RESKEY_CRM_meta_clone may change, but also probably not; it's
> >> just a clone sequence number. In short, there's no way to figure
> >> out the total number of clones by examining the environment.
> >> Information such as membership changes doesn't trickle down to
> >> the resource instances.
> >
> > What about notifications? That would be the right point to
> > re-configure things, I'd have thought.
>
> I ran a simple test: I added "notify" to the IPaddr2 actions, and
> logged the values of every one of the variables in "Pacemaker
> Explained" that relate to clones. I brought the IPaddr2 resource up
> and down a few times on both my machines. No values changed at all,
> and no "notify" actions were logged, though the appropriate "stop",
> "start", and "monitor" actions were. It looks like a cloned IPaddr2
> resource doesn't get a notify signal.
>
> At this point, it looks like my notion of re-writing IPaddr2 won't
> work. I'm redesigning my cluster configuration so that I don't
> require cloned/highly-available IP addresses.
>
> Is this a bug?
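Dejan's earlier suggestion -- writing "+<n>" into the CLUSTERIP proc file so a surviving node claims the failed node's hash bucket -- can be sketched as a small failover helper. This is a hypothetical illustration, not part of IPaddr2 or resource-agents: the function name is invented, the IP and node numbers are taken from the thread, and it only prints the commands it would run, since the real writes require root and the ipt_CLUSTERIP module:

```shell
#!/bin/sh
# Hypothetical sketch of the /proc/net/ipt_CLUSTERIP takeover idea: a
# surviving node claims the bucket(s) of departed peers by writing
# "+<n>" to the proc file for the clustered address. "claim_buckets" is
# not a real agent function; it prints the commands instead of running
# them so the sketch works without root privileges.
claim_buckets() {
    ip="$1"            # clustered address, e.g. 129.236.252.13
    shift
    for n in "$@"; do  # bucket numbers owned by the failed node(s)
        # A real handler would execute:
        #   echo "+$n" > /proc/net/ipt_CLUSTERIP/$ip
        echo "echo +$n > /proc/net/ipt_CLUSTERIP/$ip"
    done
}

# hypatia-tb (local_node=1) taking over orestes-tb's bucket (node 2):
claim_buckets 129.236.252.13 2
```

As the thread notes, the write itself is trivial; the hard part is knowing when to perform it and when to undo it ("-<n>") as membership changes, which is exactly the information that never reaches the resource instances.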
Looks like a deficiency. I'm not sure how to deal with it, though.

> Is there a bugzilla or similar resource for resource agents?

https://developerbugs.linuxfoundation.org/enter_bug.cgi?product=Linux-HA

Then choose "Resource agent". Or create an issue at

https://github.com/ClusterLabs/resource-agents

Cheers,

Dejan

> --
> Bill Seligman             | mailto://[email protected]
> Nevis Labs, Columbia Univ | http://www.nevis.columbia.edu/~seligman/
> PO Box 137                |
> Irvington NY 10533 USA    | Phone: (914) 591-2823
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
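One detail worth noting for anyone reproducing the notify experiment above: in Pacemaker, notify actions are only dispatched when the clone itself requests them via its notify meta attribute (documented in "Pacemaker Explained", where it defaults to false), in addition to the agent advertising a notify action in its metadata. A hypothetical crm snippet for the configuration from the thread, offered as an assumption about why no notify actions were logged rather than a confirmed fix:

```
clone ClusterIPClone ClusterIPGroup \
    meta notify="true"
```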
