On Oct 25, 2007, at 11:46 PM, [EMAIL PROTECTED] wrote:
Hello all,
Like many others, I've read and re-read the webpage and searched the
mailing list for the past week and a half, and I'm still not getting
where I want to be.
I'm working with a two-node cluster whose configuration details are
below.
To produce the logs, I performed the following:
1. started heartbeat on both nodes
2. started all resources
3. unplugged 100 network on node1
4. waited exactly 5 minutes
5. plugged 100 network back into node1
First, all of my resources are in the "EnterpriseSprayer" group, and are
ordered and collocated. The startup order is always correct and the
resources are always started on the same node. Using the GUI, I can
manually put the 1st node into standby and all of the resources get
transitioned perfectly. The problem is that if I simulate a network
failure on the 100 subnet (by unplugging the cable), the resources never
transition. The logs indicate that they were stopped on plspgen01 though.
The reason, I suspect, that they weren't started on plspgen02 is
because of this:
Oct 25 15:41:20 plspgen02 pengine: [14442]: WARN: unpack_rsc_op: Processing failed op (pound_process_start_0) on plspgen02
Other than that, as far as I can tell, it would have worked.
My desired behavior is:
1. Only start the resources on a node where the gateway is reachable.
2. Keep monitoring the gateway and transition the resources if the
   gateway becomes unreachable.
3. If any of the resources go down, restart them as necessary.
4. I don't care where the resources run, as long as they are running.
   (I don't need them to stick to one node or the other)
Thanks,
Justin
Network details:
100 subnet is a DMZ vlan
66 is a heartbeat vlan
machine1:
sles10 sp1 x86_64
eth0 - 10.1.100.177
eth1 - 10.1.66.177
machine2:
sles10 sp1 x86_64
eth0 - 10.1.100.178
eth1 - 10.1.66.178
heartbeat 2.1.2
resources:
3 virtual aliases (ocf)
pound (lsb)
apache2 (ocf)
ha.cf:
deadtime 5
deadping 5
initdead 60
warntime 5
autojoin any
crm true
udpport 3636
ucast eth0 10.1.100.177 # 10.1.100.178 on the second node
ucast eth1 10.1.66.177 # 10.1.66.178 on the second node
respawn root /usr/lib64/heartbeat/mgmtd # Enable GUI management tool
ping 10.1.100.1
cib.xml:
<cib generated="true" admin_epoch="0" have_quorum="true"
     ignore_dtd="false" num_peers="2" cib_feature_revision="1.3"
     num_updates="1" epoch="44" cib-last-written="Wed Oct 24 16:43:25 2007"
     ccm_transition="2" dc_uuid="dd9f8237-50c4-482e-9b98-924c0b878a04">
  <configuration>
    <crm_config/>
    <nodes>
      <node uname="plspgen01" type="normal"
            id="dd9f8237-50c4-482e-9b98-924c0b878a04">
      </node>
      <node uname="plspgen02" type="normal"
            id="d3d4fb5d-8cf0-4c60-90cd-b5ec9b24c980">
      </node>
    </nodes>
    <resources>
      <group ordered="true" collocated="true" id="EnterpriseSprayer">
        <primitive id="ip_10-1-100-180" class="ocf" type="IPaddr"
                   provider="heartbeat" is_managed="true"
                   description="HA Address 10.1.100.180">
          <instance_attributes id="ip_10-1-100-180_instance_attributes">
            <attributes>
              <nvpair id="9593fec4-fc97-4e7d-a74d-2ffd06a7be5e"
                      name="ip" value="10.1.100.180"/>
              <nvpair id="ip_10-1-100-180_target_role"
                      name="target_role" value="started"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="ip_10-1-100-181" class="ocf" type="IPaddr"
                   provider="heartbeat" is_managed="true"
                   description="HA Address 10.1.100.181">
          <instance_attributes id="ip_10-1-100-181_instance_attributes">
            <attributes>
              <nvpair id="4f3c49f9-dcd9-4f46-81b8-eb13cb38f8d4"
                      name="ip" value="10.1.100.181"/>
              <nvpair id="ip_10-1-100-181_target_role"
                      name="target_role" value="started"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="ip_10-1-100-182" class="ocf" type="IPaddr"
                   provider="heartbeat" is_managed="true"
                   description="HA Address 10.1.100.182">
          <instance_attributes id="ip_10-1-100-182_instance_attributes">
            <attributes>
              <nvpair id="b474dc00-41b7-4982-8eda-4a20105dd706"
                      name="ip" value="10.1.100.182"/>
              <nvpair id="ip_10-1-100-182_target_role"
                      name="target_role" value="started"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="apache_process" class="lsb" type="apache2"
                   provider="heartbeat"
                   description="Apache process running on the HA addresses">
          <instance_attributes id="apache_process_instance_attrs">
            <attributes>
              <nvpair id="apache_process_target_role"
                      name="target_role" value="started"/>
            </attributes>
          </instance_attributes>
          <operations>
            <op id="b1" name="stop" timeout="3s"/>
            <op id="b2" name="start" timeout="5s"/>
            <op id="b3" name="monitor" interval="10s" timeout="3s"/>
          </operations>
        </primitive>
        <primitive class="lsb" type="pound2" provider="heartbeat"
                   description="Pound process running on the HA addresses"
                   id="pound_process">
          <instance_attributes id="pound_process_instance_attrs">
            <attributes>
              <nvpair name="target_role"
                      id="pound_process_target_role" value="started"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <instance_attributes id="EnterpriseSprayer_attributes">
          <attributes>
            <nvpair id="EnterpriseSprayer_target_role"
                    name="target_role" value="started"/>
          </attributes>
        </instance_attributes>
      </group>
      <clone id="pingd">
        <instance_attributes id="pingd">
          <attributes>
            <nvpair id="pingd-clone_max" name="clone_max" value="2"/>
            <nvpair id="pingd-clone_node_max" name="clone_node_max"
                    value="1"/>
          </attributes>
        </instance_attributes>
        <primitive id="gateway" class="ocf" type="pingd"
                   provider="heartbeat">
          <operations>
            <op id="gateway:child-monitor" name="monitor"
                interval="20s" timeout="40s" prereq="nothing"/>
            <op id="gateway:child-start" name="start" prereq="nothing"/>
          </operations>
          <instance_attributes id="pingd_inst_attrs">
            <attributes>
              <nvpair id="pingd-dampen" name="dampen" value="5s"/>
              <nvpair id="pingd-multiplier" name="multiplier"
                      value="100"/>
            </attributes>
          </instance_attributes>
        </primitive>
      </clone>
    </resources>
    <constraints>
      <rsc_colocation id="colocation_EnterpriseSprayer"
                      from="EnterpriseSprayer" to="EnterpriseSprayer"
                      score="INFINITY"/>
      <rsc_location id="gateway:connected" rsc="EnterpriseSprayer">
        <rule id="gateway:connected:rule" score="-INFINITY"
              boolean_op="or">
          <expression id="gateway:connected:expr:undefined"
                      attribute="pingd" operation="not_defined"/>
          <expression id="gateway:connected:expr:zero"
                      attribute="pingd" operation="lte" value="0"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>
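
For reference, the intent of the gateway:connected rule above can be
sketched in a few lines of Python. This is only an illustration, not
anything heartbeat itself runs. It assumes that pingd publishes the
"pingd" node attribute as (number of reachable ping nodes) x multiplier,
and that a rule which does not match contributes a score of 0. The
function names pingd_value and location_score are made up for the
example.

```python
from typing import Optional

NEG_INFINITY = float("-inf")  # stands in for the CIB score "-INFINITY"


def pingd_value(reachable_ping_nodes: int, multiplier: int = 100) -> int:
    """Assumed pingd behaviour: attribute = reachable ping nodes * multiplier."""
    return reachable_ping_nodes * multiplier


def location_score(pingd: Optional[int]) -> float:
    """Score the gateway:connected rule adds for one node.

    Mirrors boolean_op="or" over the two expressions:
      - attribute "pingd" is not_defined
      - attribute "pingd" is lte 0
    """
    not_defined = pingd is None
    lte_zero = pingd is not None and pingd <= 0
    if not_defined or lte_zero:
        return NEG_INFINITY  # node must not run EnterpriseSprayer
    return 0.0  # rule does not match, so it adds nothing


# One ping node (10.1.100.1) and multiplier 100, as in the config above.
gateway_up = location_score(pingd_value(1))    # node can see the gateway
gateway_down = location_score(pingd_value(0))  # cable unplugged
never_set = location_score(None)               # pingd has not reported yet
```

Under those assumptions, a node whose 100 subnet cable is unplugged gets
-INFINITY for the group, while a node that can still reach 10.1.100.1 is
left eligible.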
<node1_log.out><node2_log.out>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems