On Fri, Nov 26, 2010 at 9:36 AM, Andrew Miklas <[email protected]> wrote:
> Hi,
>
> On 25-Nov-10, at 11:37 AM, Andrew Beekhof wrote:
>
>> Given what you've described, you could probably remove the while loop
>> during stop.
>> It should be safe because Amazon is ensuring that it will only "run"
>> in exactly one location.
>
> I'll give that a try -- thanks.
>
>
> I noticed something else interesting during my testing today -- I'm
> curious if it's related to my testing method or is a sign of a
> configuration error.  To test Pacemaker's response to a node failure,
> I usually use iptables to cut off all network traffic from one node to
> the rest of the cluster.  (I'm doing this instead of the typical
> "unplug the network line" method because I don't have physical access
> to the machines).
>
> For example, I would run this on node test2 of a 3 node test
> environment:
> "iptables -A INPUT -s test1 -j DROP; iptables -A INPUT -s test3 -j
> DROP; iptables -A OUTPUT -d test1 -j DROP; iptables -A OUTPUT -d test3
> -j DROP"
>
> As expected, Pacemaker detects the node failure and starts up all the
> resources that were running on that node elsewhere.  However, when I
> remove the rules with "iptables -F", there is a brief period where
> Pacemaker (or Heartbeat, I suppose) becomes very confused as to which
> nodes are up and which are down.  For example, crm_mon will suddenly
> indicate that test3 is offline, and then show that it is back online
> ten seconds later, even though test3 was always part of the partition
> that had quorum.

Yeah, that's Heartbeat, I'm afraid.

>
> The problem here is that these spurious node failures cause Pacemaker
> to initiate unnecessary resource migrations.  Is it normal for the
> cluster to become confused for a while when the network connection to
> a node is suddenly restored?

It's normal for the CCM (part of Heartbeat), and it used to be normal for corosync.
These days I think corosync does a better job in these scenarios.

> Or is this happening because using
> iptables is not a fair test of how the system will respond during a
> network split?

Unless you've got _really_ quick hands, you're also creating an
asymmetric network split.
i.e., A can see B but B can't see A.

This would be causing additional confusion at the messaging level.
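
One way to narrow that window, as a rough sketch rather than a tested
recipe: generate all the DROP rules up front and load them with a single
iptables-restore call, so the filter table is committed in one step
instead of rule by rule. The helper name gen_isolation_rules is made up
for illustration, and test1/test3 are the peer names from the example
above. This only makes the test cut more symmetric; it does not change
how the CCM reacts when the partition heals.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): print an
# iptables-restore ruleset that drops all traffic to and from each peer
# given on the command line.
gen_isolation_rules() {
    printf '%s\n' '*filter'
    for peer in "$@"; do
        # Same rules as the hand-typed commands, one pair per peer.
        printf '%s\n' "-A INPUT -s $peer -j DROP"
        printf '%s\n' "-A OUTPUT -d $peer -j DROP"
    done
    # Rules for the table take effect together at COMMIT.
    printf '%s\n' 'COMMIT'
}

# Usage (as root, on the node to isolate, e.g. test2):
#   gen_isolation_rules test1 test3 | iptables-restore --noflush
# --noflush keeps any rules that are already loaded.
```

Using IP addresses instead of host names is probably safer here, since
the names have to be resolved when the rules are loaded.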

>
>
> Thanks,
>
>
> Andrew
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
