At one point, I think it (attrd or pingd) removed the attribute if the value was 0.

Apparently it doesn't do that anymore, and the docs need to be updated.

On Apr 13, 2007, at 2:50 PM, Alan Robertson wrote:

Yann Dille wrote:
Dear Heartbeat user community -and Masters-,


I'm having a lot of trouble getting a "simple" DRBD/NFS
Active/Passive config to work in a 2-node cluster as soon as I add
features to increase availability in case of network
failure: STONITH (suicide) and pingd (failover if the default gateway
becomes unreachable).

What I'm trying to do is pretty simple: if the network is detected as down
(by pingd), the DC asks the "isolated" lrmd to gracefully shut down
the resources (in particular, to make sure the drbd volume has been
released to the "Secondary" state), so that the failover node
starts the resources asap. Using STONITH to reboot the network-isolated
node would also ensure that it becomes the new backup as soon
as the network outage is fixed.


Andrew also has this same snippet in his sample.
<expression id="group_1:not_connected:expr"  attribute="pingd"
operation="not_defined"/>

But, I read the code in pingd, and I don't see how that XML does
anything very useful.

        int num_active = 0;
        {...}

        g_hash_table_foreach(ping_nodes, count_ping_nodes, &num_active);
        crm_info("%d active ping nodes", num_active);

        ha_msg_add_int(update, F_ATTRD_VALUE,
                attr_multiplier*num_active);
        {...}
        if(send_ipc_message(attrd, update) == FALSE) {
                crm_err("Could not send update");
                exit(1);
        }

If the number of active ping nodes is zero, it looks an awful lot like it
simply sets the attribute to zero.  There is no code here to make any
kind of a special case for zero, nor to remove the definition of the
attribute from the cluster.

Now, it's theoretically possible that the attribute daemon (attrd) would
remove a definition which is set to zero, but that would be bizarre -
since such a behavior wouldn't make sense in general.

It's also theoretically possible that the not_defined function in the
CRM treats a node attribute that's defined but set to zero to be
not_defined().  But that would be even more bizarre, IMHO.

Now, making the normal assumption that an attribute which is set to zero
is different from one that's undefined, I don't believe this XML will
do what you want it to.

I read the code, and I don't see how it can work.  If it works, I'm
definitely missing something...  Perhaps someone could enlighten me...

In the pingd page, and also in the CIB/Idioms page, I put this example:

<rsc_location id="my_resource:connected" rsc="my_resource">
  <rule id="my_resource:connected:rule" score="-INFINITY" >
    <expression id="my_resource:connected:expr:zero"
      attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>

Given how the code _appears_ to work, I think this should do a better
job of doing what you want.
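
Since attrd or pingd may at one point have removed the attribute when its
value was 0, a rule that covers both cases might be safer across versions.
The following is only a sketch.  It assumes that the rule element accepts
boolean_op="or" and that the "less than or equal" comparison is spelled
"lte" in the CRM DTD, and the expr:undefined id is made up for
illustration.  Please check it against the DTD shipped with your version
before relying on it.

```xml
<rsc_location id="my_resource:connected" rsc="my_resource">
  <!-- boolean_op="or" is an assumption: the rule matches if either
       expression below is true -->
  <rule id="my_resource:connected:rule" score="-INFINITY" boolean_op="or">
    <!-- case 1: the pingd attribute is not defined at all on the node -->
    <expression id="my_resource:connected:expr:undefined"
      attribute="pingd" operation="not_defined"/>
    <!-- case 2: the attribute is defined, but no ping node is reachable -->
    <expression id="my_resource:connected:expr:zero"
      attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

This would replace the single-expression constraint rather than sit next to
it, since it reuses the same rsc_location and rule ids.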

STONITH suicide doesn't operate except in very special circumstances -
and you've not described them. You need a real hardware STONITH device
for what you want to do.  IIRC, a node will only commit suicide if it
has a stop failure, not if it loses quorum.  Otherwise it would reboot
continually until quorum was restored.  And quite possibly because of
the reboot timings, it would _never_ be restored.  And, in your case
you've also told heartbeat to ignore quorum.  So, that can't be a
factor, and pingd isn't tied into stonith, so you're basically not going
to get what you want to happen.

Also, note that if Heartbeat _did_ use the suicide device connected to
pingd, then both sides would commit suicide every time your ping node
went down.

pingd is not tied to STONITH in any way.

In general you're doing something else I'd be reluctant to do - you are
ignoring quorum.  Now if your cluster only has two nodes, we do that
implicitly.  But, if you have more than 2 nodes, this behavior might
have effects you didn't anticipate.

If the two sides lose all connectivity, then the kind of configuration
you're asking for (with no real STONITH device).

Note that if the two sides are really isolated, then you need 3rd party
quorum to guarantee that you know when the other machine is rebooted.
Otherwise, one day you'll likely create a DRBD split-brain which would
result in an inability to fail back.

My recommendation (unless Andrew corrects me):
        Get a real STONITH device
        Don't ignore quorum
        Check for the attribute being zero, not for being undefined.

--
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems