Re: [Linux-HA] How to monitor the nic link status

Pavlos Parissis Tue, 30 Nov 2010 13:56:10 -0800

On 30 November 2010 10:31, Max <[email protected]> wrote:
>
> Nikita,
>
>> ...
>>  - what about configuring the monitor operation of the IP in cib.xml, something like this:
>>    <resources>
>>      <primitive id="IPaddr_194_37_40_42" class="ocf" provider="heartbeat" type="IPaddr">
>>        <meta_attributes id="primitive-IPaddr_194_37_40_42meta"/>
>>        <operations>
>>          <op name="monitor" interval="60s" id="IPaddr_194_37_40_42_mon" timeout="60s"/>
>>        </operations>
>>      </primitive>
>>    </resources>
>>
>> - it works for me very well ;-)
>
> As far as I can see the "monitor" function of "IPaddr" basically
> pings the IP address of the interface
It has been reported that we shouldn't use IPaddr but IPaddr2, which
doesn't use ping.


> ... unfortunately, at least
> under RedHat/CentOS, if you physically pull the plug on an ethernet
> then the interface will still continue to ping successfully on its
> own address, even though the link is in fact down - so the ping is
> not telling you that everything is working as it should. I do not
> think that "IPaddr2" does any better.
It doesn't do any ping test. Instead, it checks whether the IP is configured on
the system; have a look at the code:
[r...@pbxsrv1 ~]# grep -A 40 'ip_served()' /usr/lib/ocf/resource.d/heartbeat/IPaddr2
ip_served() {
        if [ -z "$NIC" ]; then # no nic found or specified
                echo "no"
                return 0
        fi

        cur_nic="`find_interface $BASEIP`"

        if [ -z "$cur_nic" ]; then
                echo "no"
                return 0
        fi

        if [ -z "$IP_CIP" ]; then
                case $cur_nic in
                lo*)    if [ "$LVS_SUPPORT" = "1" ]; then
                                echo "no"
                                return 0
                        fi
                        ;;
                esac

                echo "ok"
                return 0
        fi

        # Special handling for the CIP:
        if [ ! -e $IP_CIP_FILE ]; then
                echo "partial2"
                return 0
        fi
        if egrep -q "(^|,)${IP_INC_NO}(,|$)" $IP_CIP_FILE ; then
                echo "ok"
                return 0
        else
                echo "partial"
                return 0
        fi

        exit $OCF_ERR_GENERIC
}
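For comparison, the "is the IP configured on this box" part of that check can be approximated outside the resource agent. This is a hypothetical helper, not IPaddr2's find_interface(); it assumes the iproute2 one-line format of `ip -o -4 addr show` and compares the address as an exact string, so 192.168.78.1 does not match 192.168.78.10.

```shell
#!/bin/sh
# Hypothetical helper (not part of IPaddr2): is an IPv4 address configured?
# Prints the interface carrying it and returns 0, or returns 1 if absent.
ip_configured() {
    want="$1"
    # Lines of `ip -o -4 addr show` look like:
    #   2: eth0    inet 192.168.78.10/24 brd 192.168.78.255 scope global eth0
    # Field 2 is the interface name, field 4 is ADDRESS/PREFIX.
    ip -o -4 addr show | awk -v want="$want" '
        { split($4, a, "/"); if (a[1] == want) { print $2; found = 1; exit } }
        END { exit (found ? 0 : 1) }'
}
```

Usage: `ip_configured 192.168.78.10 && echo "configured"`. Like ip_served(), this says nothing about whether the cable is plugged in.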

>
> Mia,
>
>> ...
>> why not just use ethtool or other MII tools to detect the link failure in
>> the IPaddr2 script?
>
> Just looking at the link status will not tell you if something else
> is wrong with your connectivity to the network and the other cluster
> nodes - so you need to use something like the "ping" resource as
> suggested by Lars.
I have to disagree with that. My expectations of the IPaddr2
resource's monitor operation are the following:
1) check if the IP is configured - it does that
2) check if the interface is up - currently it doesn't do that
I don't expect IPaddr2 to run any connectivity tests, because the
service IPaddr2 offers is an IP address and that's it.
If I want connectivity tests, I will use the ping resource.
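Point 2 could look roughly like the sketch below. It is a hypothetical helper, not something IPaddr2 ships; it reads the kernel's link state from sysfs instead of parsing `ethtool` output (whose "Link detected: yes/no" line reports the same thing). The SYSFS_NET variable is an assumption added only so the check can be pointed at a fake directory for testing.

```shell
#!/bin/sh
# Hypothetical "is the interface up?" check based on sysfs.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

nic_link_up() {
    dir="$SYSFS_NET/$1"
    [ -d "$dir" ] || return 1            # no such interface
    # operstate is "up" for a usable link; some drivers only report "unknown",
    # so accept that too and let the carrier flag decide.
    state=$(cat "$dir/operstate" 2>/dev/null)
    case "$state" in
        up|unknown) ;;
        *) return 1 ;;
    esac
    # carrier is 1 while a link partner is detected (cable plugged in).
    # Reading it fails on an administratively down interface, hence 2>/dev/null.
    [ "$(cat "$dir/carrier" 2>/dev/null)" = "1" ]
}
```

Usage: `nic_link_up eth0 || echo "link down on eth0"`.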
>
> However, IMHO, something should be monitoring the local link status,
> as it is a very quick and cheap way to find out the health of your
> connection, rather than relying on pings all the time. Monitor the
> link status very often, and do pings every N times that you find
> that the link is up - the link status is probably a pretty good
> indicator that you have connectivity.

I agree with that.
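A minimal sketch of that scheme, assuming a single NIC and a single ping target: check the cheap carrier flag every round and only send a ping on every N-th round in which the link was up. NIC, PING_HOST, link_up and ping_ok are hypothetical names chosen for the illustration; the default target is simply the host from the ping resource below.

```shell
#!/bin/sh
# Sketch: cheap link check every round, ping only every N-th "link up" round.
NIC="${NIC:-eth0}"                      # assumption: interface to watch
PING_HOST="${PING_HOST:-192.168.78.4}"  # assumption: host to ping

# Cheap check: the kernel reports carrier=1 while a link partner is detected.
link_up() { [ "$(cat "/sys/class/net/$NIC/carrier" 2>/dev/null)" = "1" ]; }

# Expensive check: one ICMP echo with a 2 second timeout (iputils ping).
ping_ok() { ping -c 1 -W 2 "$PING_HOST" >/dev/null 2>&1; }

# check_rounds ROUNDS N: run ROUNDS rounds, printing one word per round.
check_rounds() {
    rounds="$1"; every="$2"; i=1; up=0
    while [ "$i" -le "$rounds" ]; do
        if ! link_up; then
            echo link_down                  # link gone: pinging is pointless
        else
            up=$((up + 1))
            if [ $((up % every)) -eq 0 ]; then
                if ping_ok; then echo ping_ok; else echo ping_failed; fi
            else
                echo link_ok                # link up, ping skipped this round
            fi
        fi
        i=$((i + 1))
    done
}
```

A real monitor would sleep between rounds; that is left out here so the logic stays visible.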

>
> [One thing I dislike about the "ocf:pacemaker:ping" resource is that
>  it just sets an attribute and never actually stops/starts if it has
>  failed to ping something - this means that when looking from crm_mon
>  you may see that an IP resource has been moved to another node, but
>  it is not obvious that it has moved because the link is down, ping
>  is still happily 'running' (yes, yes, there are other things which
>  can tell you what happened).
And finding in the logs that the failover was caused by the ping
resource is also tricky, because the return status for failure is 0 and
for success 1.

> I understand why things are like this
>  but it is a pity that the monitor does not make it just a little bit
>  more obvious what is going on ... my solution is to have an extra
>  resource that will stop/start depending on the value of the attribute
>  set by ping - is there a better way?]
>
I don't think so. What I use for now is the following:

primitive ping ocf:pacemaker:ping \
        params host_list="192.168.78.4" name="ping" \
        op monitor interval="10s" timeout="60s" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"

clone ping_clone ping \
        meta globally-unique="false" target-role="Started"

location pbx_service_01_on_connected_lan pbx_service_01 \
        rule $id="pbx_service_01_on_connected_lan-rule" -inf: not_defined ping or ping lte 0
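Spelled out, the location rule bans pbx_service_01 from any node where the ping attribute is missing or not positive. Pacemaker evaluates the rule itself; the function below is only a hypothetical illustration of that decision for one node, taking the node's attribute value as its argument (empty if undefined).

```shell
#!/bin/sh
# Illustration only: the logic of
#   -inf: not_defined ping or ping lte 0
# $1 = the node's "ping" attribute value, "" if the attribute is not defined.
ping_rule_score() {
    val="$1"
    if [ -z "$val" ]; then
        echo "-INFINITY"    # not_defined ping: the ping clone never reported here
    elif [ "$val" -le 0 ]; then
        echo "-INFINITY"    # ping lte 0: no host in host_list answered
    else
        echo "0"            # rule does not match: this constraint adds nothing
    fi
}
```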

In the host_list i

> Max
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
