Thanks Marian,
that did the trick for my setup. The only problem is an error in your
script, where:
if [ "$iface" == "yes" ];
should have been
if [ "$link_status" == "yes" ];
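
For anyone reading along, here is a minimal sketch of the corrected check,
pulled out of the init script so it can be tried without touching any
interfaces. The names parse_link_status, check_links and the ETHTOOL
override are my own assumptions for illustration, they are not part of
Marian's script:

```shell
# Sketch only: parse_link_status, check_links and ETHTOOL are hypothetical
# names used for illustration; they do not appear in the original script.

# Read `ethtool <iface>` output on stdin and print the value of the
# "Link detected:" line (normally "yes" or "no").
parse_link_status() {
    awk '/Link detected:/ {print $3}'
}

# Return 0 only if every interface given as an argument reports link "yes".
# ETHTOOL can be set to another command to test without real hardware.
check_links() {
    local iface link_status
    for iface in "$@"; do
        link_status=$("${ETHTOOL:-ethtool}" "$iface" 2>/dev/null | parse_link_status)
        # Compare the parsed link status, not the interface name.
        if [ "$link_status" != "yes" ]; then
            echo "link down on $iface"
            return 1
        fi
    done
    echo "all links up"
    return 0
}
```

Calling "check_links eth0 eth1" returns non-zero as soon as one of the
listed interfaces has no link, which matches the behaviour I am seeing.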
Now my monitored service fails as soon as a cable is unplugged, which is
exactly what I want, but HA just tries to restart it on the same node
instead of failing it over to the other node. How can I make sure that a
service is restarted up to e.g. 3 times and then failed over if the restarts
are not successful? Do I have to set up stickiness or a constraint, and if
so, how?
Thanks in advance.
Kasper Andersen
Marian Marinov-2 wrote:
>
> On Friday 21 November 2008 03:12:56 Marian Marinov wrote:
>> On Wednesday 19 November 2008 16:49:55 KAD_USER wrote:
>> > Hi,
>> >
>> > Running SLES10.2 we have the following setup:
>> >
>> > Node1:
>> > - eth0 and eth1 are bonded into bond0, eth3 is used for heartbeat.
>> > - Linux-HA has been set up with one resource, which uses bond0 to provide
>> > an additional virtual IP address.
>> >
>> > Node2:
>> > - eth0 and eth1 are bonded into bond0, eth3 is used for heartbeat.
>> > - Linux-HA has been set up and all resources are inherited from Node1.
>> >
>> > Issue:
>> > When someone pulls the ethernet cable on eth0 the node continues to work,
>> > but when someone pulls both eth0 and eth1 and no data can leave or enter
>> > the system, one would expect the resource which uses bond0 to fail and
>> > trigger a failover of resources, as a default installation of a
>> > Windows Cluster would!
>> >
>> > Is there a way to monitor resources like link status on ethernet cards
>> > and then perform a failover once it is down?
>>
>> Hi,
>>
>> I don't know if there is a script or resource agent ready for that, but
>> here are my 2 bits of code that are a simple LSB script that can help you
>> monitor this resource.
>>
>> I hope you will like the script. You should be able to configure this
>> script as a standard LSB resource in cib.xml.
>>
>> #!/bin/bash
>> #
>> # link-state
>> #
>> # chkconfig: - 26 74
>> # description: Network Interfaces link state monitoring script
>>
>> ### BEGIN INIT INFO
>> # Provides: link-state
>> # Required-Start:
>> # Required-Stop:
>> # Default-Start:
>> # Default-Stop:
>> # Short-Description: provides monitoring and reconfiguration
>> # Description: provides monitoring and reconfiguration
>> #   for various network interfaces with easy
>> #   configuration, portable on different
>> #   distributions
>> ### END INIT INFO
>>
>> # set secure PATH
>> PATH="/bin:/usr/bin:/sbin:/usr/sbin"
>>
>>
>> interfaces='eth0 eth1'
>> ip[0]='10.0.0.1'
>> ip[1]='10.0.0.5'
>> netmask[0]='255.255.255.252'
>> netmask[1]='255.255.255.252'
>>
>> function ifstatus() {
>> for iface in $interfaces; do
>> link_status=$(ethtool $iface|awk '/Link/{print $3}')
>> echo -n "Checking link status on interface $iface "
>> # Bug: this should test "$link_status", not "$iface" (see the top of this message)
>> if [ "$iface" == "yes" ]; then
>> echo -ne "\\033[60G[\\033[0;32m OK \\033[0;39m]\r\n"
>> else
>> echo -ne "\\033[60G[\\033[1;38mFAILED\\033[0;39m]\r\n"
>> exit 1
>> fi
>> done
>> }
>> function ifstop() {
>> for iface in $interfaces; do
>> echo -n "Stopping interface $iface "
>> ifconfig $iface down
>> if [ "$?" == 0 ]; then
>> echo -ne "\\033[60G[\\033[0;32m OK \\033[0;39m]\r\n"
>> else
>> echo -ne "\\033[60G[\\033[1;38mFAILED\\033[0;39m]\r\n"
>> exit 1
>> fi
>> done
>> }
>> function ifstart() {
>> count=0
>> for iface in $interfaces; do
>> echo -n "Starting interface $iface "
>> ifconfig $iface ${ip[$count]} netmask ${netmask[$count]}
>> if [ "$?" == 0 ]; then
>> echo -ne "\\033[60G[\\033[0;32m OK \\033[0;39m]\r\n"
>> else
>> echo -ne "\\033[60G[\\033[1;38mFAILED\\033[0;39m]\r\n"
>> exit 1
>> fi
>> done
>> }
>>
>> case "$1" in
>> start)
>> ifstart
>> ;;
>> stop)
>> ifstop
>> ;;
>> restart)
>> ifstop
>> ifstart
>> ;;
>> status)
>> ifstatus
>> ;;
>> *)
>> echo "Usage: $0 start|stop|restart|status"
>> exit 1
>> esac
>> exit 0
>
> Oops, I forgot some things :)
> Here is a patch for my script:
>
> --- link-state.old 2008-11-21 03:26:53.000000000 +0200
> +++ link-state 2008-11-21 03:26:56.000000000 +0200
> @@ -23,21 +23,21 @@ PATH="/bin:/usr/bin:/sbin:/usr/sbin"
>
>
> interfaces='eth0 eth1'
> -ip[0]='10.0.0.1'
> -ip[1]='10.0.0.5'
> -netmask[0]='255.255.255.252'
> -netmask[1]='255.255.255.252'
> +ip=(10.0.0.1 10.0.0.5)
> +netmask=(255.255.255.252 255.255.255.252)
>
> function ifstatus() {
> + count=0
> for iface in $interfaces; do
> link_status=$(ethtool $iface|awk '/Link/{print $3}')
> - echo -n "Checking link status on interface $iface "
> + echo -n "Checking link status on interface $iface(${ip[$count]}) "
> if [ "$iface" == "yes" ]; then
> echo -ne "\\033[60G[\\033[0;32m OK \\033[0;39m]\r\n"
> else
> echo -ne "\\033[60G[\\033[1;38mFAILED\\033[0;39m]\r\n"
> exit 1
> fi
> + let count++
> done
> }
> function ifstop() {
> @@ -63,6 +63,7 @@ function ifstart() {
> echo -ne "\\033[60G[\\033[1;38mFAILED\\033[0;39m]\r\n"
> exit 1
> fi
> + let count++
> done
> }
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
>
--
View this message in context:
http://www.nabble.com/Network-interface-monitoring-and-failover-once-failed-tp20581371p20677754.html
Sent from the Linux-HA mailing list archive at Nabble.com.