Re: [Linux-HA] heartbeat failover not working on hard drive error

Thomas Glanzmann Fri, 28 Mar 2008 00:58:57 -0700

Hello Coach-X (what a strange name),

> This has happened several times.  Nothing shows up in either log file,
> and a hard reboot brings the master back online.  Is this caused by
> the serial link still being active?  Is there a way to have this type
> of issue cause the slave to become active?


Exactly. I personally use 3ware RAID controllers with a RAID-1 (mirror)
configured. I monitor these controllers with Nagios and swap disks
within 2 days if one dies. But you could also use Linux software RAID
with _SATA_, not _PATA_, disks to get the same result. Another way to
detect disk failures would be a resource agent that does something like
invalidating the buffer cache and then running a find or ls on the
filesystem. Put that resource agent into the group that contains exim.

The monitor action would be something like this:

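if [ -f /var/run/ressource-agent ]; then
        # Resource is marked as running: flush dirty pages and drop the
        # caches so that the following ls has to read from the disk.
        sync; echo 3 > /proc/sys/vm/drop_caches
        ls / > /dev/null 2>&1 && exit 0 || exit 1
else
        # Resource is not running.
        exit 7
fi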

See also http://linux-mm.org/Drop_Caches
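
A slightly fuller sketch of that monitor action as a shell function, for
illustration only. The function name disk_monitor and the variables
STATE_FILE, CHECK_DIR and DROP_CACHES are made up for this example and are
not part of heartbeat. The return codes follow the snippet above: 0 for
success, 1 for a generic error, 7 for "not running". Writing to
drop_caches needs root, so the sketch skips that step if the file is not
writable.

```shell
#!/bin/sh
# Hypothetical monitor helper; all names and paths are assumptions.
STATE_FILE="${STATE_FILE:-/var/run/ressource-agent}"
CHECK_DIR="${CHECK_DIR:-/}"
DROP_CACHES="${DROP_CACHES:-/proc/sys/vm/drop_caches}"

disk_monitor() {
    # No state file: the resource was never started, report "not running".
    [ -f "$STATE_FILE" ] || return 7

    # Write out dirty pages, then drop the caches so that the directory
    # listing below cannot be answered from memory alone.
    sync
    if [ -w "$DROP_CACHES" ]; then
        echo 3 > "$DROP_CACHES"
    fi

    # A failed listing is treated as a disk or filesystem error.
    if ls "$CHECK_DIR" > /dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

The monitor case of a resource agent could then call "disk_monitor" and
exit with its return code. CHECK_DIR should point at the filesystem that
exim actually uses.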

I assume you use Linux; if you don't, find a reasonably well-supported
RAID controller for your hardware architecture / OS.

        Thomas
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
