Hello Coach-X (what a strange name),
> This has happened several times. Nothing shows up in either log file,
> and a hard reboot brings the master back online. Is this caused by
> the serial link still being active? Is there a way to have this type
> of issue cause the slave to become active?
Exactly. I personally use 3ware RAID controllers configured as RAID-1
(mirror). I monitor these controllers with Nagios and swap a disk
within 2 days if one dies. But you could also use Linux software RAID
with _sata_, not _pata_, disks to obtain the same. Another way to detect
disk failures would be a resource agent that does something like
invalidating the buffer cache and then running a find or ls on the
filesystem. Put that resource agent into the group that contains exim.
The monitor action would be something like this:
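    # "running" is tracked by a state file (assumed to be created on start)
    if [ -f /var/run/ressource-agent ]; then
        # flush dirty pages, then drop page cache, dentries and inodes
        sync; echo 3 > /proc/sys/vm/drop_caches
        # force a real read from disk: 0 = success, 1 = generic error
        ls / > /dev/null 2>&1 && exit 0 || exit 1
    else
        exit 7  # OCF "not running"
    fi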
See also http://linux-mm.org/Drop_Caches
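
A slightly fuller sketch of such a resource agent, with start, stop and
monitor actions, might look like the following. It is only an
illustration: the function names and the STATE_FILE, DROP_CACHES and
CHECK_DIR variables are my own assumptions, and a real OCF agent would
also need meta-data and validate-all actions, which are left out here.

```shell
#!/bin/sh
# Hypothetical disk-check resource agent (illustrative sketch only).
# All three variables are assumptions and can be overridden from the
# environment, e.g. to try the script without touching the real system.
STATE_FILE="${STATE_FILE:-/var/run/ressource-agent}"
DROP_CACHES="${DROP_CACHES:-/proc/sys/vm/drop_caches}"
CHECK_DIR="${CHECK_DIR:-/}"

# start/stop only create or remove the state file.
disk_start() { touch "$STATE_FILE"; }
disk_stop()  { rm -f "$STATE_FILE"; }

disk_monitor() {
    # 7 = OCF_NOT_RUNNING: the resource was never started on this node
    [ -f "$STATE_FILE" ] || return 7
    # Flush dirty pages, then drop page cache, dentries and inodes so
    # the following ls has to hit the disk. Writing needs root; a
    # failed write is ignored here, as in the original snippet.
    sync
    { echo 3 > "$DROP_CACHES"; } 2>/dev/null || true
    # 1 = OCF_ERR_GENERIC if the directory can no longer be read
    ls "$CHECK_DIR" > /dev/null 2>&1 || return 1
    return 0
}

main() {
    case "$1" in
        start)   disk_start ;;
        stop)    disk_stop ;;
        monitor) disk_monitor ;;
        *)       return 3 ;;  # 3 = OCF_ERR_UNIMPLEMENTED
    esac
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    main "$@"
    exit $?
fi
```

Saved under some name of your choice, it can be tried by hand with
"start" followed by "monitor" and a look at $?. Note that dropping the
caches on every monitor interval costs performance, so a long interval
is probably the safer choice.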
I assume you use Linux; if you don't, find a reasonably well supported
RAID controller for your hardware architecture / OS.
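
If you go the software RAID route, a degraded mirror shows up in
/proc/mdstat as an underscore in the status brackets, e.g. [U_] instead
of [UU]. A minimal sketch of a Nagios-style check for that could look
like this. The check_md function and the MDSTAT variable are
hypothetical names of mine, not part of any existing plugin.

```shell
#!/bin/sh
# Hypothetical Nagios-style check for degraded Linux md arrays.
# MDSTAT is an assumed, overridable variable (handy for testing).
MDSTAT="${MDSTAT:-/proc/mdstat}"

check_md() {
    if [ ! -r "$MDSTAT" ]; then
        echo "UNKNOWN: cannot read $MDSTAT"
        return 3    # Nagios UNKNOWN
    fi
    # The member status looks like [UU]; an underscore marks a missing
    # or failed member, e.g. [U_] or [_U].
    if grep -Eq '\[[U_]*_[U_]*\]' "$MDSTAT"; then
        echo "CRITICAL: degraded md array found"
        return 2    # Nagios CRITICAL
    fi
    echo "OK: no degraded md array found"
    return 0        # Nagios OK
}
```

To use it as a plugin, append "check_md; exit $?" at the end of the
script. It also reports OK when no md arrays exist at all, so it only
makes sense on hosts that really use software RAID.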
Thomas
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems