Hello List,
Any suggestions to solve the following would be most appreciated.
Setup: Active/passive two-node cluster. Two UPSes (APC Smart-UPS 1500 C) with
USB communication cables cross-connected (i.e. UPS-webserver1 is monitored by
webserver2, and vice versa) to allow for stonith/fencing.
OS: openSUSE Leap 42.2
NUT version: 2.7.1-2.41-x86_64
Fencing agent: external/nut
Problem: When power fails to a single UPS, both nodes are shut down. The node
with the still-powered UPS comes back up, but requires manual intervention to
keep it providing services. I would like only the node with the "On Battery"
UPS to shut down.
The problem with restoring services seems to be that NUT on the node that comes
back up will not restart until the other node restarts.
Stonith and my upssched-cmd script both use
upscmd -u ups-webserver2-master -p mypassword ups-webserver2@webserver1 shutdown.reboot
or
upscmd -u ups-webserver1-master -p mypassword ups-webserver1@webserver2 shutdown.reboot
as appropriate. When the cluster software (Pacemaker/Corosync) uses one of the
above commands as part of a fencing operation, only the target node is shut down
and its UPS's outlets are power-cycled. When NUT issues one of the above commands
via my upssched-cmd script, both nodes shut down and both of their UPSes'
outlets are power-cycled.
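To narrow down which UPS is actually on battery when the timer fires, the script could check the status of a UPS before issuing the command. The following is only a minimal sketch, assuming that `upsc <ups>@<host> ups.status` returns the usual space-separated flags (OL, OB, LB and so on). The helper name `is_on_battery` is hypothetical and not part of NUT.

```shell
#!/bin/sh
# Sketch: decide from a NUT ups.status string whether a UPS is on battery.
# ups.status is a space-separated list of flags such as "OL", "OB", "LB".
# is_on_battery is a hypothetical helper name, not a NUT command.

# Return 0 if the status string contains the OB (on battery) flag, 1 otherwise.
is_on_battery() {
    # Unquoted on purpose: split the status string into individual flags.
    for flag in $1; do
        if [ "$flag" = "OB" ]; then
            return 0
        fi
    done
    return 1
}

# Example use inside upssched-cmd (needs a reachable upsd, so not run here):
#   status=$(upsc ups-webserver2@localhost ups.status)
#   if is_on_battery "$status"; then
#       logger -t upssched-cmd "ups-webserver2 status: $status"
#   fi
```

Logging the status of both UPSes at that point would show whether the healthy UPS is also being told to power-cycle, or whether its node is shutting down for another reason.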
This problem should be very rare, but I would rather cover it than not.
Power failure and resupply to both UPSes (the most common problem for me) works
well. I use upssched to set the same timers after power failure on each system.
They receive simultaneous shutdown commands, which they obey. When power returns
they both come back up.
Stonith/fencing via the external/nut stonith resource agent works.
Thanks,
Tim.
My config files
ups.conf
On webserver1
[ups-webserver2]
driver = usbhid-ups
port = auto
desc = "APC Smart-UPS C 1000/1500va"
vendorid = 051d
On webserver2
[ups-webserver1]
driver = usbhid-ups
port = auto
desc = "APC Smart-UPS C 1000/1500va"
vendorid = 051d
nut.conf
MODE=netserver
upsd.conf
Webserver1
LISTEN 127.0.0.1 3493
LISTEN ::1 3493
LISTEN 192.168.1.21 3493
Webserver2
LISTEN 127.0.0.1 3493
LISTEN ::1 3493
LISTEN 192.168.1.22 3493
upsd.users
defines users (special settings required for stonith to work)
On webserver1
[ups-webserver2-slave]
password = mypassword
actions = SET
instcmds = ALL
upsmon slave
[ups-webserver2-master]
password = mypassword
actions = SET
actions = FSD
instcmds = ALL
upsmon master
On webserver2
[ups-webserver1-slave]
password = mypassword
actions = SET
instcmds = ALL
upsmon slave
[ups-webserver1-master]
password = mypassword
actions = SET
actions = FSD
instcmds = ALL
upsmon master
upsmon.conf
Webserver1
MONITOR ups-webserver1@webserver2 1 ups-webserver1-master mypassword master
MONITOR ups-webserver2@localhost 0 ups-webserver2-slave mypassword slave
Webserver2
MONITOR ups-webserver2@webserver1 1 ups-webserver2-master mypassword master
MONITOR ups-webserver1@localhost 0 ups-webserver1-slave mypassword slave
upssched needs the following
upsmon.conf
NOTIFYCMD /usr/sbin/upssched
NOTIFYFLAG ONLINE SYSLOG+WALL+EXEC
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
Configure 'upssched' by editing upssched.conf
upssched.conf
webserver1
CMDSCRIPT /bin/upssched-cmd
PIPEFN /var/lib/ups/upssched/upssched.pipe
LOCKFN /var/lib/ups/upssched/upssched.lock
AT ONBATT ups-webserver2@localhost START-TIMER onbatt-ups-webserver2 600
AT ONLINE ups-webserver2@localhost CANCEL-TIMER onbatt-ups-webserver2
webserver2
CMDSCRIPT /bin/upssched-cmd
PIPEFN /var/lib/ups/upssched/upssched.pipe
LOCKFN /var/lib/ups/upssched/upssched.lock
AT ONBATT ups-webserver1@localhost START-TIMER onbatt-ups-webserver1 600
AT ONLINE ups-webserver1@localhost CANCEL-TIMER onbatt-ups-webserver1
Edit /bin/upssched-cmd
/bin/upssched-cmd
webserver1
#!/bin/sh
# Called by upssched with the name of the expired timer as $1.
case "$1" in
    onbatt-ups-webserver1)
        logger -t upssched-cmd "UPS-Webserver1 has gone on battery."
        ;;
    onbatt-ups-webserver2)
        logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
        # Power-cycle the outlets of ups-webserver2 (which feeds webserver2).
        /usr/bin/upscmd -u ups-webserver2-master -p mypassword \
            ups-webserver2@webserver1 shutdown.reboot
        ;;
    *)
        logger -t upssched-cmd "Unrecognized command: $1"
        ;;
esac
Webserver2
#!/bin/sh
# Called by upssched with the name of the expired timer as $1.
case "$1" in
    onbatt-ups-webserver1)
        logger -t upssched-cmd "UPS-Webserver1 has gone on battery."
        # Power-cycle the outlets of ups-webserver1 (which feeds webserver1).
        /usr/bin/upscmd -u ups-webserver1-master -p mypassword \
            ups-webserver1@webserver2 shutdown.reboot
        ;;
    onbatt-ups-webserver2)
        logger -t upssched-cmd "UPS-Webserver2 has gone on battery."
        ;;
    *)
        logger -t upssched-cmd "Unrecognized command: $1"
        ;;
esac
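Since the two scripts differ only in which UPS they act on, the target could be derived from the timer name so that one script fits both nodes. This is a rough sketch rather than what I am running. The function names `target_for_timer` and `user_for_target` are hypothetical, and the mapping simply restates the cross-connected cabling described above.

```shell
#!/bin/sh
# Sketch: derive the upscmd target and user from the upssched timer name.
# Function names are hypothetical. The mapping follows the cross-connection:
#   ups-webserver1 is attached to webserver2, ups-webserver2 to webserver1.

# Print "<ups>@<host>" for a known timer name; return 1 for anything else.
target_for_timer() {
    case "$1" in
        onbatt-ups-webserver1) echo "ups-webserver1@webserver2" ;;
        onbatt-ups-webserver2) echo "ups-webserver2@webserver1" ;;
        *) return 1 ;;
    esac
}

# Print the upsd.users master account for a target,
# e.g. "ups-webserver1@webserver2" -> "ups-webserver1-master".
user_for_target() {
    echo "${1%%@*}-master"
}

# Example use inside upssched-cmd (not run here):
#   if target=$(target_for_timer "$1"); then
#       /usr/bin/upscmd -u "$(user_for_target "$target")" -p mypassword \
#           "$target" shutdown.reboot
#   else
#       logger -t upssched-cmd "Unrecognized command: $1"
#   fi
```

Each node's upssched.conf only starts the timer for the UPS it monitors, so a shared script like this should behave the same as the two separate ones.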