Hi, recently I observed that during a power outage, my NUT setup doesn't shutdown properly. Indeed it never reaches the LowBattery state to notify and initiate shutdown on clients. Reviewing my setup and logs I think I'm facing multiple problems and I'm unable to point to real root cause.
First, let me focus on my scenario and current setup: In normal status (ONLINE, fully charged) my UPS runtime is about 20minutes, this is some of the relevant info: Charge 100% Charge Low 20% Runtime 20m37s Type PbAc Manufacturer EATON Model Ellipse ECO 1200 Serial 000000000 Power 25VA Frequency Nominal 50Hz Voltage 230.0V Voltage Nominal 230V Delay Shutdown 25s Delay Start 40s Firmware 02 Load 23% Power Nominal 1200VA Status OL During a power outage it used to last for about 15 minutes and then initiate the shutdown. however reviewing the logs of my systems: Apr 19 12:04:23 - OnBattery Apr 19 12:10:21 - Power from UPS cut, all systems got suddenly powered off (no graceful shutdown) Apr 19 12:16:46 - Power restored, UPS feeds power again to the systems and they boot up. So this indicated the UPS only lasted about 6 minutes. Also I have 2 UPS: - UPSdevice, which is the real UPS - HeartBeat : virtual dummy device to monitor proper NUT communication (see http://rogerprice.org/NUT/NUT.html#HEARTBEAT). Looking at the logs when the event starts: Apr 19 12:04:23 laney upsmon[756]: UPS [email protected]: On battery. Apr 19 12:04:23 laney upssched[797]: New timer: onbattwarn (30 seconds) Apr 19 12:04:53 laney upssched[797]: Event: onbattwarn Apr 19 12:08:24 laney upsmon[756]: UPS [email protected]: On battery. Apr 19 12:08:24 laney upssched[797]: Cancelling timer: heartbeat-failure UPSdevice uses timer "onbattwarn", while HeartBeat is using a separated timer "heartbeat-failure". So I supose that when heartbeat cancels the 'heartbeat-failure' the onbattwarn timer is still running. The second concern is quickly spotted: Apr 19 12:09:19 laney usbhid-ups[703]: libusb_get_interrupt: error submitting URB: No such device Apr 19 12:09:24 laney upsd[753]: Data for UPS [UPSdevice] is stale - check driver Apr 19 12:09:26 laney upsd[753]: UPS [UPSdevice] data is no longer stale Seems that during the 'onbattwarn' event, NUT lost communcaiton with the UPS for about 6 secs? this is really quick to trigger any 'COMM_BAD' evebt so no notifcation regarding this is sent, just the logs. However as stated above on the logs, just 1 min later (12:10:21) power is totally lost even from UPS, no trace of LowBattery events, comm errors or so. A part from a email message that upssched sent me (although no logs of the event appears), this email reports: Power ups UPSdevice battery-low notification UPS: [email protected] Notice type: LOWBATT Message: battery-low Sun, 19 Apr 2020 12:10:27 +0200 Charge 73% Charge Low 20% Runtime 15m52s Type PbAc Manufacturer EATON Model Ellipse ECO 1200 Serial 000000000 Type Ups Power 25VA Frequency Nominal 50Hz Voltage 230.0V Voltage Nominal 230V Delay Shutdown 20s Delay Start 30s Firmware 02 Load 21% Power Nominal 1200VA Status FSD ALARM OB LB So, indeed was in LowBattery and FSD shutdown, however note that the current charge was 73% >> 20% low battery, so why it set LB when there were enough battery to run? Is this a comm problem, device issue, driver ? So mainly my concerns questions are: A) Can upsched introduce some race-conditions making a timer cancel cancel other timers not related ? B) May my UPS be faulting or just some USB driver issue? Any other hint, information or idea is welcomed. Regards
_______________________________________________ Nut-upsuser mailing list [email protected] https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser
