Re: [Nut-upsuser] NUT with Cyber Power 700 AVR

Rob Donovan Mon, 30 Aug 2010 11:43:11 -0700

1) syslog errors every 20+ minutes or so like : Aug 7 10:21:03 ben usbhid-ups[3321]: libusb_get_string: error sending control message: Broken pipe
Not a cause of concern. It is a way of telling that the UPS is currently not able to handle a command. Most likely this is due to the UPS doing some internal housekeeping functions and the little microcontroller inside is not able to handle a command. We will probably suppress this in future NUT versions, as it is a common cause of false alarms.
2) syslog errors on a similar timescale like : Aug 7 08:17:40 ben kernel: [40170.402789] usb 2-1.2: usbfs: USBDEVFS_CONTROL failed cmd usbhid-ups rqt 161 rq 1 len 8 ret -110
Same here. The kernel is informing you that the UPS didn't respond to a command (110 = ETIMEDOUT). The cause is most likely the same as the above and not a cause of concern either. Unlike the above message, there is nothing we can do about this as it is logged by the kernel.

Good to know.  Thanks for the reply.

3) The machine spontaneously shut down this morning due to a "low battery" condition. However, 80 minutes later, when I noticed, the UPS battery was at 100%. I don't think it can charge that fast, so I think this must have been a communication error.
I'm not so sure about that. Don't overestimate the accuracy of the battery charge gauge on the UPS. It could be that it is just voltage based, which means that it will indicate full charge long before the battery is actually full. It could also mean that the battery is bad. This may cause nearly instant shutdowns when the mains fails (when the battery is under load) while it looks like the battery is (almost) full with the mains present (and the battery is not under load). Running a battery test usually reveals what is going on.
Best regards, Arjen

Fair points, but I think the battery is good. I've run several on-battery shutdowns lasting 90s+ by flipping the breaker (using upssched to initiate shutdown after 60s) and that works fine. I ran a longer test for 2 or 3 minutes once and watched the UPS's displayed estimated run time count down from 76 minutes as you'd expect it to. The UPS is brand new. Also, I suspect there was no power cut - in the past I've had to reset my stove clock after a power cut, and I don't recall having to do that this time. I have my system set up to shut down after 5 mins on battery, rather than wait for a lowbatt condition, so I doubt the low batt could have been reached due to a power cut.... unless perhaps it was a night of successive 4 minute power cuts or, given the stove, 4 minute low-voltage conditions. I guess we'll never know for sure... in any case, this was enough for me to abandon 2.4.3 and try Cyberpower's own offering, which suffered from curious delays itself which I wasn't happy with given that the eventual power-off is timer based rather than signal based as in nut, and thence back to nut 2.2.2...

So, I went back to nut 2.2.2 under Debian Lenny with both MAXAGE and DEADTIME set to 150s. This worked OK for 10 days, with the odd type (2) error from above, and the odd stale data error [aside : it is my understanding that data must now be stale for 150s for upsmon to log a stale data warning to syslog, since upsd doesn't pass on the stale data condition until MAXAGE is reached. So for 30 lots of 5s polls the data is stale... then it shows up in syslog, and, and this is what's weird, in almost every case it resolves itself 2s later.... just like it did when MAXAGE was 15s.] After 10 days it went into a stale data condition that continued all night.... until I stopped it by restarting nut in the morning.
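
As a quick check of the arithmetic in the aside above, here is a small shell sketch. The variable names just mirror the figures quoted in this post (MAXAGE of 150s, polls every 5s); POLL_INTERVAL and polls_until_stale are hypothetical names, and nothing is read from a live nut configuration:

```shell
# Sketch only: figures copied from the post, not read from upsd.conf.
MAXAGE=150          # seconds, as set above
POLL_INTERVAL=5     # seconds between polls, as quoted above (assumed constant)

# Number of consecutive polls that must return stale data
# before the stale condition is passed on, on this reading.
polls_until_stale=$((MAXAGE / POLL_INTERVAL))
echo "polls before stale data is reported: $polls_until_stale"
```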

Since restarting nut seemed to fix the problem I decided to make upssched restart nut on a NOCOMM condition. I'll briefly describe how I did that here in case others are interested:


I set

NOTIFYFLAG NOCOMM       SYSLOG+WALL+EXEC

in upsmon.conf in the usual way with

NOTIFYCMD /sbin/upssched

set to call upssched.  In upssched.conf I set

CMDSCRIPT /sbin/upsschedcmd

and

AT NOCOMM   * EXECUTE restart

/sbin/upsschedcmd is my command script, the relevant portion of which is :

#!/bin/bash

# This script is called by upssched on a UPS event.
# This script is designed to be run by user nut.


case "$1" in
 restart)
   /sbin/upsrestart.x
   ;;
esac
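
For anyone adapting this, the same dispatch can be sketched with a dry-run switch and a catch-all branch, so that an unexpected argument from upssched gets reported instead of being silently ignored. The function name handle_event and the DRY_RUN variable are hypothetical, added purely for illustration; only the restart argument and /sbin/upsrestart.x come from the setup described here:

```shell
# Sketch only: handle_event and DRY_RUN are hypothetical names.
DRY_RUN=${DRY_RUN:-1}   # default to a dry run so the sketch is safe to try

handle_event() {
  case "$1" in
    restart)
      if [ "$DRY_RUN" = "1" ]; then
        # Report what would happen without touching nut.
        echo "would run /sbin/upsrestart.x"
      else
        /sbin/upsrestart.x
      fi
      ;;
    *)
      # Anything other than "restart" is reported on stderr.
      echo "unhandled upssched event: $1" >&2
      return 1
      ;;
  esac
}
```

In a real command script the last line would be something like handle_event "$1", with DRY_RUN=0 set once the dry run prints what you expect.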

upsrestart.x is the following C code, compiled using the gcc line in its header comment, and chowned/chmoded to have the ownership/permissions given in that same comment :



#include <stdio.h>
#include <unistd.h>

/*

This program is designed to restart nut.
The binary file permissions should be -rwsr-xr-- root:nut

gcc -g -Wall -o upsrestart.x upsrestart.c

*/

int main (int argc, char *argv[])
{

 char *arg[] = { "/etc/init.d/nut", "restart", (char *) NULL };

 char *env[] = { "USER=root", "PATH=/usr/sbin:/usr/bin:/sbin:/bin", "HOME=/root", (char *) NULL };


 execve (arg[0], arg, env);

 // if execve() returns there has been an error; perror() appends the reason

 perror("upsrestart.c : error calling execve()");

 return(1);

}

What happens is that upssched runs /sbin/upsschedcmd as user nut, which runs the setuid program upsrestart.x as nut which runs /etc/init.d/nut restart as effective user root, restarting nut and, it appears so far, reestablishing connection with the UPS. Since this runs on NOCOMM, default timeout 300s, that becomes the max time your system can't talk to your UPS. Since I have DEADTIME set to 150s, a stale UPS that was last known to be on battery will shut down before the NOCOMM restart takes effect.

The binary wrapper is necessary because Linux ignores setuid bits applied to scripts. Furthermore, modern versions of bash drop setuid privileges on startup, unless called with -p. The /etc/init.d/nut script uses /bin/sh. The above works on Debian because, according to the "system" man page (of all places :) : "Debian uses a modified bash which does not do this when invoked as sh". On other flavours of Linux you may need to tweak the first line of /etc/init.d/nut to prevent it dropping privileges.

I think the above is safe because the binary can only restart nut, nothing else, and can only be run by root or nut. I'm not exactly a security expert though, so I might be wrong.
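
The -rwsr-xr-- root:nut permissions from the C comment correspond to octal mode 4754: setuid plus rwx for the owner, r-x for group nut, and read-only for everyone else, which is why only root and members of group nut can execute the binary. A minimal sketch that checks this on a throwaway file follows; it assumes GNU stat for the -c option, and the real chown/chmod commands are shown only as comments because they need root:

```shell
# On the real system, as root, the equivalent commands would be:
#   chown root:nut /sbin/upsrestart.x
#   chmod 4754 /sbin/upsrestart.x

# Sketch: apply the same mode to a temporary file and display it.
demo=$(mktemp)
chmod 4754 "$demo"
mode=$(stat -c '%A' "$demo")   # assumes GNU coreutils stat
echo "$mode"
rm -f "$demo"
```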

Anyway, I set up the above 10 days ago, and this morning it triggered. I have it configured to send me an email too. It sent one email, and restarted nut successfully. Comms were reestablished. The only thing that didn't go entirely according to plan is that the old upsmon stuck around as a defunct nut process and a running root process. I don't know why they didn't die, but they were easily killed off later manually. It was definitely better to get one email and comms reestablished after 5 minutes than 70 emails and no communications all night.


best
/rob


_______________________________________________
Nut-upsuser mailing list
Nut-upsuser@lists.alioth.debian.org
http://lists.alioth.debian.org/mailman/listinfo/nut-upsuser
