On Dec 4, 2013, at 3:35 PM, Roger Price wrote:
> I would like nut to become more loquacious, and to log a much more complete
> report of its activity. At present nut reports that its components have
> started operation but does not automatically log their activity when UPS's
> switch between OB and OL. I believe that this under-reporting of important
> facts is too minimalist - it would be better for system administrators and
> for the nut support team if a much more complete report were available of all
> OB/OL activity by each component.
In principle, more logging sounds like a good idea. What syslog level
adjustments would you propose?
> Looking at the source code, it seems that much of what is needed is already
> in place, but behind "if" conditions that ensure that little or nothing gets
> through. Long ago I wrote software, including a compiler, but my C
> programming is limited to a class exercise many many years ago, and it's based
> on this "experience" that I'm guessing that in upssched.c function exec_cmd
> the code
>
>     snprintf(buf, sizeof(buf), "%s %s", cmdscript, cmd);
>     err = system(buf);
>     if (WIFEXITED(err)) {
>         if (WEXITSTATUS(err)) {
>             upslogx(LOG_INFO, "exec_cmd(%s) returned %d", buf,
>                 WEXITSTATUS(err));
>         }
>
> attempts to send a command to the operating system, possibly to execute a
> Bash script. If system(buf) fails, the tests block the error message. Surely
> the error message is essential. An unattended box is now in an emergency
> situation. After the inevitable IT failure the system should be auditable to
> discover what went wrong and what should be done to prevent it happening in
> the future. Such an audit expects to find "exec_cmd(%s) returned %d" in the
> log.
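As quoted, the snippet already logs any non-zero exit status; only a
successful (zero) exit is silent. Are you looking for: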
* more diagnostics depending on the value of err,
* logging of all return codes, even success
or both?
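
For the sake of discussion, here is a minimal sketch of the "both" case:
decode every outcome of system() and log it, success included. It uses plain
syslog() instead of NUT's upslogx() so that it stands alone. run_and_log() is
a hypothetical name, not an existing NUT function, and the priorities chosen
are assumptions.

```c
#include <stdlib.h>
#include <sys/wait.h>
#include <syslog.h>

/* Hypothetical helper (not part of NUT): run a command with system()
 * and log every outcome, including success. Returns the exit status,
 * or -1 if the command did not exit normally. */
static int run_and_log(const char *command)
{
    int status = system(command);

    if (status == -1) {
        /* system() could not create the child process */
        syslog(LOG_ERR, "exec_cmd(%s) could not be started", command);
        return -1;
    }

    if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);

        /* Assumed policy: INFO on success, WARNING on non-zero exit */
        syslog(code ? LOG_WARNING : LOG_INFO,
            "exec_cmd(%s) returned %d", command, code);
        return code;
    }

    if (WIFSIGNALED(status)) {
        syslog(LOG_WARNING, "exec_cmd(%s) terminated by signal %d",
            command, WTERMSIG(status));
        return -1;
    }

    syslog(LOG_ERR, "exec_cmd(%s) ended abnormally", command);
    return -1;
}
```

With this sketch, run_and_log("exit 3") would log "exec_cmd(exit 3) returned 3"
at LOG_WARNING and return 3.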
> "But these problems should be found by testing!", one might argue. Firstly,
> the testing would be facilitated by this error message, and secondly, no
> amount of testing will ever cover every situation met in the real world.
>
> I believe nut would be improved by
>
> 1. Logging a summary of the state of the nut system and the UPS's every 24
> hours.
I would personally prefer that NUT didn't do this by default. (Then again, I
don't do a lot of sysadmin work for critical systems, so take that with a grain
of salt.) To me, this seems like a job for a nightly cron job that calls
'upsc'. If you have multiple UPSes, you can iterate over them. We could add
an example script to the NUT source tree for that.
> 2. Automatically logging a record of driver, upsd, upsmon and upssched
> activity for each OB/OL change.
Fair point. I don't think logging at every single point is necessary, but if
it's configurable, that would work.
> 3. Replacing the upsmon NOTIFYFLAG "SYSLOG" by "NOSYSLOG". All notifications
> are logged unless the sysadmin explicitly calls for no logging.
I suspect I am missing something here. The default upsmon.conf logs everything
to syslog (and wall). Unless that part is broken (and I confess I haven't
thoroughly tested it recently), wouldn't the defaults work without breaking
existing installations? I agree that it is better to err on the side of
logging more information, but I don't think we need to break the existing
syntax to do that.
If anything, I would want finer-grained control over the syslog level for some
of these events.
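
As a rough illustration of what finer-grained control might look like, the
sketch below maps a few upsmon notify types to syslog priorities. The table,
the function names, and the priorities are hypothetical and only meant to show
the idea; in practice the mapping would presumably come from the configuration
file rather than being compiled in.

```c
#include <stddef.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical table: maps upsmon notify types to syslog priorities.
 * The priorities chosen here are illustrative assumptions only. */
struct event_level {
    const char *event;
    int priority;
};

static const struct event_level event_levels[] = {
    { "ONLINE",  LOG_NOTICE  },
    { "ONBATT",  LOG_WARNING },
    { "LOWBATT", LOG_CRIT    },
    { "COMMBAD", LOG_ERR     },
};

/* Look up the priority for an event; unknown events fall back to LOG_INFO. */
static int priority_for_event(const char *event)
{
    size_t i;

    for (i = 0; i < sizeof(event_levels) / sizeof(event_levels[0]); i++) {
        if (strcmp(event_levels[i].event, event) == 0)
            return event_levels[i].priority;
    }
    return LOG_INFO;
}

/* Log one event for one UPS at the priority configured for that event. */
static void log_event(const char *event, const char *ups)
{
    syslog(priority_for_event(event), "UPS %s: %s", ups, event);
}
```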
--
Charles Lepple
clepple@gmail