Re: [Nut-upsdev] some fixes, improvements, and new features (EPO and DYING) for NUT

Greg A. Woods Fri, 09 Mar 2012 14:56:08 -0800

Note that I'm not subscribed to the nut-upsdev list at the moment due to
issues with the mailer and DNS for the list server.

So, please CC me on any replies.  Thanks!

At Thu, 8 Mar 2012 23:01:39 -0500, Charles Lepple <[email protected]> wrote:
Subject: Re: [Nut-upsdev] some fixes, improvements, and new features (EPO and 
DYING) for NUT
> 
> On Mar 8, 2012, at 6:21 PM, Greg A. Woods wrote:
> 
> > Here are a series of my recent changes to NUT.
> > 
> > The first few in the set are primarily little fixes and improvements.
> > 
> > In among those are a few for .gitignore files which of course you can
> > ignore for SVN, and there's one for a commit to a generated file which
> > of course should not be tracked in any VCS.
> 
> We are actually in the process of trying to move the NUT source code
> over to Git, but both conversions by git-svn and Eric S. Raymond's
> reposurgeon are not quite there yet. (We are leaning towards
> reposurgeon, which involves a little more tweaking of commits, but
> produces better results for a one-way SVN-to-Git conversion, including
> .gitignore files generated from svn:ignore properties.)

You might want to look at "git svn" in the latest release again.

Also, ignore all the half-assed half-brained ideas floating around out
there on the internets about how to use it.  Most of the people writing
about it are only taking into consideration the most basic uses.

I've written up some more-or-less un-published notes on doing more
complete conversions that were based on my work to make use of Git to
maintain local changes to FreeBSD, yet another project that uses SVN.
I've further refined them and the result I have now in my git-svn cloned
copy of the NUT repository seems complete and fully functional, at least
compared to what I can see of the SVN repository independently.

Here's a link to my notes:

       http://www.planix.com/~woods/git.html

Section 6 is what you'll be interested in -- don't pay much heed to any
of the rest -- most of it is not very well tested in practice.  Only the
SVN conversion procedures have received much actual use.

I could make a bundle or tar.gz of my working repos available too if
you'd like to just have a quick peek to see if my way of converting from
SVN had the kinds of results you were looking for.

BTW, Meld is an amazing tool for picking apart changes between files,
and just for viewing changes too.

> Agreed in principle, although I haven't looked to see if collapsing
> any of the unused bits will lead to binary incompatibility. Given how
> distributions tend to lag behind the latest code, we often suggest
> that people just drop in a replacement driver to test certain changes
> without disrupting the rest of the install. This could be completely
> unwarranted fears on my part, though.

Entirely unwarranted indeed!  :-)

The inter-process communication is, so far as I can tell, entirely free
of any magic binary flag values.

> This is an interesting distinction (one that a few drivers make in
> their different shutdown commands, but that is not currently tied to
> FSD).

Quite a few drivers allow for the distinction in terms of the commands
they accept, but of course with the confusing over-loaded half-specified
ideas previously embodied in "FSD", there isn't any way to really make
use of those commands separately in the existing infrastructure, thus my
addition of a new control word.

Note that I think it's critically important not to over-load the meaning
of these state and status report words, especially not between device
driver programs and the rest of the control infrastructure and
communications protocols.

Thus the need for a separate "I'm DYING!" status from the drivers and an
"OK, we're initiating Emergency Power Off" control from upsmon.

I think it might make sense for a driver to issue a status of "being
shut down administratively out-of-band" so that NUT can learn of a
shutdown initiated from, say, SNMP.  I think this should be a word
unique from "FSD" though just to keep things clear, and it's definitely
got to be different from "DYING" too.  It would represent a situation
where the operator has commanded the UPS to power down the load, either
from the front panel or from SNMP or SSH or whatever, but the systems
would expect to be powered back up again when mains power returns, so
though they might use "halt -p" to power themselves off, they should do
so in such a way that they can reboot when power returns.  When a driver
says "DYING" it is an emergency situation and both the load and the UPS
itself must be powered off permanently and quickly lest physical damage
be incurred by continuing to operate.  After DYING triggers an EPO the
only way back should be if a human restarts the UPS from its front
panel.  It really must be a bit like a traditional "big red switch"
though of course here the idea is to give everything the same chance to
save files that an "OB+LB" status does.
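
As a rough illustration of the distinction drawn above, the decision a
monitor has to make could be sketched like this.  This is only a sketch
and not code from the patches: the "ADMIN_OFF" word is a made-up
placeholder for the not-yet-named "shut down administratively
out-of-band" status, and the Action names and the decide() function are
hypothetical too.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    SHUTDOWN_RESTARTABLE = "shut down, restart when mains power returns"
    EMERGENCY_POWER_OFF = "EPO: power off load and UPS, manual restart only"


def decide(status: str) -> Action:
    """Map a space-separated status string to one action (illustrative only)."""
    words = set(status.split())
    # DYING outranks everything else: the driver reports an emergency.
    if "DYING" in words:
        return Action.EMERGENCY_POWER_OFF
    # Operator-forced shutdown (FSD), the hypothetical out-of-band
    # administrative shutdown, and on-battery plus low-battery all lead
    # to a shutdown the systems are expected to recover from.
    if "FSD" in words or "ADMIN_OFF" in words or {"OB", "LB"} <= words:
        return Action.SHUTDOWN_RESTARTABLE
    return Action.NONE
```

The point of keeping the words separate is that each one selects exactly
one behaviour, so none of them has to be over-loaded.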

I really really hate over-loading terms in protocols or state diagrams.
It leads to enormous confusion, headaches, and inflexible designs.

(I also hate using different terms for different sides of the same
thing.  A UPS, for example, is either "OnLine" (i.e. on mains power) or
not (i.e. on battery).  It might be talking to a control computer but
have its load switched off too, but that's a whole different thing.
There's absolutely no need for the "OnBattery" state, and it only
confuses things.  However that's woven into the protocol now and it will
be difficult to extract without going through a deprecation phase and a
major revision, and perhaps adding a protocol version identifier.)

(i.e. "LB" should be sufficient to trigger shutdown!)

(But I'm not proposing an entire protocol rewrite just yet!)

> The reason why I advocated usurping the "FSD" status was because it is
> the only other status besides "OB LB" that is currently guaranteed to
> trigger a shutdown. I wonder if we could just use FSD with some other
> status option to indicate whether the driver should request a restart
> when the power returns.

FSD as-is is useless for my purposes, as I hope you'll see when you get
a chance to look closer at the EPO and DYING patches.

One could rip out FSD and rename my EPO feature to be FSD, but then one
would lose the two important unique features of FSD which are based on
its ability to trigger a "normal" restart.  (I.e. the ability to test
normal restart without actually removing mains power, and the ability to
trigger an early shutdown in order to conserve battery in the event that
the operator knows the mains outage will extend long beyond the current
battery capacity.)

I definitely wasn't going to rip out or otherwise break operator
initiated FSD in my patch.  :-)

> It's definitely a feature I would like to see merged at some
> point. Now that you mention this, I think there are several UPS
> protocols which support a bitmask for alarm conditions which will
> trigger a shutdown (including overtemp). We will want to make sure
> that the procedure for setting that event mask is not terribly
> different depending on whether the shutdown is triggered by the UPS
> hardware, or by NUT monitoring other UPS status (as I believe you are
> proposing with the DYING status).

My idea is that a UPS can, through its driver, identify any condition
which would trigger an alarm as a warning, and raise a DYING state (with
the alarm still set as an explanation for the DYING state) for an
exceeded maximum (or minimum) condition.  Even that much overloading of
alarm status could be confusing, though, without explicitly including
the knowledge of whether it's a warning or a min/max-exceeded kind of
alarm, and then we would end up with an invalid protocol state where
DYING could be issued without an alarm, but that's what I've got now in
some drivers.  Maybe it would be better if DYING was like ALARM and
included an explanation, but as-is each bit of code which sets DYING
writes a log entry, so all is not lost.
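
One way to picture the "DYING never appears without an alarm" rule is
the following sketch.  It is an assumption-laden illustration, not what
the drivers do today: the Alarm type, its limit_exceeded field, and the
driver_status() helper are all hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alarm:
    text: str             # human-readable explanation of the alarm
    limit_exceeded: bool  # True for a min/max-exceeded alarm, False for a warning


def driver_status(base: str, alarm: Optional[Alarm]) -> str:
    """Build a status string in which DYING can only accompany ALARM."""
    words = base.split()
    if alarm is not None:
        words.append("ALARM")
        # Only an exceeded limit escalates the alarm to DYING.
        if alarm.limit_exceeded:
            words.append("DYING")
    return " ".join(words)
```

With this shape the alarm text doubles as the explanation for DYING, and
the warning versus min/max-exceeded knowledge is carried explicitly
rather than implied.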

Of course with so few drivers currently implementing anything in the
alarm feature, it's hard to get a higher level view of how things could
be.

In theory the same DYING feature could have been used for low-battery
too, but that'll mean a whole protocol rewrite.

Clearly this kind of lifetime and growth wasn't planned into NUT's
design, but it's kind of like in biology where things might be a bit
overly complicated and convoluted in places, and yet it works!

-- 
                                                Greg A. Woods
                                                Planix, Inc.

<[email protected]>       +1 250 762-7675        http://www.planix.com/


_______________________________________________
Nut-upsdev mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/nut-upsdev
