Re: [Nut-upsuser] UPS Socomec is unavailable appears in the logs

Charles Lepple Fri, 12 Feb 2016 05:55:01 -0800

On Feb 8, 2016, at 8:46 AM, Henning Fehrmann <[email protected]> 
wrote:
> 
> Hello,
> 
> we have two UPSes - one Galaxy and one from Socomec, both monitored by the 
> NUT server.
> 
> The NUT server runs the version 2.7.2.
> 
> We have roughly 3500 clients running upsmon, reading the status of both
> UPSes. They run mainly the version 2.6.4 and some 2.7.2.
> 
> We observed the following on both client versions, regardless of whether
> we run upsmon in daemon or debug mode:
> 
> Running
> upsc socomec@ip_socomec
> returns a list of all required values.
> 
> OTOH, we find on all clients the following logs:
> upsmon[1608]: UPS socomec@ip_socomec is unavailable


I think this error message comes from this check:

https://github.com/networkupstools/nut/blob/1993c39866b4625c0218fda1dfa9b037934c4c47/clients/upsmon.c#L1458

"empty response is the same as a dead ups"

Corresponds to these lines from strace:

4275  14:08:05.693306 write(6, "GET VAR socomec ups.status\n", 27) = 27
4275  14:08:05.693373 select(7, [6], NULL, NULL, {5, 0}) = 1 (in [6], left {4, 997653})
4275  14:08:05.695899 read(6, "VAR socomec ups.status \"\"\n", 64) = 26

If you run "upsc socomec@ip_socomec ups.status", it should return (at a 
minimum) "ups.status: OL", "ups.status: OB" or "ups.status: OB LB".
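
To make the check concrete, here is a minimal Python sketch (not the upsmon code itself) that parses a reply line in the shape shown in the strace above and treats an empty ups.status the same way as a dead UPS. The function names are hypothetical, and the regular expression is a simplification that assumes the value contains no escaped quotes.

```python
import re

# Reply shape as seen in the strace output: VAR <ups> <var> "<value>"
# Simplification (assumption): the value contains no escaped quotes.
_VAR_RE = re.compile(r'^VAR (\S+) (\S+) "(.*)"$')


def parse_var_response(line):
    """Return (ups, var, value) for a 'VAR ...' reply, or None if it does not match."""
    m = _VAR_RE.match(line.strip())
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)


def status_is_usable(line):
    """Illustrate the idea of the check: an empty ups.status counts as a dead UPS."""
    parsed = parse_var_response(line)
    if parsed is None:
        return False
    _, var, value = parsed
    return var == "ups.status" and value.strip() != ""
```

Fed the two read() payloads from the strace, this reports the Galaxy reply as usable and the Socomec reply as not usable, which matches the "is unavailable" message appearing only for the Socomec.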

The Galaxy seems to be working (from a few lines earlier):

4275  14:08:05.689595 write(4, "GET VAR galaxy ups.status\n", 26) = 26
4275  14:08:05.689721 select(5, [4], NULL, NULL, {5, 0}) = 1 (in [4], left {4, 996929})
4275  14:08:05.692927 read(4, "VAR galaxy ups.status \"OL\"\n", 64) = 27

There was a change to the Netvision MIB file (included in NUT 2.7.3) that might 
fix the empty ups.status:

https://github.com/networkupstools/nut/commit/4552d14162086a280445efc5331a8e6ad7b1c3a4

It should not be necessary to update the clients to NUT 2.7.3, only the Socomec 
NUT server.

> 
> This appears regularly, at intervals of a little more than 300s, and
> corresponds to the NOCOMMWARNTIME in the upsmon.conf.
> 
> The log entry appears only for the Socomec UPS, not for the Galaxy.
> 
> The process list shows that /sbin/upsmon is forking occasionally, generating 
> a child in the Z state
> 
> root      3294  0.0  0.0  10448   704 ?        Ss   14:02   0:00   /sbin/upsmon
> nut       3295  0.0  0.0  10448   580 ?        S    14:02   0:00     /sbin/upsmon
> nut       3497  0.0  0.0      0     0 ?        Z    14:02   0:00       [upsmon] <defunct>

It is unclear to me why this process is a zombie (defunct). I would expect 
upsmon to fork every 300s to run the NOTIFYCMD (upssched, in your case) due to 
the empty ups.status, but since the wait4 syscall (the waitpid() library 
function in the code) is being called, it should reap the zombie process. Maybe 
you are just seeing the window of time between the fork() and the waitpid()?
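
The window I mean can be reproduced with a small Python sketch. This is only an illustration of fork() followed by waitpid() on Linux, not what upsmon does internally; the function names are hypothetical, and it assumes /proc is available. The child shows up in state Z from the moment it exits until the parent reaps it.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def observe_zombie_window(timeout=2.0):
    """Fork a child that exits at once; return its state before and after waitpid()."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately, like a notify command that has finished
    deadline = time.monotonic() + timeout
    before = proc_state(pid)
    # Poll until the exited child is visible as a zombie (or the timeout expires).
    while before != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        before = proc_state(pid)
    os.waitpid(pid, 0)  # parent: reap the child
    after = proc_state(pid)
    return before, after
```

If a ps snapshot happens to land between the child's exit and the parent's waitpid(), it will list a defunct entry even though the parent reaps it shortly afterwards. A zombie that persists across several snapshots would point to something else.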

> On the client side the ups.conf is
> [galaxy]
>        driver = galaxy
>        port = ip_NUTserver
>        desc = "GALAXY 6000"
> 
> [socomec]
>        driver = socomec
>        port = ip_NUTserver
>        desc = "SOCOMEC"

Actually, the clients don't need ups.conf - they get the description from the 
server's ups.conf over the NUT network protocol.

> The upsmon.conf reads:
> 
> MONITOR [email protected] 1 monuser XXX master
> MONITOR [email protected] 1 monuser XXX master
> 
> 
> MINSUPPLIES 1
> SHUTDOWNCMD "/sbin/shutdown -h now"
> NOTIFYCMD "/sbin/upssched"
> 

I don't use upssched - maybe someone else can shed some light on this?

> _______________________________________________
> Nut-upsuser mailing list
> [email protected]
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/nut-upsuser

-- 
Charles Lepple
clepple@gmail



