On Fri, Oct 28, 2022 at 01:22:57PM +0200, Martijn van Duren wrote:
> I wondered that as well, but I tried to simulate the not found and
> error code-paths, but I couldn't trigger it. So I'm not ruling it
> out, I just can't reproduce it.
> 
> Another thing that's weird is that it looks like the index has been
> stripped from sensorStatus, which might be an indication that
> something weird is going on inside libagentx. But like I said: without a
> reproducer I haven't been able to pin it down.
> 
> So the additional verbose information should be useful.
> Come to think of it: The `sysctl hw.sensors` output might be
> helpful as well, both on a succeeding run, as well as at the time
> of the crash (maybe something like:
> `while true; do date; sysctl hw.sensors; sleep 1; done > \
> /path/to/output`)

As the offending machines are VMs, hw.sensors actually returns
nothing.  I will send you the output for the whole 'hw' tree, and
the log output from snmpd -vv, when the issue recurs.
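
A minimal sketch of such a capture loop, adapted from the one-liner
quoted above so that it can dump the whole 'hw' tree instead of only
hw.sensors.  The `sample` helper and its arguments are made up for
illustration; it only relies on date(1), sleep(1) and whatever command
is passed in.

```shell
#!/bin/sh
# Hypothetical helper (not part of snmpd or the base system):
# run a command COUNT times, one second apart, and print a
# timestamp before each sample so it can be matched against syslog.
sample() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        date '+%Y-%m-%d %H:%M:%S'
        "$@"                      # the command to sample, e.g. sysctl hw
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep 1
        fi
    done
}
```

Something like `sample 3600 sysctl hw > /path/to/output` would then
cover roughly an hour around the expected time of the crash.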

It does seem to coincide with LibreNMS's discovery process, which
LibreNMS upstream ships as this cron job (on a Linux machine):
33 */6 * * * librenms /opt/librenms/cronic /opt/librenms/discovery-wrapper.py 1

So, it is the one job running every ~6 hours which would match up with
when snmpd is dying on these OpenBSD 7.2 VMs.  I still have 30+ VMs
on <7.2 that are OK.  Any physical machines I've upgraded to 7.2 are
only at home, not $WORKPLACE where librenms lives.  Not trying to be
noisy, just hoping to narrow down the actual cause :)  Thanks for
the hints!
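
To check the timing more precisely, here is a rough sketch that pulls
the timestamps of the AgentX closures out of a syslog file.  The
`agentx_close_times` name is made up, and the log path in the usage
note is an assumption about where syslogd puts snmpd's messages; the
pattern is taken from the "Closing: Too many parse errors" line quoted
further down.

```shell
#!/bin/sh
# Hypothetical helper: print "Mon DD HH:MM:SS" for every log line in
# which snmpd closed an AgentX session after too many parse errors.
# Takes one or more syslog files as arguments.
agentx_close_times() {
    awk '/Closing: Too many parse errors/ { print $1, $2, $3 }' "$@"
}
```

For example, `agentx_close_times /var/log/daemon` (path assumed).  The
cron entry above fires at 00:33, 06:33, 12:33 and 18:33 in the LibreNMS
host's local time, so a constant offset between those times and the
printed timestamps would fit the discovery run taking a while to reach
these VMs; that is a guess, not something the logs confirm.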

Regards,
-Ryan

> 
> @Ryan: this info doesn't need to be on the list, so feel free to
> send it to me in private if you want.
> 
> On Fri, 2022-10-28 at 11:08 +0100, Stuart Henderson wrote:
> > I wonder if there are any sensors which disappear and reappear..
> > 
> > On 2022/10/28 10:01, Martijn van Duren wrote:
> > > Could you run snmpd with `-vv`? That way I also have the specific
> > > > OIDs being requested and returned (both frontend and backend), which
> > > > might make it a little easier to reproduce.
> > > 
> > > Do note that this adds at least 4 log lines for every request
> > > > issued to snmpd, so your logfile might explode a bit.
> > > 
> > > martijn@
> > > 
> > > On Thu, 2022-10-27 at 14:08 -0700, Ryan Freeman wrote:
> > > > On Thu, Oct 27, 2022 at 01:46:21PM -0700, Ryan Freeman wrote:
> > > > > Hello,
> > > > > > After upgrading some virtual machines to OpenBSD 7.2, I started noticing
> > > > > > snmpd dying approx every 6 hours on the upgraded machines.
> > > > > 
> > > > > > Oct 27 13:14:33 mirror snmpd[98795]: AgentX(1268939451/2580462718): 2506302838 iso.org.dod.internet.private.enterprises.openBSD.sensorsMIBObjects.sensors.sensorTable.sensorEntry.sensorStatus: oids not equal
> > > > > > Oct 27 13:14:33 mirror snmpd[98795]: AgentX(1268939451/2580462718): Closing: Too many parse errors
> > > > > > Oct 27 13:14:33 mirror snmpd[98795]: AgentX(1268939451/2580462718): Closed by snmpd (Too many AgentX parse errors from peer)
> > > > > > Oct 27 13:14:33 mirror snmpd_metrics[88325]: [fd:0 sess:2580462718 ctx:<default>]: unsupported call: agentx-Close-PDU
> > > > > > Oct 27 13:14:33 mirror snmpd[98795]: AgentX(1268939451): Connection reset by peer
> > > > > > Oct 27 13:14:33 mirror snmpd[98795]: snmpe: AgentX(1268939451): disappeared unexpected
> > > > > 
> > > > > > The message is always the same; it tends to appear around 1:20am, 7:20am,
> > > > > > 1:20pm, 7:20pm.
> > > > > > I have a script set to check "rcctl ls failed" and notify if
> > > > > > something has failed.
> > > > > 
> > > > > LibreNMS is used to scrape the snmpd instances on the affected VMs.
> > > > 
> > > > > And, forgot to include the snmpd.conf, apologies.  Here it is, with only
> > > > > minor values changed:
> > > > # $OpenBSD: snmpd.conf,v 1.2 2021/08/08 13:43:10 sthen Exp $
> > > > 
> > > > > # See snmpd.conf(5) for more options (tcp, alternative ports, trap listener)
> > > > listen on 127.0.0.1
> > > > 
> > > > > user "changed" auth hmac-sha1 authkey "randomstuff" enc aes enckey "morerandomstuff"
> > > > 
> > > > # Adjust the local system information
> > > > system contact "Systems Team ([email protected])"
> > > > #system location "Rack A1-24, Room 13"
> > > > 
> > > > # Required by some management software
> > > > system services 74
> > > > 
> > > > > LibreNMS then scrapes it using SNMPv3 in authPriv mode.
> > > > > No core file is being dropped by snmpd.
> > > > 
> > > > -Ryan
> > > > 
> > > 
> 
