On Fri, 2022-10-28 at 13:10 -0700, Ryan Freeman wrote:
> On Fri, Oct 28, 2022 at 01:22:57PM +0200, Martijn van Duren wrote:
> > I wondered that as well, but I tried to simulate the not found and
> > error code-paths, but I couldn't trigger it. So I'm not ruling it
> > out, I just can't reproduce it.
> > 
> > Another thing that's weird is that it looks like the index has been
> > stripped from sensorStatus, which might be an indication that
> > something weird is going on inside libagentx. But like I said: without a
> > reproducer I haven't been able to pin it down.
> > 
> > So the additional verbose information should be useful.
> > Come to think of it: The `sysctl hw.sensors` output might be
> > helpful as well, both on a succeeding run, as well as at the time
> > of the crash (maybe something like:
> > `while true; do date; sysctl hw.sensors; sleep 1; done > \
> > /path/to/output`)
> 
> As the offending machines are VMs, hw.sensors actually returns
> nothing.  I will send you the output for all of 'hw' key, and
> log output for snmpd -vv when the issue arrives.
> 
> It does seem to coincide with librenms's discovery process, which
> comes from librenms upstream as this cron job (on a linux machine):
> 33 */6 * * * librenms /opt/librenms/cronic /opt/librenms/discovery-wrapper.py 1
> 
> So, it is the one job running every ~6 hours which would match up with
> when snmpd is dying on these OpenBSD 7.2 VMs.  I still have 30+ VMs
> on <7.2 that are OK.  Any physical machines I've upgraded to 7.2 are
> only at home, not $WORKPLACE where librenms lives.  Not trying to be
> noisy, just hopefully narrow down the actual cause :)  Thanks for
> the hints!
> 
> Regards,
> -Ryan
> 
> 
I managed to reproduce it with an empty sensors table and doing a
getnext request on sensorNumber.0.

The problem was that the internal OID was incremented from
sensorNumber.0 to sensorStatus, which then triggers an endOfMibView.
When returning a response this incremented value is then sent back to
snmpd, while in the case of an endOfMibView it must be the value
requested by snmpd (at least for the getnext case, which is what is
being used here).
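
To make the failure mode concrete, here is a minimal standalone sketch of
the idea, not the libagentx code itself. The types and names (struct oid,
struct varbind, varbind_advance, varbind_endofmibview) are hypothetical
stand-ins: the working OID gets advanced during the search, and when the
search runs off the end it has to be put back to the OID that was
originally requested before the response is built.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OID_MAX 128

/* Hypothetical, simplified stand-ins; these are NOT the libagentx structs. */
struct oid {
	uint32_t id[OID_MAX];
	size_t n;
};

enum vb_type { VB_UNSET, VB_ENDOFMIBVIEW };

struct varbind {
	struct oid start;	/* OID the master agent asked about */
	struct oid cur;		/* working OID, advanced during the search */
	enum vb_type type;
};

/* Simulate the search stepping past the start OID to the next object. */
static void
varbind_advance(struct varbind *vb, const struct oid *next)
{
	assert(vb != NULL && next != NULL);
	vb->cur = *next;
}

/* Mark the varbind as endOfMibView, answering with the requested OID. */
static void
varbind_endofmibview(struct varbind *vb)
{
	assert(vb != NULL);
	/* Reset the working OID so the response names the requested OID. */
	memcpy(&vb->cur, &vb->start, sizeof(vb->start));
	vb->type = VB_ENDOFMIBVIEW;
}
```

Without the memcpy() in varbind_endofmibview(), cur would still hold the
advanced OID when the response is sent, which corresponds to the behaviour
described above.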

Diff below resets this key on endOfMibView and fixes the problem for
me. Can you confirm this?

Assuming this also fixes things for Ryan: OK?

martijn@

Index: agentx.c
===================================================================
RCS file: /cvs/src/lib/libagentx/agentx.c,v
retrieving revision 1.19
diff -u -p -r1.19 agentx.c
--- agentx.c    14 Oct 2022 15:26:58 -0000      1.19
+++ agentx.c    30 Oct 2022 08:19:29 -0000
@@ -3426,6 +3426,8 @@ agentx_varbind_endofmibview(struct agent
                return;
        }
 
+       bcopy(&(axv->axv_start), &(axv->axv_vb.avb_oid),
+           sizeof(axv->axv_start));
        axv->axv_vb.avb_type = AX_DATA_TYPE_ENDOFMIBVIEW;
 
        if (axv->axv_axo != NULL)
