Before you do that, I'd look at what is coming back via SNMP with Wireshark or similar.
If you zeroed an expansion module in the middle of the list, then all of the OIDs for devices after that entry in the list would have shifted to a lower number. The SiteMonitor assigns OIDs based on its knowledge of how many of each I/O type each device takes. It remembers this even if the device isn't attached anymore. When you zero a device in the middle, it reassigns the OIDs after that point in the table, since it no longer has the zeroed device's info as a placeholder.

On Oct 24, 2014 12:01 PM, "Bill Prince via Af" <[email protected]> wrote:

> You think you're confused.
>
> I did not change the community string, and it works from the CLI and/or
> through the realtime plugin. The device shows as UP, and I use "SNMP or
> ping" as up/down detection.
>
> I also tried changing the SNMP timeout to 1000 ms. All that did was
> change the error log to this:
>
> 10/24/2014 11:29:22 AM - SPINE: Poller[0] Host[703] TH[1] DS[12223]
> WARNING: SNMP timeout detected [1000 ms], ignoring host '10.13.114.254'
>
> I've tried "SNMP Uptime", "SNMP Desc", and "SNMP getNext" as well. On
> the Device Management screen, it retrieves the correct SNMP information.
> The only thing that seems to not be working is the polling through spine.
>
> I'm curious why zeroing the serial number of a non-existent expansion unit
> caused this problem.
>
> I've also rebooted the SiteMonitor at least a couple of times to no effect.
>
> My next thing will be to just replace the SiteMonitor with a spare.
> It's all the way down in town, so that is a half-day time hit.
>
> bp
>
> On 10/24/2014 11:16 AM, George Skorup (Cyber Broadcasting) via Af wrote:
>
> I am thoroughly confused. Is your community string correct? Can you
> increase the device SNMP timeout, like 1000ms instead of 250ms? What's your
> device down detection set to? Is it showing down in the device list?
>
> I have seen some base units go kinda screwy and respond slower, and a
> reboot doesn't fix it; they needed a power-cycle.
>
> On 10/24/2014 11:25 AM, Bill Prince via Af wrote:
>
> Now thrice.
>
> No joy in Mudville.
>
> bp
>
> On 10/24/2014 8:07 AM, Bill Prince via Af wrote:
>
> Yah. Twice now.
>
> bp
>
> On 10/23/2014 11:06 PM, George Skorup (Cyber Broadcasting) via Af wrote:
>
> Gotta be the poller cache. Did you try a rebuild?
>
> On 10/23/2014 11:03 PM, Bill Prince via Af wrote:
>
> Getting closer. When I look in the SNMP cache, there is no entry for
> the device.
>
> Looking in the log (without debug), I get:
>
> 10/23/2014 08:34:25 PM - SPINE: Poller[0] Host[797
> <http://10.13.112.20/host.php?action=edit&id=797>] TH[1] DS[12316
> <http://10.13.112.20/data_sources.php?action=ds_edit&id=12316>] WARNING:
> SNMP timeout detected [250 ms], ignoring host '10.13.114.254'
>
> So there is something causing the SNMP request to barf inside cacti.
> When I do an snmpget from the CLI, it all looks fine. Likewise, the
> realtime plugin is working fine too.
>
> So when realtime is doing the SNMP queries outside the poller, they are
> fine. Just when spine is doing the SNMP requests.
>
> bp
>
> On 10/23/2014 4:12 PM, George Skorup (Cyber Broadcasting) via Af wrote:
>
> You divided by zero, didn't you?
>
> Are you sure your modules are in the same order as before?
>
> On 10/23/2014 1:29 PM, Bill Prince via Af wrote:
>
> I noticed an "Expansion Unit" on one of my SiteMonitors this morning.
> It said something about "Device Removed" or something like that.
>
> Remembering the discussion the other day on this topic, I put a "0" in the
> Serial # for the non-existent unit, rescanned, & rebooted.
>
> Now, none of the OIDs work in Cacti. If I do a simple snmpget on any of
> the OIDs that I use, the correct information comes back. Several of the
> OIDs are on the base unit anyway, so they would not have moved, and
> further, the OIDs don't reference the serial number.
>
> So... what did I do, and how do I fix it?
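
P.S. To make the index shift described at the top of this message concrete, here is a rough Python sketch. It is only a toy model of sequential index assignment, not the SiteMonitor firmware's actual logic: the device names, I/O counts, and the `assign_indices` helper are all invented for illustration.

```python
# Toy model only: device names and I/O counts are hypothetical,
# not real SiteMonitor values or OIDs.

def assign_indices(devices):
    """Give each device a contiguous block of indices, in table order."""
    table = {}
    next_index = 1
    for name, io_count in devices:
        table[name] = list(range(next_index, next_index + io_count))
        next_index += io_count
    return table

# Device table with a (possibly detached) expansion unit in the middle.
before = [("base", 4), ("expansion_a", 2), ("expansion_b", 3)]

# Zeroing expansion_a drops its placeholder from the table entirely.
after = [(name, count) for name, count in before if name != "expansion_a"]

before_table = assign_indices(before)
after_table = assign_indices(after)

for name in after_table:
    print(name, before_table[name], "->", after_table[name])
```

In this model, entries ahead of the zeroed slot keep their indices and only the entries after it move down, so anything polled by a fixed index after that point would now be pointing at the wrong place.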
