Good idea. Did the pcap trace, and it sure looks like the SiteMonitor
is responding with the correct values. So the question remains as to
why cacti thinks otherwise (problem is cacti, but I have no idea why).
Maybe need to trace it at the other end as well....
bp
On 10/24/2014 12:32 PM, Forrest Christian (List Account) via Af wrote:
Darn autocorrect. SNMP not snap.
On Oct 24, 2014 12:32 PM, "Forrest Christian (List Account)" wrote:
Before you do that I'd look at what is coming back via snap via a
wireshark or similar.
If you zeroed an expansion module in the middle of the list, then
all of the oids for devices after that entry in the list would
have shifted to a lower number.
The sitemonitor assigns oids based on its knowledge of how many of
each i/o type each device takes. It remembers this even if the
device isn't attached anymore. By zeroing a device in the middle,
it reassigns oids after that point in the table, since it doesn't
have the zeroed device info as a placeholder.
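To make the shift concrete, here is a rough Python sketch of that kind of slot-based numbering. The device names, the I/O counts, and the `assign_indexes` helper are all made up for illustration; this is not the SiteMonitor's actual firmware logic or its real OID layout, just a model of "each device reserves a run of indexes, and a zeroed device reserves none".

```python
def assign_indexes(devices):
    """Give each device a run of consecutive table indexes.

    devices: list of (name, io_count) tuples in list order.
    A zeroed entry is modeled as None (hypothetical convention):
    it no longer reserves any indexes.
    """
    table = {}
    next_index = 1
    for entry in devices:
        if entry is None:
            continue  # zeroed: no placeholder, nothing reserved
        name, io_count = entry
        table[name] = list(range(next_index, next_index + io_count))
        next_index += io_count
    return table


# Hypothetical layout: a base unit plus two expansion modules.
before = assign_indexes([("base", 4), ("expansion_a", 2), ("expansion_b", 2)])
# Same list after zeroing the module in the middle.
after = assign_indexes([("base", 4), None, ("expansion_b", 2)])

print(before["expansion_b"])            # [7, 8]
print(after["expansion_b"])             # [5, 6]
print(before["base"] == after["base"])  # True
```

In this model, anything listed ahead of the zeroed entry keeps its indexes, and only the entries after it move down.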
On Oct 24, 2014 12:01 PM, "Bill Prince via Af" wrote:
You think you're confused.
I did not change the community string, and it works from the
CLI and/or through the realtime plugin. The device shows as
UP, and I use "SNMP or ping" as up/down detection.
I also tried changing the SNMP timeout to 1000 ms. All that
did was change the error log to this:
10/24/2014 11:29:22 AM - SPINE: Poller[0] Host[703] TH[1]
DS[12223] WARNING: SNMP timeout detected [1000 ms], ignoring
host '10.13.114.254'
I've tried "SNMP Uptime", "SNMP Desc", and "SNMP getNext" as
well. On the Device Management screen, it retrieves the
correct SNMP information. The only thing that seems to not
be working is the polling through spine.
I'm curious why zeroing the serial number of a non-existent
expansion unit caused this problem.
I've also rebooted the SiteMonitor at least a couple of times
to no effect.
My next thing will be to just replace the SiteMonitor with a
spare. It's all the way down in town, so that is a half-day
time hit.
bp
On 10/24/2014 11:16 AM, George Skorup (Cyber Broadcasting) via
Af wrote:
I am thoroughly confused. Is your community string correct?
Can you increase the device SNMP timeout, like 1000ms instead
of 250ms? What's your device down detection set to? Is it
showing down in the device list?
I have seen some base units go kinda screwy and respond
slower and a reboot doesn't fix it, they needed a power-cycle.
On 10/24/2014 11:25 AM, Bill Prince via Af wrote:
Now thrice.
No joy in Mudville.
bp
On 10/24/2014 8:07 AM, Bill Prince via Af wrote:
Yah. Twice now.
bp
On 10/23/2014 11:06 PM, George Skorup (Cyber Broadcasting)
via Af wrote:
Gotta be the poller cache. Did you try a rebuild?
On 10/23/2014 11:03 PM, Bill Prince via Af wrote:
Getting closer. When I look in the SNMP cache, there
is no entry for the device.
Looking in the log (without debug), I get:
10/23/2014 08:34:25 PM - SPINE: Poller[0] Host[797] TH[1]
DS[12316] WARNING: SNMP timeout detected [250 ms], ignoring
host '10.13.114.254'
So there is something causing the SNMP request to barf
inside cacti. When I do an snmpget from the CLI, it
all looks fine. Likewise, the realtime plugin is
working fine too.
So when realtime is doing the SNMP queries outside the
poller, they are fine. It only fails when spine is doing
the SNMP requests.
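One thing that may be worth ruling out: a plain snmpget from the CLI is more forgiving than the poller, since net-snmp defaults to a 1 second timeout with 5 retries. Below is a rough Python sketch that runs snmpget with the poller's 250 ms timeout and no retries, and times the reply. It assumes net-snmp's `snmpget` is on the PATH and that the device speaks SNMP v2c; the helper names and the `public` community string in the usage note are placeholders, not taken from the actual setup.

```python
import subprocess
import time


def snmpget_argv(host, oid, community, timeout_ms=250, retries=0):
    """Build a net-snmp snmpget command line with an explicit timeout.

    SNMP v2c is an assumption; adjust -v to match the device.
    """
    return [
        "snmpget", "-v", "2c", "-c", community,
        "-t", str(timeout_ms / 1000.0),  # -t takes seconds
        "-r", str(retries),              # no retries, to expose slow replies
        host, oid,
    ]


def timed_get(host, oid, community, **kwargs):
    """Run snmpget and return (exit_code, elapsed_seconds, output)."""
    argv = snmpget_argv(host, oid, community, **kwargs)
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return proc.returncode, elapsed, proc.stdout + proc.stderr
```

For example, `timed_get("10.13.114.254", "1.3.6.1.2.1.1.3.0", "public")` asks for sysUpTime.0. If that times out at 250 ms but succeeds with `timeout_ms=1000`, the unit is answering slowly rather than not at all.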
bp
On 10/23/2014 4:12 PM, George Skorup (Cyber Broadcasting)
via Af wrote:
You divided by zero, didn't you?
Are you sure your modules are in the same order as before?
On 10/23/2014 1:29 PM, Bill Prince via Af wrote:
I noticed an "Expansion Unit" on one of my SiteMonitors
this morning. It said something about "Device
Removed" or something like that.
Remembering the discussion the other day on this topic,
I put a "0" in the Serial # for the non-existent unit,
rescanned, & rebooted.
Now, none of the OIDs work in Cacti. If I do a
simple snmpget on any of the OIDs that I use, the
correct information comes back. Several of the OIDs are
on the base unit anyway, so they would not have moved,
and further, the OIDs don't reference the serial number.
So... what did I do, and how do I fix it?