You mean from the web GUI? Sure.
I presume a power cycle does something different from a reboot?
I was always curious about this particular SiteMonitor, as it came up
with the extra device on the expansion bus from the get-go. I'd never
worried about it, and then I saw the discussion about getting rid of old
devices with the zeroed-serial trick.
Don't go there! It's a trap!
bp
On 10/24/2014 2:52 PM, George Skorup (Cyber Broadcasting) via Af wrote:
Can you post a screenshot of your expansion, binary and analog tabs?
Also, I bet if you power-cycle it, it will be fine again. I was
working with Forrest on a bug where the SyncInjector and some other
newer modules would mysteriously disappear from the bus. He was able
to reproduce and get a fixed up firmware load for the modules.
Something about one thing booting up faster than another, or something
like that.
On 10/24/2014 4:41 PM, Bill Prince via Af wrote:
Gotcha!
I removed all the Data Sources except one (PWR1). Suddenly that
data was making it into cacti.
Then I added back in all the Data Sources coming _JUST_ from the
SiteMonitor itself. That also worked.
Then I added in one of the Data Sources from the SyncInjector (sync
events), which happens to be the only unit on the expansion bus past
where I removed the non-existent unit. This broke it again.
So I have apparently uncovered a bug where removing a unit from the
expansion bus (by zeroing the serial number) causes the
SiteMonitor to break SNMP responses. I think it's probably just a
bad checksum, but I will leave that up to him. I forwarded the
pcap trace to him.
I will probably also swap out the SiteMonitor that has the problem.
Thanks guys!
bp
On 10/24/2014 1:57 PM, Bill Prince via Af wrote:
Then again....
Not sure why I didn't notice this the first (or second) time.
Wireshark is telling me I have a malformed packet; either a broken
header or bad checksum. So even though the SNMP response is
coming in with the expected data, it's getting dropped before it
gets into cacti because of the malformed packet.
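For anyone wanting to test the bad-checksum theory against a captured response, here is a minimal sketch of the standard Internet checksum (RFC 1071) applied to a UDP segment over IPv4. It assumes the raw UDP header plus payload has already been extracted from the pcap; the function names are made up for illustration, and nothing here says the checksum is actually what is wrong with the SiteMonitor's packets.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones' complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Check a UDP segment (8-byte header + payload) sent over IPv4.

    src_ip, dst_ip and segment are assumed to come from the capture.
    """
    if segment[6:8] == b"\x00\x00":
        return True  # over IPv4, zero means the sender did not compute a checksum
    # IPv4 pseudo-header: source, destination, zero, protocol 17 (UDP), UDP length.
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, len(segment))
    )
    # Summing over data that includes a correct checksum field yields zero.
    return internet_checksum(pseudo + segment) == 0
```

If this returns True for the captured response, the "malformed packet" is more likely a problem in the SNMP encoding itself than in the UDP checksum.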
This would explain why removing a unit on the expansion bus changed
things...
bp
On 10/24/2014 1:32 PM, Bill Prince via Af wrote:
OK. Confirmed. The SiteMonitor is getting the SNMP requests, and
it is responding with the expected values.
I ran a pcap trace both at the SiteMonitor as well as at the
ethernet port on the cacti server. SNMP requests/responses are
going both ways (and at both ends). In fact, spine appears to be
doing 3 retries.
One thing I didn't expect is that just before the SNMP requests,
there are two attempts to open a telnet on the SiteMonitor. Not
sure where that is coming from, except perhaps for the Manage
plugin (which I de-installed several weeks ago).
So something is broken inside cacti. How/why this was caused by
zeroing a serial number from a non-existent expansion unit is
completely baffling to me.
I also have no clue how to fix it, because cacti "thinks" there was
no response.
bp
On 10/24/2014 11:16 AM, George Skorup (Cyber Broadcasting) via Af
wrote:
I am thoroughly confused. Is your community string correct? Can
you increase the device SNMP timeout, like 1000ms instead of
250ms? What's your device down detection set to? Is it showing
down in the device list?
I have seen some base units go kinda screwy and respond slower, and
a reboot doesn't fix it; they needed a power-cycle.
On 10/24/2014 11:25 AM, Bill Prince via Af wrote:
Now thrice.
No joy in Mudville.
bp
On 10/24/2014 8:07 AM, Bill Prince via Af wrote:
Yah. Twice now.
bp
On 10/23/2014 11:06 PM, George Skorup (Cyber Broadcasting) via
Af wrote:
Gotta be the poller cache. Did you try a rebuild?
On 10/23/2014 11:03 PM, Bill Prince via Af wrote:
Getting closer. When I look in the SNMP cache, there is no
entry for the device.
Looking in the log (without debug), I get:
10/23/2014 08:34:25 PM - SPINE: Poller[0] Host[797] TH[1] DS[12316] WARNING: SNMP timeout detected [250 ms], ignoring host '10.13.114.254'
So there is something causing the SNMP request to barf inside
cacti. When I do an snmpget from the CLI, it all looks
fine. Likewise, the realtime plugin is working fine too.
So when realtime is doing the SNMP queries outside the poller,
they are fine. Just when spine is doing the SNMP requests.
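One way to make the CLI test more comparable to what spine does is to run snmpget with the same 250 ms timeout from the log instead of the net-snmp default of 1 second. The sketch below only builds and runs the command. SNMP v2c, the "public" community, the retry default and the helper names are all assumptions, so substitute whatever the device actually uses.

```python
import subprocess


def build_snmpget_cmd(host, community, oid, timeout_ms=250, retries=3):
    """Build a net-snmp snmpget command line with a spine-like timeout.

    -t takes seconds (fractions allowed) and -r is the retry count.
    SNMP v2c is an assumption; the thread does not say which version is in use.
    """
    return [
        "snmpget",
        "-v", "2c",
        "-c", community,
        "-t", str(timeout_ms / 1000),
        "-r", str(retries),
        host,
        oid,
    ]


def run_snmpget(host, community, oid, timeout_ms=250, retries=3):
    """Run snmpget and return the CompletedProcess (requires net-snmp installed)."""
    cmd = build_snmpget_cmd(host, community, oid, timeout_ms, retries)
    return subprocess.run(cmd, capture_output=True, text=True)
```

For example, run_snmpget("10.13.114.254", "public", "1.3.6.1.2.1.1.3.0") queries sysUpTime on the host from the log. A non-zero returncode at 250 ms that goes away at 1000 ms would point at a slow responder rather than at cacti.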
bp
On 10/23/2014 4:12 PM, George Skorup (Cyber Broadcasting) via
Af wrote:
You divided by zero, didn't you?
Are you sure your modules are in the same order as before?
On 10/23/2014 1:29 PM, Bill Prince via Af wrote:
I noticed an "Expansion Unit" on one of my SiteMonitors this
morning. It said something about "Device Removed" or
something like that.
Remembering the discussion the other day on this topic, I
put a "0" in the Serial # for the non-existent unit,
rescanned, & rebooted.
Now, none of the OIDs work in Cacti. If I do a simple
snmpget on any of the OIDs that I use, the correct
information comes back. Several of the OIDs are on the base
unit anyway, so they would not have moved, and further, the
OIDs don't reference the serial number.
So... what did I do, and how do I fix it?