Most people end up with a set of three or four configurations.  I.e., a
SiteMonitor plus an injector is one configuration, a SiteMonitor by itself
is another one.

If you put the modules you never monitor at the end of the list, then
you can reuse configurations.  I.e., a SiteMonitor and SyncInjector is the
same as a SiteMonitor, SyncInjector, and PoE as far as monitoring goes.
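
To make the reuse idea concrete, here is a rough sketch in Python.  It is
only an illustration: the function name, the module names, and the
"monitored" set are made up for the example and are not anything Cacti or
the SiteMonitor provides.  It assumes that a module at the end of the chain
does not shift the index of anything in front of it.

```python
from typing import Iterable, Sequence, Tuple


def monitoring_signature(modules: Sequence[str],
                         monitored: Iterable[str]) -> Tuple[str, ...]:
    """Return the module chain up to and including the last monitored module.

    Hypothetical helper: unmonitored modules at the end of the chain are
    dropped, because they do not change the index of anything before them.
    """
    monitored_set = {name.lower() for name in monitored}
    chain = [name.lower() for name in modules]
    # Walk backwards and drop trailing modules that are never polled.
    while chain and chain[-1] not in monitored_set:
        chain.pop()
    return tuple(chain)


# Assumed example: only the base unit and the SyncInjector are graphed.
monitored = {"sitemonitor", "syncinjector"}
a = monitoring_signature(["SiteMonitor", "SyncInjector"], monitored)
b = monitoring_signature(["SiteMonitor", "SyncInjector", "PoE"], monitored)
c = monitoring_signature(["SiteMonitor", "PoE", "SyncInjector"], monitored)

print(a == b)  # trailing PoE: same configuration can be reused
print(a == c)  # PoE in front of the SyncInjector: a different configuration
```

Two installations with the same signature can share one configuration.  A
module placed in front of something you monitor produces a different
signature, which is why the unmonitored ones belong at the end.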
On Oct 25, 2014 1:06 PM, "Bill Prince via Af" <[email protected]> wrote:

>  OK.  I think I have an approach. The SiteMonitor plus all its expansion
> units is not the "device".
>
> The "device" is the SiteMonitor plus the index of the expansion unit.
>
> For example:
>
>    - SiteMonitor, index 0 is the SiteMonitor device
>    - SiteMonitor, index 1 is the 4-port POE device
>    - SiteMonitor, index 2 is the SyncInjector (first instance)
>    - SiteMonitor, index 3 is the SyncInjector (second instance)
>
> and so on.
>
> So when you add a SiteMonitor, you just add the SiteMonitor. If you add
> another Packetflux expansion unit, you have to add it knowing which index
> (AKA "slot") it is.  Put the device in a different position, and you need
> to update the index.
>
> bp
>
> On 10/25/2014 10:52 AM, Bill Prince via Af wrote:
>
> Yah.  Except that the index moves around, depending on what's in front of
> it (e.g. 4-port POE versus an 8-port POE).  So I can't depend on what index
> number I'll be using at any given installation.  The index name will have
> to stay static if I ever hope to find it.  Then again, if I install two of
> anything, there will be more than one index with the same description.
>
> Hmmm.  How to do this.   Maybe I do have to give each device a unique
> description, and then teach cacti to index on the unique description?
>
> bp
>
> On 10/25/2014 10:16 AM, Forrest Christian (List Account) via Af wrote:
>
> They should be offset by a fixed amount.  I.e., subtract 4.
> On Oct 25, 2014 10:58 AM, "Bill Prince via Af" <[email protected]> wrote:
>
>>  I think that may be it.  The OID I was using is no longer valid.  So
>> the SNMP response that came back had numbers in it, but it also looks like
>> the checksum was broken.
>>
>> Not clear to me why I thought I could do this without doing the index
>> thing.
>>
>> I hate doing the index thing.
>>
>> bp
>>
>> On 10/24/2014 10:32 PM, Forrest Christian (List Account) via Af wrote:
>>
>> A power cycle and a reboot should be identical in almost every case.  The
>> reboot actually triggers a hardware reset internally in the processor,
>> which should clear everything out.  Of course as soon as I say that it is
>> identical, someone will find an example where it is not.
>>
>> I'm not where I can look at the trace you sent, but I'm surprised it
>> contains errors.  I do know that the unit will return a response which may
>> look like this if the oid is invalid.
>>
>> Did you adjust your oids in cacti after the removal of the mystery
>> expansion unit from the table?  If not, this is likely the problem.
>>
>> In regards to the unit being there from the factory..  My guess is if you
>> had this unit listed in there from the get go, then it probably was the
>> expansion unit we use to test the expansion bus here.  It's supposed to be
>> factory reset before shipping but it would not shock me if it wasn't.   We
>> actually had a short period that a largish percentage went out not factory
>> reset due to a tester software issue.   Not really a problem but we hate to
>> have them go out in any other state.
>> On Oct 24, 2014 5:08 PM, "Bill Prince via Af" <[email protected]> wrote:
>>
>>>  You mean from the web GUI?  Sure.
>>>
>>> I presume a power cycle does something different from a reboot?
>>>
>>> I was always curious about this particular SiteMonitor, as it came up
>>> with the extra device on the expansion bus from the get-go.  I'd never
>>> worried about it, and then I saw the discussion about getting rid of old
>>> devices with the zeroed-serial trick.
>>>
>>> Don't go there!  It's a trap!
>>>
>>> bp
>>>
>>> On 10/24/2014 2:52 PM, George Skorup (Cyber Broadcasting) via Af wrote:
>>>
>>> Can you post a screenshot of your expansion, binary and analog tabs?
>>>
>>> Also, I bet if you power-cycle it, it will be fine again. I was working
>>> with Forrest on a bug where the SyncInjector and some other newer modules
>>> would mysteriously disappear from the bus. He was able to reproduce and get
>>> a fixed up firmware load for the modules. Something about one thing booting
>>> up faster than another, or something like that.
>>>
>>> On 10/24/2014 4:41 PM, Bill Prince via Af wrote:
>>>
>>> Gotcha!
>>>
>>> I removed all the Data Sources except one (PWR1).  Suddenly that data
>>> was making it into cacti.
>>>
>>> Then I added back in all the Data Sources coming _JUST_ from the
>>> SiteMonitor itself.  That also worked.
>>>
>>> Then I added in one of the Data Sources from the SyncInjector (sync
>>> events), which happens to be the only unit on the expansion bus past where
>>> I removed the non-existent unit.  This broke it again.
>>>
>>> So I have apparently uncovered a bug where removing a unit from the
>>> expansion bus (by zeroing the serial number) causes the SiteMonitor to
>>> break SNMP responses.  I think it's probably just a bad checksum, but I
>>> will leave that up to Forrest.  I forwarded the pcap trace to him.
>>>
>>> I will probably also swap out the SiteMonitor that has the problem.
>>>
>>> Thanks guys!
>>>
>>> bp
>>>
>>> On 10/24/2014 1:57 PM, Bill Prince via Af wrote:
>>>
>>> Then again....
>>>
>>> Not sure why I didn't notice this the first (or second) time.
>>> Wireshark is telling me I have a malformed packet; either a broken header
>>> or bad checksum.  So even though the SNMP response is coming in with the
>>> expected data, it's getting dropped before it gets into cacti because of
>>> the malformed packet.
>>>
>>> This would explain why removing a unit on the expansion bus changed
>>> things...
>>>
>>> bp
>>>
>>>
>>>
>>>
>>> On 10/24/2014 1:32 PM, Bill Prince via Af wrote:
>>>
>>> OK. Confirmed.  The SiteMonitor is getting the SNMP requests, and it
>>> is responding with the expected values.
>>>
>>> I ran a pcap trace both at the SiteMonitor as well as at the ethernet
>>> port on the cacti server.  SNMP requests/responses are going both ways
>>> (and at both ends). In fact, spine appears to be doing 3 retries.
>>>
>>> One thing I didn't expect is that just before the SNMP requests, there
>>> are two attempts to open a telnet on the SiteMonitor.  Not sure where
>>> that is coming from, except perhaps for the Manage plugin (which I
>>> de-installed several weeks ago).
>>>
>>> So something is broken inside cacti.  How/why this was caused by
>>> zeroing a serial number from a non-existent expansion unit is completely
>>> baffling to me.
>>>
>>> I also have no clue how to fix it, because cacti "thinks" there was no
>>> response.
>>>
>>> bp
>>>
>>> On 10/24/2014 11:16 AM, George Skorup (Cyber Broadcasting) via Af wrote:
>>>
>>> I am thoroughly confused. Is your community string correct? Can you
>>> increase the device SNMP timeout, like 1000ms instead of 250ms? What's your
>>> device down detection set to? Is it showing down in the device list?
>>>
>>> I have seen some base units go kinda screwy and respond slower and a
>>> reboot doesn't fix it, they needed a power-cycle.
>>>
>>> On 10/24/2014 11:25 AM, Bill Prince via Af wrote:
>>>
>>> Now thrice.
>>>
>>> No joy in Mudville.
>>>
>>> bp
>>>
>>> On 10/24/2014 8:07 AM, Bill Prince via Af wrote:
>>>
>>> Yah.  Twice now.
>>>
>>> bp
>>>
>>> On 10/23/2014 11:06 PM, George Skorup (Cyber Broadcasting) via Af wrote:
>>>
>>> Gotta be the poller cache. Did you try a rebuild?
>>>
>>> On 10/23/2014 11:03 PM, Bill Prince via Af wrote:
>>>
>>> Getting closer.  When I look in the SNMP cache, there is no entry for
>>> the device.
>>>
>>> Looking in the log (without debug), I get:
>>>
>>> 10/23/2014 08:34:25 PM - SPINE: Poller[0] Host[797] TH[1] DS[12316]
>>> WARNING: SNMP timeout detected [250 ms], ignoring host '10.13.114.254'
>>>
>>> So there is something causing the SNMP request to barf inside cacti.
>>> When I do an snmpget from the CLI, it all looks fine.  Likewise, the
>>> realtime plugin is working fine too.
>>>
>>> So when realtime is doing the SNMP queries outside the poller, they are
>>> fine.  Just when spine is doing the SNMP requests.
>>>
>>>
>>> bp
>>>
>>> On 10/23/2014 4:12 PM, George Skorup (Cyber Broadcasting) via Af wrote:
>>>
>>> You divided by zero, didn't you?
>>>
>>> Are you sure your modules are in the same order as before?
>>>
>>> On 10/23/2014 1:29 PM, Bill Prince via Af wrote:
>>>
>>>
>>> I noticed an "Expansion Unit" on one of my SiteMonitors this morning.
>>> It said something about "Device Removed" or something like that.
>>>
>>> Remembering the discussion the other day on this topic, I put a "0" in
>>> the Serial # for the non-existent unit, rescanned, & rebooted.
>>>
>>> Now, none of the OIDs work in Cacti.  If I do a simple snmpget on any
>>> of the OIDs that I use, the correct information comes back. Several of the
>>> OIDs are on the base unit anyway, so they would not have moved, and
>>> further, the OIDs don't reference the serial number.
>>>
>>> So... what did I do, and how do I fix it?
