Send netdisco-users mailing list submissions to
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.sourceforge.net/lists/listinfo/netdisco-users
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of netdisco-users digest..."
Today's Topics:
1. Re: Netdisco 2 - port name problem (Oliver Gorwits)
--- Begin Message ---
Hi Edward,
If it is not too late to try, when you run your discoverall could you take a
look at the load on the network interface on your server or its local
switchport? Also check the speed/duplex configuration for half-duplex issues,
half-dead media converters, etc.
My thinking is that SNMP is UDP, and discoverall runs many discovers in
parallel, so I wonder if there are dropped packets under high load. Just a
guess!
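One way to test this guess on the server itself is to compare the kernel's UDP
error counters before and after a discoverall run. The following is a minimal
Python sketch, assuming a Linux server that exposes these counters in
/proc/net/snmp; the function names parse_udp_counters and udp_error_delta are
illustrative only and are not part of Netdisco.

```python
def parse_udp_counters(snmp_text):
    """Parse the two "Udp:" lines of /proc/net/snmp-style text into a dict."""
    # The file holds a header line and a value line per protocol, e.g.
    #   Udp: InDatagrams NoPorts InErrors ...
    #   Udp: 1200 3 0 ...
    # "UdpLite:" lines do not start with "Udp:" and are skipped.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) != 2:
        raise ValueError("expected one Udp: header line and one Udp: value line")
    names, values = rows
    return {name: int(value) for name, value in zip(names[1:], values[1:])}


def udp_error_delta(before, after, keys=("InErrors", "RcvbufErrors")):
    """Return how much each error counter grew between two snapshots."""
    return {key: after[key] - before[key]
            for key in keys if key in before and key in after}
```

To use it, read /proc/net/snmp once before and once after the run, pass each
text to parse_udp_counters, and hand both results to udp_error_delta. Growth in
RcvbufErrors would suggest the server dropped incoming UDP datagrams; drops on
the switchport or further out in the network would not show up here.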
It's possible to tune down the number of parallel workers via the configuration
(the default is twice the number of CPUs on the server, I think).
Regards
Oliver.
Sent from the moon.
> On 9 Feb 2016, at 22:42, Edward Vopata <[email protected]> wrote:
>
> Okay. I'm going to purge my data and set up only about 15 devices to see
> what happens.
> This should make the log fairly reasonable.
>
> I don't understand how my SNMP configuration could be causing issues:
> - When I do a discover on a single device, it works.
> - The same snmp configuration is NOT causing an issue with the old
> NetDisco 1 instance.
> I have attached a sample of my snmp config for a device.
>
> I have also attached the snmp config section from my deployment.yml
>
>
> -- Ed.
>
>
>> On 02/09/16 15:45, Steven Xu wrote:
>> Hi Edward,
>>
>> I'm rather stumped. For all devices that Netdisco knows about, "discoverall"
>> simply adds a discover job for those that aren't already in the queue. The
>> single discover runs the discover job (same code) without adding it to the
>> queue.
>>
>> Maybe this is the interaction of several devices on your test subnet? Try
>> limiting your discover to a single device. Configuration:
>> discover_only: <ip>
>>
>> With a single device, your logs should be considerably smaller. You can
>> further reduce the size by running
>> grep -vE "(mgr|sched)" ~/logs/netdisco-daemon.log
>>
>> I would also propose that it may have to do with the SNMP configuration on
>> your devices since it seems no one else is having your problem.
>>
>> Steven
>>
>> -----Edward Vopata <[email protected]> wrote: -----
>> To: Steven Xu <[email protected]>
>> From: Edward Vopata <[email protected]>
>> Date: 02/09/2016 11:44AM
>> Cc: <[email protected]>
>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>
>>
>> I have been running this for a few days.
>> It looks like the problem is somewhere in the "discoverall" code.
>>
>> I ran without the discoverall (with just macwalk and arpwalk scheduled) and
>> the problem did NOT appear.
>>
>> So I enabled the discoverall schedule, and the problem is back.
>>
>> I then tried to limit the discover to a single subnet with "log: debug",
>> but the output is overwhelming (74 MB of logs covering 2 hours worth of
>> runtime).
>> I think I need to be able to limit the logging to only log for a set of IP
>> addresses.
>>
>>
>> Why does the single discover fix the problem (ie: netdisco-do -DIQS
>> discover -d 10.255.32.73),
>> but the discoverall schedule corrupt my data (sometimes)?
>>
>> -- Ed.
>>
>>
>>
>>> On 02/05/16 10:31, Steven Xu wrote:
>>> Hi Edward,
>>>
>>> I noticed that your scheduling is a slightly different time than your
>>> previous schedule. If you see no problems after running this discover, it
>>> may have to do with the time.
>>>
>>> I recommend you set the log configuration to debug for this scheduled
>>> discover. I recommend that, if you include your logs, you should put them
>>> on pastie.org because they can get quite large.
>>> log: debug
>>>
>>> Steven
>>>
>>> -----Edward Vopata <[email protected]> wrote: -----
>>> To: Steven Xu <[email protected]>
>>> From: Edward Vopata <[email protected]>
>>> Date: 02/05/2016 10:15AM
>>> Cc: <[email protected]>
>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>
>>> Okay. I have set "discover_only" to a single /24 subnet.
>>>
>>> I have set the schedule as follows:
>>>
>>> schedule:
>>>   macwalk:
>>>     when:
>>>       min: 20
>>>   arpwalk:
>>>     when:
>>>       min: 50
>>>   expire:
>>>     when: '20 23 * * *'
>>>
>>>
>>> I forced a discover of the target list of devices that are exhibiting the
>>> ifIndex issue.
>>>
>>> I ran the above schedule for a full day, and the ifIndex issue has NOT
>>> reappeared.
>>>
>>> So, I am enabling the discoverall scheduling:
>>>
>>>   discoverall:
>>>     when: '1 9 * *
>>>
>>> And we will see what happens.
>>>
>>> -- Ed.
>>>
>>>
>>>> On 02/04/16 07:47, Steven Xu wrote:
>>>> Hi Edward,
>>>>
>>>> My suggestion in the last email was to run a manual discover, disabling
>>>> the scheduled jobs and checking the port names later in the day. This
>>>> won't solve the problem, but at least we might be able to identify what
>>>> causes it.
>>>>
>>>> Whoops, it hadn't dawned on me that arpnip and macsuck would fail when the
>>>> port names are wrong.
>>>>
>>>> Steven
>>>>
>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>> To: Steven Xu <[email protected]>
>>>> From: Edward Vopata <[email protected]>
>>>> Date: 02/03/2016 05:24PM
>>>> Cc: <[email protected]>
>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>
>>>>
>>>>
>>>> > I mean, you should enable those "Display Columns" on the right sidebar
>>>> > of the web interface when looking at ports for a device.
>>>>
>>>> This doesn't solve the problem of the port switching. Also, once the
>>>> port switches, updates (arp and mac) start failing because the
>>>> port does not match.
>>>>
>>>>
>>>> > If this machine is not in production, it's possible you could manually
>>>> > run a discover, disable the daemon or limit them to a single subnet and
>>>> > check back on the port names later. If doing this stops the port names
>>>> > changing to the numbers, then one of the scheduled jobs is incorrectly
>>>> > changing your data.
>>>>
>>>> The system is NOT production, I just need some hints on what to try.
>>>>
>>>> > number of workers
>>>>
>>>> I have 2 x 8-core CPUs ==> 16 cores
>>>> So, I'm at "4 * AUTO"?
>>>>
>>>> I don't think I am stressing my system at this point.
>>>> I am just trying to get it to work.
>>>>
>>>> > There also isn't much difference between your version of netdisco and
>>>> > the latest.
>>>>
>>>> The App::Netdisco (version 2.033005) showed up today, so I installed it.
>>>> I would like to report that this version did NOT resolve the issue.
>>>>
>>>> -- Ed.
>>>>
>>>>
>>>>> On 02/03/16 14:51, Steven Xu wrote:
>>>>> Hi Edward,
>>>>>
>>>>> I mean, you should enable those "Display Columns" on the right sidebar of
>>>>> the web interface when looking at ports for a device.
>>>>>
>>>>> If this machine is not in production, it's possible you could manually
>>>>> run a discover, disable the daemon or limit them to a single subnet and
>>>>> check back on the port names later. If doing this stops the port names
>>>>> changing to the numbers, then one of the scheduled jobs is incorrectly
>>>>> changing your data.
>>>>>
>>>>> I can't exactly comment on the number of workers that you're using other
>>>>> than we use "10 * AUTO" (10 workers * 4 cores). There also isn't much
>>>>> difference between your version of netdisco and the latest.
>>>>>
>>>>> Steven
>>>>>
>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>> To: Steven Xu <[email protected]>
>>>>> From: Edward Vopata <[email protected]>
>>>>> Date: 02/03/2016 02:10PM
>>>>> Cc: <[email protected]>
>>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>>
>>>>>
>>>>> > I recommend, when in the web UI, at least enabling the description or
>>>>> > name fields to at least mitigate the problem if you haven't already.
>>>>>
>>>>> I don't understand what you mean here.
>>>>>
>>>>>
>>>>> > Are you running Netdisco V1 and V2 on the same server? Have you ensured
>>>>> > they are properly set up separately from one another?
>>>>>
>>>>> No. My NetDisco V1 is running on a different server with its own
>>>>> database.
>>>>> Both versions are running the same version of SNMP::Info.
>>>>>
>>>>>
>>>>> > How many workers have you set up?
>>>>>
>>>>> workers:
>>>>>   task: 64
>>>>>
>>>>> dns:
>>>>>   max_outstanding: 80
>>>>>
>>>>>
>>>>> FYI: I upgraded to the latest netdisco 2 version:
>>>>>
>>>>> Hostname : netdisco2
>>>>> OS : Ubuntu 14.04.3 LTS
>>>>> Perl : v5.18.2
>>>>> App::NetDisco : 2.033005
>>>>> DB Schema : 40
>>>>> SNMP::Info : 3.31
>>>>> Apache : 2.4.7
>>>>> Net-SNMP : 5.7.2
>>>>> PostgreSQL : 9.3.10
>>>>>
>>>>>
>>>>> I am game to try anything. I am using the netdisco 1 as my production
>>>>> Netdisco.
>>>>>
>>>>> I suggest
>>>>> - totally disabling the nbtstat job.
>>>>> - maybe disabling the macsuck job.
>>>>> - leave the arpwalk job.
>>>>> - add discover_only and arpwalk_only to limit the discover to a
>>>>> single subnet.
>>>>>
>>>>>
>>>>> Thoughts
>>>>>
>>>>> --- Ed.
>>>>>
>>>>>
>>>>>
>>>>>> On 02/03/16 11:48, Steven Xu wrote:
>>>>>> Hi Edward,
>>>>>>
>>>>>> I recommend, when in the web UI, at least enabling the description or
>>>>>> name fields to at least mitigate the problem if you haven't already.
>>>>>>
>>>>>> Interesting observation: you only run discovers at 3AM, but you mention
>>>>>> that within a few hours after the initial discovery (presumably during
>>>>>> the workday), the port names change.
>>>>>>
>>>>>> Some (random) guesses:
>>>>>> Are you running Netdisco V1 and V2 on the same server? Have you
>>>>>> ensured they are properly set up separately from one another?
>>>>>> How many workers have you set up? If you set up too many, it could
>>>>>> possibly cause SNMP timeouts as in the other thread?
>>>>>>
>>>>>> Otherwise, somehow, the scheduled macsuck, arpnip and nbtstat jobs are
>>>>>> affecting your device port entries?
>>>>>>
>>>>>> Steven
>>>>>>
>>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>>> To: Steven Xu <[email protected]>
>>>>>> From: Edward Vopata <[email protected]>
>>>>>> Date: 02/03/2016 10:03AM
>>>>>> Cc: <[email protected]>
>>>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>>>
>>>>>>
>>>>>> Attached is the schedule section from my deployment.yml file.
>>>>>>
>>>>>> I found the logs/netdisco-daemon.log, however I don't find anything
>>>>>> about the job scheduling.
>>>>>>
>>>>>> My Log level is set as:
>>>>>> log: 'warning'
>>>>>>
>>>>>> Now you are starting to understand my frustration, this is proving to be
>>>>>> a very difficult problem.
>>>>>>
>>>>>>
>>>>>> Yes, there are other people having this problem. There was a message
>>>>>> thread back in November 2015
>>>>>> "Subject: [Netdisco] Some Ports in Portview showing numbers instead of
>>>>>> interface ID"
>>>>>>
>>>>>> Thanks,
>>>>>> -- Ed.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On 02/03/16 08:25, Steven Xu wrote:
>>>>>>> Hi Edward,
>>>>>>>
>>>>>>> Since you're observing that the names change after a couple hours, it
>>>>>>> leads me to believe that some scheduled job is messing with your data.
>>>>>>> Can you include your scheduled jobs configuration?
>>>>>>>
>>>>>>> You can also check the status of your scheduled jobs by checking the
>>>>>>> logs in logs/netdisco-daemon.log and searching for the last logs for a
>>>>>>> particular device.
>>>>>>>
>>>>>>> I have a hard time imagining what could be causing this problem. When
>>>>>>> you ran the discover manually, the results were fine, but when some
>>>>>>> scheduled job runs (presumably the scheduled discover), the device port
>>>>>>> entry is getting the wrong name.
>>>>>>>
>>>>>>> Maybe someone else has had a similar issue before?
>>>>>>>
>>>>>>> Steven
>>>>>>>
>>>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>>>> To: Steven Xu <[email protected]>, <[email protected]>
>>>>>>> From: Edward Vopata <[email protected]>
>>>>>>> Date: 02/02/2016 01:00PM
>>>>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>>>>
>>>>>>>
>>>>>>> I have about 4300 devices in my netdisco2 database, of which 2200 of
>>>>>>> these devices are having this ifindex issue.
>>>>>>>
>>>>>>> Attached is a list of device models that are having this issue.
>>>>>>> I am currently focusing on the cisco 2911 and 2811 devices.
>>>>>>>
>>>>>>> Attached are some log and data files:
>>>>>>>
>>>>>>> Device-before-discover.txt - this is a query of a device before the
>>>>>>> discovery, showing the problem (i.e. device_port.port is the ifIndex
>>>>>>> number).
>>>>>>>
>>>>>>> Device-netdisco-discover.log - this is the log of the netdisco
>>>>>>> discover on that device.
>>>>>>>
>>>>>>> Device-after-discover.txt - this is the same query after the discovery.
>>>>>>>
>>>>>>> The view after the discover is how I would expect the device to look.
>>>>>>> However, after several hours, the device reverts back to the "before"
>>>>>>> view, which is NOT right.
>>>>>>>
>>>>>>> This is a Netdisco 2 problem.
>>>>>>> I am running an old version of netdisco (based on the 0.96 release),
>>>>>>> with the latest SNMP::Info (version 3.31) and I am not having the
>>>>>>> problem.
>>>>>>>
>>>>>>> The same device on the old netdisco does NOT exhibit this problem.
>>>>>>>
>>>>>>> I did intend to reply to the mailing list.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> - Ed Vopata
>>>>>>>
>>>>>>>
>>>>>>>> On 02/02/16 09:03, Steven Xu wrote:
>>>>>>>> Hi Ed,
>>>>>>>>
>>>>>>>> I suspect the problem lies with the SNMP setup for your routers;
>>>>>>>> netdisco always uses the port name given by SNMP. The problem could
>>>>>>>> also lie in the SNMP::Info implementation for your routers (which
>>>>>>>> netdisco uses to gather information). What model are they?
>>>>>>>>
>>>>>>>> Some logging code is already present and may already be enough to
>>>>>>>> diagnose your issue. Run the script bin/netdisco-do to discover a
>>>>>>>> device that has already been discovered and is experiencing this
>>>>>>>> problem, optionally with the -D and -S flags (try
>>>>>>>> "bin/netdisco-do help" for more details).
>>>>>>>>
>>>>>>>> I think, even without the flags, warnings will be present if there is
>>>>>>>> a problem with discovering your device. Do this after you notice that
>>>>>>>> the port name gets changed to the ifIndex value. Include the logs or
>>>>>>>> let me know what you find.
>>>>>>>>
>>>>>>>> Also, I noticed that you didn't include the netdisco mailing list this
>>>>>>>> time. Was this intentional? Others may have more insights into the
>>>>>>>> problem that I don't.
>>>>>>>>
>>>>>>>> Steven
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>>>>> To: Steven Xu <[email protected]>
>>>>>>>> From: Edward Vopata <[email protected]>
>>>>>>>> Date: 02/02/2016 09:44AM
>>>>>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>>>>>
>>>>>>>>
>>>>>>>> It is very difficult to isolate this problem.
>>>>>>>>
>>>>>>>> An initial device discovery will set the port name to the correct
>>>>>>>> value (i.e. GigabitEthernet0/0), but sometime later the value gets
>>>>>>>> changed to the ifIndex value. I have a small group of routers that
>>>>>>>> are consistently exhibiting this problem, but I haven't been able to
>>>>>>>> get much deeper.
>>>>>>>>
>>>>>>>> Question:
>>>>>>>> Where does the port name (device_port.port) value get set?
>>>>>>>> - If it is a few places, then I can add some logging code to
>>>>>>>> detect specific changes.
>>>>>>>> Where is the source of the port name?
>>>>>>>> - maybe I can add some checks there?
>>>>>>>>
>>>>>>>> I still don't think that the issue is a timeout issue.
>>>>>>>>
>>>>>>>> -- Ed Vopata
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 02/01/16 07:09, Steven Xu wrote:
>>>>>>>>> Hi Edward,
>>>>>>>>>
>>>>>>>>> It would help to provide some debug logs, although I'm not entirely
>>>>>>>>> familiar with which logs you should be providing.
>>>>>>>>>
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>>>>>> To: <[email protected]>
>>>>>>>>> From: Edward Vopata <[email protected]>
>>>>>>>>> Date: 01/18/2016 02:05PM
>>>>>>>>> Subject: [Netdisco] Netdisco 2 - port name problem
>>>>>>>>>
>>>>>>>>> I am still having a problem with netdisco 2, where the port
>>>>>>>>> (device_port.port) is getting set to the SNMP ifIndex value of the
>>>>>>>>> port instead of the port name value (i.e. GigabitEthernet0/0). I have
>>>>>>>>> tried adjusting the SNMP retries and SNMP timeouts, but I am still
>>>>>>>>> getting the same results. An initial discover on the device will set
>>>>>>>>> the port to the correct name value (i.e. GigabitEthernet0/0); however,
>>>>>>>>> after a few hours the port changes to the ifIndex value.
>>>>>>>>>
>>>>>>>>> I don't believe that the problem is with the SNMP timeouts, since some
>>>>>>>>> of the devices having the problem are in the same data center as the
>>>>>>>>> netdisco server.
>>>>>>>>>
>>>>>>>>> Here is my NetDisco Information:
>>>>>>>>> Hostname : netdisco2
>>>>>>>>> OS : Ubuntu 14.04.3 LTS
>>>>>>>>> Perl : v5.18.2
>>>>>>>>> App::NetDisco : 2.033004
>>>>>>>>> DB Schema : 40
>>>>>>>>> SNMP::Info : 3.30
>>>>>>>>> Apache : 2.4.7
>>>>>>>>> Net-SNMP : 5.7.2
>>>>>>>>> PostgreSQL : 9.3.10
>>>>>>>>>
>>>>>>>>> Please advise.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> -- Ed.
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>>>>>>>>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>>>>>>> Monitor end-to-end web transactions and take corrective actions now
>>>>>>>>> Troubleshoot faster and improve end-user experience. Signup Now!
>>>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
>>>>>>>>> _______________________________________________
>>>>>>>>> Netdisco mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/netdisco-users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [attachment "Netdisco-models.txt" removed by Steven Xu/fs/YorkU]
>>>>>>> [attachment "Device-after-discover.txt" removed by Steven Xu/fs/YorkU]
>>>>>>> [attachment "Device-before-discover.txt" removed by Steven Xu/fs/YorkU]
>>>>>>> [attachment "Device-netdisco-discover.log" removed by Steven
>>>>>>> Xu/fs/YorkU]
>>>>>>
>>>>>>
>>>>>>
>>>>>> [attachment "Netdisco-Schedule.txt" removed by Steven Xu/fs/YorkU]
>
> <Device-SNMP-config.txt>
> <NetDisco-SNMP-config.txt>
--- End Message ---
_______________________________________________
Netdisco mailing list - Digest Mode
[email protected]
https://lists.sourceforge.net/lists/listinfo/netdisco-users