netdisco-users Digest, Vol 116, Issue 17

netdisco-users-request Wed, 10 Feb 2016 04:39:08 -0800

Send netdisco-users mailing list submissions to
        [email protected]


To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/netdisco-users
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of netdisco-users digest..."

Today's Topics:

   1. Re: Netdisco 2 - port name problem (Oliver Gorwits)

--- Begin Message ---

Hi Edward,

If it is not too late to try, when you run your discoverall could you take a 
look at the load on the network interface on your server or its local 
switchport? Also the config/speed for half-duplex issues, half dead media 
converters, etc. 

My thinking is that SNMP is UDP, and discoverall runs many discovers in 
parallel, so I wonder if there are dropped packets under high load. Just a 
guess!
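
One way to test the dropped-packet guess is to compare the server's UDP counters before and after a discoverall run. Below is a minimal Python sketch, assuming a Linux server (the thread mentions Ubuntu) where the kernel exposes these counters in /proc/net/snmp. The function names are made up for illustration and are not part of Netdisco.

```python
def parse_udp_counters(proc_net_snmp_text):
    """Return the 'Udp:' counters from /proc/net/snmp text as a dict of ints."""
    header = None
    values = None
    for line in proc_net_snmp_text.splitlines():
        # 'UdpLite:' lines do not match this prefix, so they are skipped.
        if not line.startswith("Udp:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields   # first 'Udp:' line holds the column names
        else:
            values = fields   # second 'Udp:' line holds the numbers
            break
    if header is None or values is None:
        raise ValueError("no Udp: section found")
    return dict(zip(header, (int(v) for v in values)))


def counter_delta(before, after):
    """Difference of each counter present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# Typical use (hypothetical workflow):
#   before = parse_udp_counters(open("/proc/net/snmp").read())
#   ... let discoverall run ...
#   after = parse_udp_counters(open("/proc/net/snmp").read())
#   print(counter_delta(before, after))
```

If InErrors or RcvbufErrors grows noticeably during the run, that would be consistent with SNMP replies being dropped on the server side. It would not show drops on the switchport or elsewhere in the path.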

It's possible to tune down the number of parallel workers via the configuration 
(the default is twice the number of CPUs on the server, I think).

Regards
Oliver. 

Sent from the moon.

> On 9 Feb 2016, at 22:42, Edward Vopata <[email protected]> wrote:
> 
> Okay.  I'm going to purge my data and set up only about 15 devices to see 
> what happens.
> This should make the log fairly reasonable.
> 
> I don't understand how my SNMP configuration could be causing issues:
>     - When I do a discover on a single device, it works.
>     - The same SNMP configuration is NOT causing an issue with the old 
> NetDisco 1 instance. 
> 
> I have attached a sample of my snmp config for a device.
>  
> I have also attached the snmp config section from my deployment.yml
> 
> 
> -- Ed.
> 
> 
>> On 02/09/16 15:45, Steven Xu wrote:
>> Hi Edward,
>> 
>> I'm rather stumped. For all devices that Netdisco knows about, "discoverall" 
>> simply adds a discover job for those that aren't already in the queue. The 
>> single discover runs the discover job (same code) without adding it to the 
>> queue.
>> 
>> Maybe this is the interaction of several devices on your test subnet? Try 
>> limiting your discover to a single device. Configuration:
>> discover_only: <ip>
>> 
>> With a single device, your logs should be considerably smaller. You can 
>> further reduce the size by running
>> cat ~/logs/netdisco-daemon.log | grep -vE "(mgr|sched)"
>> 
>> I would also propose that it may have to do with the SNMP configuration on 
>> your devices since it seems no one else is having your problem. 
>> 
>> Steven
>> 
>> -----Edward Vopata <[email protected]> wrote: -----
>> To: Steven Xu <[email protected]>
>> From: Edward Vopata <[email protected]>
>> Date: 02/09/2016 11:44AM
>> Cc: <[email protected]>
>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>> 
>> 
>> I have been running this for a few days.
>> It looks like the problem is somewhere in the "discoverall" code.
>> 
>> I ran without the discoverall (with just macwalk and arpwalk scheduled) and  
>> the problem did NOT appear.
>> 
>> So, I enabled the discoverall schedule and the problem is back.
>> 
>> I then tried to limit the discover to a single subnet with "log: debug",
>> but the output is overwhelming (74 MB of logs covering 2 hours worth of 
>> runtime).
>> I think I need to be able to limit the logging to only log for a set of IP 
>> addresses.
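
Until the logging itself can be limited, the debug log can be filtered after the fact. Below is a minimal Python sketch that keeps only lines mentioning one of a set of IPv4 addresses. It assumes the device address appears literally somewhere in each relevant log line, which has not been checked against the real netdisco-daemon.log format. The function name is made up for illustration.

```python
import re

# Matches dotted-quad tokens; the non-capturing group makes findall()
# return the whole address rather than a fragment.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def filter_log_lines(lines, wanted_ips):
    """Yield only the lines that mention one of the wanted IPv4 addresses."""
    wanted = set(wanted_ips)
    for line in lines:
        # Compare whole addresses, so 10.255.32.7 does not match 10.255.32.73.
        if wanted.intersection(IPV4.findall(line)):
            yield line


# Typical use (hypothetical paths and addresses):
#   with open("logs/netdisco-daemon.log") as log:
#       for line in filter_log_lines(log, ["10.255.32.73"]):
#           print(line, end="")
```

Comparing whole addresses avoids the prefix matches that a plain substring grep for an IP would produce.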
>>      
>> 
>> Why does the single discover fix the problem (ie: netdisco-do -DIQS  
>> discover -d 10.255.32.73), 
>> but the discoverall schedule corrupt my data (sometimes)?
>> 
>> -- Ed.
>> 
>> 
>> 
>>> On 02/05/16 10:31, Steven Xu wrote:
>>> Hi Edward,
>>> 
>>> I noticed that your scheduling is a slightly different time than your 
>>> previous schedule. If you see no problems after running this discover, it 
>>> may have to do with the time. 
>>> 
>>> I recommend you set the log configuration to debug for this scheduled 
>>> discover. I recommend that, if you include your logs, you should put them 
>>> on pastie.org because they can get quite large.
>>> log: debug
>>> 
>>> Steven
>>> 
>>> -----Edward Vopata <[email protected]> wrote: -----
>>> To: Steven Xu <[email protected]>
>>> From: Edward Vopata <[email protected]>
>>> Date: 02/05/2016 10:15AM
>>> Cc: <[email protected]>
>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>> 
>>> Okay.  I have set "discover_only" to a single /24 subnet.
>>> 
>>> I have set the schedule as follows:
>>> 
>>>     schedule:
>>>       macwalk:
>>>         when:
>>>           min: 20
>>>       arpwalk:
>>>         when:
>>>           min: 50
>>>       expire:
>>>         when: '20 23 * * *'
>>> 
>>> 
>>> I forced a discover of the target list of devices that are exhibiting the 
>>> ifIndex issue.
>>> 
>>> I ran the above schedule for a full day, and the ifIndex issue has NOT 
>>> reappeared.
>>> 
>>> So, I am enabling the discoverall scheduling:
>>> 
>>>     discoverall:
>>>         when: '1 9 * *
>>> 
>>> And we will see what happens.
>>> 
>>> -- Ed.
>>> 
>>> 
>>>> On 02/04/16 07:47, Steven Xu wrote:
>>>> Hi Edward,
>>>> 
>>>> My suggestion in the last email was to run a manual discover, disabling 
>>>> the scheduled jobs and checking the port names later in the day. This 
>>>> won't solve the problem, but at least we might be able to identify what 
>>>> causes it.
>>>> 
>>>> Whoops, it hadn't dawned on me that arpnip and macsuck would fail when the 
>>>> port names are wrong.
>>>> 
>>>> Steven
>>>> 
>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>> To: Steven Xu <[email protected]>
>>>> From: Edward Vopata <[email protected]>
>>>> Date: 02/03/2016 05:24PM
>>>> Cc: <[email protected]>
>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>> 
>>>> 
>>>> 
>>>> > I mean, you should enable those "Display Columns" on the right sidebar 
>>>> > of the web interface when looking at ports for a device. 
>>>> 
>>>> This doesn't solve the problem of the port switching.   Also, once the 
>>>> port switches, updates (arp and mac) start failing because the
>>>> port does not match.
>>>> 
>>>> 
>>>> > If this machine is not in production, it's possible you could manually 
>>>> > run a discover, disable the daemon or limit them to a single subnet and 
>>>> > check back on the port names later. If doing this stops the port names 
>>>> > changing to the numbers, then one of the scheduled jobs is incorrectly 
>>>> > changing your data.
>>>> 
>>>> The system is NOT production, I just need some hints on what to try.
>>>> 
>>>> > number of workers
>>>> 
>>>>    I have 2 x 8 core CPUs ==> 16 cores
>>>>    So, I'm at "4 * AUTO"? 
>>>> 
>>>>     I don't think I am stressing my system at this point.
>>>>     I am just trying to get it to work..
>>>> 
>>>> > There also isn't much difference between your version of netdisco and 
>>>> > the latest.
>>>> 
>>>> The App::Netdisco (version 2.033005) showed up today, so I installed it.
>>>> I would like to report that this version did NOT resolve the issue.
>>>> 
>>>> -- Ed.
>>>> 
>>>> 
>>>>> On 02/03/16 14:51, Steven Xu wrote:
>>>>> Hi Edward,
>>>>> 
>>>>> I mean, you should enable those "Display Columns" on the right sidebar of 
>>>>> the web interface when looking at ports for a device. 
>>>>> 
>>>>> If this machine is not in production, it's possible you could manually 
>>>>> run a discover, disable the daemon or limit them to a single subnet and 
>>>>> check back on the port names later. If doing this stops the port names 
>>>>> changing to the numbers, then one of the scheduled jobs is incorrectly 
>>>>> changing your data.
>>>>> 
>>>>> I can't exactly comment on the number of workers that you're using other 
>>>>> than we use "10 * AUTO" (10 workers * 4 cores). There also isn't much 
>>>>> difference between your version of netdisco and the latest.
>>>>> 
>>>>> Steven
>>>>> 
>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>> To: Steven Xu <[email protected]>
>>>>> From: Edward Vopata <[email protected]>
>>>>> Date: 02/03/2016 02:10PM
>>>>> Cc: <[email protected]>
>>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>> 
>>>>> 
>>>>> > I recommend, when in the web UI, at least enabling the description or 
>>>>> > name fields to at least mitigate the problem if you haven't already.
>>>>> 
>>>>> I don't understand what you mean here?
>>>>> 
>>>>> 
>>>>> > Are you running Netdisco V1 and V2 on the same server? Have you ensured 
>>>>> > they are properly set up separately from another?
>>>>> 
>>>>> No.  My NetDisco V1 is running on a different server with its own 
>>>>> database.
>>>>> Both versions are running the same version of SNMP::Info.
>>>>> 
>>>>> 
>>>>> > How many workers have you set up?
>>>>>  
>>>>> workers:
>>>>>   task: 64
>>>>> 
>>>>> dns:
>>>>>   max_outstanding: 80
>>>>> 
>>>>> 
>>>>> FYI:  I upgraded to the latest netdisco 2 version:
>>>>> 
>>>>>     Hostname      : netdisco2
>>>>>     OS            : Ubuntu 14.04.3 LTS
>>>>>     Perl          : v5.18.2
>>>>>     App::NetDisco : 2.033005
>>>>>     DB Schema     : 40
>>>>>     SNMP::Info    : 3.31
>>>>>     Apache        : 2.4.7
>>>>>     Net-SNMP      : 5.7.2
>>>>>     PostgreSQL    : 9.3.10
>>>>> 
>>>>> 
>>>>> I am game to try anything.  I am using the netdisco 1 as my production 
>>>>> Netdisco.
>>>>> 
>>>>>      I suggest 
>>>>>         - totally disabling the nbtstats job.
>>>>>         - maybe disabling the macsuck job.
>>>>>         - leave the arpwalk job.  
>>>>>         - add  discover_only  and arpwalk_only to limit the discover to a 
>>>>> single subnet. 
>>>>> 
>>>>> 
>>>>> Thoughts
>>>>>      
>>>>> --- Ed.
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 02/03/16 11:48, Steven Xu wrote:
>>>>>> Hi Edward,
>>>>>> 
>>>>>> I recommend, when in the web UI, at least enabling the description or 
>>>>>> name fields to at least mitigate the problem if you haven't already.
>>>>>> 
>>>>>> Interesting observation: you only run discovers at 3AM, but you mention 
>>>>>> that within a few hours after the initial discovery (presumably during 
>>>>>> the workday), the port names change. 
>>>>>> 
>>>>>> Some (random) guesses:
>>>>>> Are you running Netdisco V1 and V2 on the same server? Have you ensured 
>>>>>> they are properly set up separately from another?
>>>>>> How many workers have you set up? If you set up too many, it could 
>>>>>> possibly cause SNMP timeouts as in the other thread?
>>>>>> 
>>>>>> Otherwise, somehow, the scheduled macsuck, arpnip and nbtstat jobs are 
>>>>>> affecting your device port entries?
>>>>>> 
>>>>>> Steven
>>>>>> 
>>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>>> To: Steven Xu <[email protected]>
>>>>>> From: Edward Vopata <[email protected]>
>>>>>> Date: 02/03/2016 10:03AM
>>>>>> Cc: <[email protected]>
>>>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>>> 
>>>>>> 
>>>>>> Attached is the schedule section from my deployment.yml file.
>>>>>> 
>>>>>> I found the logs/netdisco-daemon.log, however I don't find anything 
>>>>>> about the job scheduling.
>>>>>> 
>>>>>> My Log level is set as:
>>>>>>     log: 'warning'
>>>>>> 
>>>>>> Now you are starting to understand my frustration, this is proving to be 
>>>>>> a very difficult problem.
>>>>>> 
>>>>>> 
>>>>>> Yes, there are other people having this problem.  There was a message 
>>>>>> thread back in November 2015 
>>>>>> "Subject: [Netdisco] Some Ports in Portview showing numbers instead of 
>>>>>> interface ID"
>>>>>> 
>>>>>> Thanks,
>>>>>> -- Ed.
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>>> On 02/03/16 08:25, Steven Xu wrote:
>>>>>>> Hi Edward,
>>>>>>> 
>>>>>>> Since you're observing that the names change after a couple hours, it 
>>>>>>> leads me to believe that some scheduled job is messing with your data. 
>>>>>>> Can you include your scheduled jobs configuration?
>>>>>>> 
>>>>>>> You can also check the status of your scheduled jobs by checking the 
>>>>>>> logs in logs/netdisco-daemon.log and searching for the last logs for a 
>>>>>>> particular device.
>>>>>>> 
>>>>>>> I have a hard time imagining what could be causing this problem. When 
>>>>>>> you ran the discover manually, the results are fine, but when some 
>>>>>>> scheduled job runs (presumably the scheduled discover), the device port 
>>>>>>> entry is getting the wrong name.
>>>>>>> 
>>>>>>> Maybe someone else has had a similar issue before?
>>>>>>> 
>>>>>>> Steven
>>>>>>> 
>>>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>>>> To: Steven Xu <[email protected]>, <[email protected]>
>>>>>>> From: Edward Vopata <[email protected]>
>>>>>>> Date: 02/02/2016 01:00PM
>>>>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>>>> 
>>>>>>> 
>>>>>>> I have about 4300 devices in my netdisco2 database, of which 2200 of 
>>>>>>> these devices are having this ifindex issue.
>>>>>>> 
>>>>>>> Attached is a list of device models that are having this issue.
>>>>>>> I am currently focusing on the cisco 2911 and 2811 devices.
>>>>>>> 
>>>>>>> Attached are some log & data files:
>>>>>>> 
>>>>>>> Device-before-discover.txt    - this is a query of a device before the 
>>>>>>>                                 discovery, showing the problem
>>>>>>>                                 (ie. device_port.port is the ifIndex 
>>>>>>>                                 number)
>>>>>>> 
>>>>>>> Device-netdisco-discover.log  - this is the log of the netdisco 
>>>>>>> discover on that device.
>>>>>>> 
>>>>>>> Device-after-discover.txt    - this is the same query after the discovery.
>>>>>>> 
>>>>>>> The view after the discover is how I would expect the device to look.
>>>>>>> However, after several hours, the device reverts back to the "before" 
>>>>>>> view,
>>>>>>> which is NOT right. 
>>>>>>> 
>>>>>>> This is a Netdisco 2 problem.  
>>>>>>>     I am running an old version of netdisco (based on the 0.96 release),
>>>>>>>     with the latest SNMP::Info (version 3.31) and I am not having the 
>>>>>>> problem.
>>>>>>> 
>>>>>>>     The same device on the old netdisco does NOT exhibit this problem.
>>>>>>> 
>>>>>>> I did intend to reply to the mailing list.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> - Ed Vopata
>>>>>>> 
>>>>>>> 
>>>>>>>> On 02/02/16 09:03, Steven Xu wrote:
>>>>>>>> Hi Ed,
>>>>>>>> 
>>>>>>>> I suspect the problem lies with the SNMP setup for your routers; netdisco 
>>>>>>>> always uses the port name given by SNMP. The problem could also lie in the 
>>>>>>>> SNMP::Info implementation for your routers (which netdisco uses to 
>>>>>>>> gather information). What model are they?
>>>>>>>> 
>>>>>>>> Some logging code is already present and may already be enough to diagnose 
>>>>>>>> your issue.  Run the script bin/netdisco-do to discover a device that has 
>>>>>>>> already been discovered and is experiencing this problem, optionally with 
>>>>>>>> the -D and -S flags (try "bin/netdisco-do help" for more details). 
>>>>>>>> 
>>>>>>>> I think, even without the flags, warnings will be present if there is 
>>>>>>>> a problem with discovering your device. Do this after you notice that 
>>>>>>>> the port name gets changed to the IFindex value. Include the logs or 
>>>>>>>> let me know what you find. 
>>>>>>>> 
>>>>>>>> Also, I noticed that you didn't include the netdisco mailing list this 
>>>>>>>> time. Was this intentional? Others may have more insights into the 
>>>>>>>> problem that I don't.
>>>>>>>> 
>>>>>>>> Steven
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>>>>> To: Steven Xu <[email protected]>
>>>>>>>> From: Edward Vopata <[email protected]>
>>>>>>>> Date: 02/02/2016 09:44AM
>>>>>>>> Subject: Re: [Netdisco] Netdisco 2 - port name problem
>>>>>>>> 
>>>>>>>> 
>>>>>>>> It is very difficult to isolate this problem.
>>>>>>>> 
>>>>>>>> An initial device discovery will set the port name to the correct 
>>>>>>>> value (ie GigabitEthernet0/0),
>>>>>>>> but sometime later the value gets changed to the IFindex value.  I 
>>>>>>>> have a small group of 
>>>>>>>> routers that are consistently exhibiting this problem, but I haven't 
>>>>>>>> been able to get much deeper.
>>>>>>>> 
>>>>>>>> Question:
>>>>>>>>     Where does the port name (device_port.port) value get set?
>>>>>>>>         - If it is a few places, then I can add some logging code to 
>>>>>>>> detect specific changes.
>>>>>>>>     Where is the source of the port name?
>>>>>>>>         - maybe I can add some checks there?
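
Short of adding logging inside Netdisco, affected rows can be spotted from outside by checking which port names look like a bare ifIndex. Below is a minimal Python sketch, assuming the rows have already been fetched as (device IP, port name) pairs, for example from the device_port table mentioned above. The fetching step is left out, and the function name is made up for illustration.

```python
import re

# A port name made only of ASCII digits looks like an ifIndex, not a name.
ALL_DIGITS = re.compile(r"[0-9]+")


def ifindex_like_ports(rows):
    """Return the (device_ip, port) pairs whose port name is purely numeric.

    rows: iterable of (device_ip, port) string pairs.
    """
    return [(ip, port) for ip, port in rows if ALL_DIGITS.fullmatch(port)]


# Typical use (hypothetical data):
#   rows = [("10.255.32.73", "GigabitEthernet0/0"), ("10.255.32.73", "2")]
#   print(ifindex_like_ports(rows))
```

Running this on a schedule and noting when a device first appears in the result would narrow down which job run changed the name. Some hardware legitimately uses numeric port names, so the result is only a hint for the Cisco routers discussed here.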
>>>>>>>> 
>>>>>>>> I still don't think that the issue is a timeout issue.
>>>>>>>> 
>>>>>>>> -- Ed Vopata
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 02/01/16 07:09, Steven Xu wrote:
>>>>>>>>> Hi Edward,
>>>>>>>>> 
>>>>>>>>> It would help to provide some debug logs, although I'm not entirely 
>>>>>>>>> familiar with which logs you should be providing.
>>>>>>>>> 
>>>>>>>>> Steven
>>>>>>>>> 
>>>>>>>>> -----Edward Vopata <[email protected]> wrote: -----
>>>>>>>>> To: <[email protected]>
>>>>>>>>> From: Edward Vopata <[email protected]>
>>>>>>>>> Date: 01/18/2016 02:05PM
>>>>>>>>> Subject: [Netdisco] Netdisco 2 - port name problem
>>>>>>>>> 
>>>>>>>>> I am still having a problem with netdisco 2, where the port
>>>>>>>>> (device_port.port) is getting set to the SNMP ifIndex value of the 
>>>>>>>>> port
>>>>>>>>> instead of the port name value (ie: GigabitEthernet0/0).   I have 
>>>>>>>>> tried
>>>>>>>>> adjusting the SNMP retries and SNMP timeouts, but I am still getting 
>>>>>>>>> the
>>>>>>>>> same results.  An initial discover on the device will set the port to 
>>>>>>>>> the
>>>>>>>>> correct name value (ie GigabitEthernet0/0), however after a few hours 
>>>>>>>>> the
>>>>>>>>> port changes to the ifIndex value.
>>>>>>>>> 
>>>>>>>>> I don't believe that the problem is with the SNMP timeouts, since some
>>>>>>>>> of the devices having the problem are in the same data center as the
>>>>>>>>> netdisco server.
>>>>>>>>> 
>>>>>>>>> Here is my NetDisco Information:
>>>>>>>>>      Hostname      : netdisco2
>>>>>>>>>      OS            : Ubuntu 14.04.3 LTS
>>>>>>>>>      Perl          : v5.18.2
>>>>>>>>>      App::NetDisco : 2.033004
>>>>>>>>>      DB Schema     : 40
>>>>>>>>>      SNMP::Info    : 3.30
>>>>>>>>>      Apache        : 2.4.7
>>>>>>>>>      Net-SNMP      : 5.7.2
>>>>>>>>>      PostgreSQL    : 9.3.10
>>>>>>>>> 
>>>>>>>>> Please advise.
>>>>>>>>> 
>>>>>>>>> Thanks.
>>>>>>>>> -- Ed.
>>>>>>>>> 
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>>>>>>>>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>>>>>>>>> Monitor end-to-end web transactions and take corrective actions now
>>>>>>>>> Troubleshoot faster and improve end-user experience. Signup Now!
>>>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
>>>>>>>>> _______________________________________________
>>>>>>>>> Netdisco mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/netdisco-users
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> [attachment "Netdisco-models.txt" removed by Steven Xu/fs/YorkU]
>>>>>>> [attachment "Device-after-discover.txt" removed by Steven Xu/fs/YorkU]
>>>>>>> [attachment "Device-before-discover.txt" removed by Steven Xu/fs/YorkU]
>>>>>>> [attachment "Device-netdisco-discover.log" removed by Steven 
>>>>>>> Xu/fs/YorkU]
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> [attachment "Netdisco-Schedule.txt" removed by Steven Xu/fs/YorkU]
> 
> <Device-SNMP-config.txt>
> <NetDisco-SNMP-config.txt>

--- End Message ---

_______________________________________________
Netdisco mailing list - Digest Mode
[email protected]
https://lists.sourceforge.net/lists/listinfo/netdisco-users
