Today's Topics:

   1. Re: Arpwalk, Macsuck or Discover not successful on some
      devices (Tobias Gerlach)
--- Begin Message ---
After a full discovery run I can see in the postgres log tens of
thousands of messages like these:

2019-11-28 11:18:26.296 CET [42973] STATEMENT:  INSERT INTO
device_port_properties ( ifindex, ip, port, raw_speed) VALUES ( $1,
$2, $3, $4 )
2019-11-28 11:18:26.296 CET [42973] ERROR:  current transaction is
aborted, commands ignored until end of transaction block
...

What do these errors mean? Could this be related to the stuck jobs?
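
A note on reading these entries: as far as I understand, "current transaction is aborted" is only a follow-on error. PostgreSQL reports it for every statement sent after an earlier statement in the same transaction has already failed, so the entry worth finding is the first different ERROR from the same backend PID. Below is a minimal Python sketch for pulling those out of the log. It assumes the line layout of the two samples above (timestamp, time zone, PID in brackets, level), and root_cause_errors is a hypothetical helper name, not part of Netdisco or PostgreSQL.

```python
import re

# Assumed layout, taken from the samples above:
# 2019-11-28 11:18:26.296 CET [42973] ERROR:  current transaction is aborted, ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (?P<tz>\S+) "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)
ABORTED = "current transaction is aborted"


def root_cause_errors(lines):
    """Return (pid, timestamp, message) for each ERROR that was later
    followed by 'transaction is aborted' errors from the same backend PID."""
    last_real_error = {}  # pid -> (timestamp, message) of latest other ERROR
    reported = set()      # (pid, cause) pairs already emitted
    causes = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("level") != "ERROR":
            continue  # skip STATEMENT, LOG, continuation lines, etc.
        pid, ts, msg = m.group("pid"), m.group("ts"), m.group("msg")
        if msg.startswith(ABORTED):
            cause = last_real_error.get(pid)
            if cause is not None and (pid, cause) not in reported:
                reported.add((pid, cause))
                causes.append((pid,) + cause)
        else:
            last_real_error[pid] = (ts, msg)
    return causes
```

Passing an open log file to root_cause_errors and printing the returned tuples should list the original failures once each, instead of the tens of thousands of repeats. The location of the log file is site-specific.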


On Mon, 25 Nov 2019 at 17:05, Tobias Gerlach <[email protected]> wrote:
>
> Hello,
>
> I'm starting to wonder whether the problem is not an issue with the latest 
> 2.044x versions of Netdisco, but rather an issue with my new RHEL 
> server. However, I still have no clue what could cause the stuck jobs.
> Last week I completely uninstalled Netdisco and installed version 2.42.10 
> from the middle of this year, with which I'm convinced I never faced these 
> issues. I also deleted all data from every Netdisco table except the "users" 
> table, to make sure the database contains no old entries. Even today the 
> "admin" table contains jobs which were entered more than three days ago 
> but haven't started yet (status is NULL). On the other hand, there are a bunch 
> of jobs which finished successfully.
> On my old SLES server I used PostgreSQL v9.4, and on my current RHEL 7.6 
> server I'm running PostgreSQL 11.6. I've tuned postgresql.conf with values 
> from pgtune based on my server environment. I carried over Netdisco's 
> deployment.yml from my old installation and haven't touched it so far.
> Any ideas on how to narrow down what's going on, and what the next 
> troubleshooting steps could be?
>
> Jobs:
> 3091997 2019-11-22 08:33:25.783195 NULL NULL 10.208.16.6 NULL arpnip NULL queued netdisco 172.26.14.13 NULL NULL NULL
> 3091068 2019-11-22 08:33:25.783195 NULL NULL 10.213.89.205 NULL arpnip NULL queued netdisco 172.26.14.13 NULL NULL NULL
> 3088950 2019-11-22 08:33:25.783195 NULL NULL 10.26.52.5 NULL arpnip NULL queued netdisco 172.26.14.13 NULL NULL NULL
> ...
> 3093364 2019-11-22 08:33:25.783195 2019-11-22 13:15:38 2019-11-22 13:15:39 149.223.209.234 NULL arpnip NULL done netdisco 172.26.14.13 Gathered arp caches from 149.223.209.234 NULL NULL
> 3094549 2019-11-22 08:33:25.783195 2019-11-22 13:15:16 2019-11-22 13:15:27 10.219.155.41 NULL arpnip NULL done netdisco 172.26.14.13 Gathered arp caches from 10.219.155.41 NULL NULL
> 3089676 2019-11-22 08:33:25.783195 2019-11-22 13:15:22 2019-11-22 13:15:23 10.166.0.58 NULL arpnip NULL done netdisco 172.26.14.13 Gathered arp caches from 10.166.0.58 NULL NULL
>
> Modified part of postgresql.conf:
> DB TUNE
> # DB Version: 11
> # OS Type: linux
> # DB Type: web
> # Total Memory (RAM): 16 GB
> # CPUs num: 4
> # Connections num: 100
> # Data Storage: ssd
>
> max_connections = 100
> shared_buffers = 4GB
> effective_cache_size = 12GB
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.7
> wal_buffers = 16MB
> default_statistics_target = 100
> random_page_cost = 1.1
> effective_io_concurrency = 200
> work_mem = 20971kB
> min_wal_size = 1GB
> max_wal_size = 2GB
> max_worker_processes = 4
> max_parallel_workers_per_gather = 2
> max_parallel_workers = 4
>
> Thanks,
> Tobias
>
>
> On Wed, 20 Nov 2019 at 11:43, Oliver Gorwits <[email protected]> wrote:
>>
>> Hi Tobias
>>
>> Yes, deleting the perl5 directory is enough, and then when you reinstall you 
>> can pick a specific version by doing:
>>
>> curl -L https://cpanmin.us/ | perl - --notest --local-lib ~/perl5 
>> App::[email protected]
>>
>> (or whatever version you want)
>>
>> Please do let us know how you get on, as so far we've not been able to 
>> reproduce the issue, which is quite frustrating.
>>
>> regards
>> oliver.
>>
>> On Mon, 18 Nov 2019 at 09:04, Tobias Gerlach <[email protected]> wrote:
>>>
>>> Hello,
>>> unfortunately the problems still exist with the latest version, and none of 
>>> the workarounds work for me. I'm seriously considering downgrading 
>>> Netdisco, as the issue affects a few thousand devices in my case.
>>> I think the problems didn't exist with version 2.043001 and started with 
>>> 2.044x, but I'm no longer 100% sure. Can anyone confirm that, please?
>>> Would it be enough to delete the whole ~/perl5 directory for a complete 
>>> uninstall, or should the database be deleted as well? Perhaps it is 
>>> sufficient to delete the contents of some tables?
>>> Thanks,
>>> Tobias
>>>
>>> On Thu, 7 Nov 2019 at 19:19, Nick Nauwelaerts 
>>> <[email protected]> wrote:
>>>>
>>>> (sourceforge held my mail for being oversize, so trimmed old replies a bit)
>>>>
>>>> kinda late to the party, tried to make sense of the thread via the 
>>>> mailinglist archive.
>>>>
>>>> "Happens in my case on a lot of devices, not just a few. The stuck jobs 
>>>> doesn't finish after even days. "
>>>>
>>>>
>>>> if you restart netdisco-backend, are your jobs then reported as complete 
>>>> in the poller performance report, but with very long runtimes?
>>>>
>>>>
>>>> i've been seeing something similar from time to time but can't find a way 
>>>> to reproduce it:
>>>> https://github.com/netdisco/netdisco/issues/466
>>>> jobs that ran for "1 day 01:29:30".
>>>>
>>>>
>>>> what i'm looking at, which could be totally unrelated, is the timestamp 
>>>> precision, since some fields use high precision "2018-12-13 
>>>> 20:05:30.887272", some just up to the second: "2018-12-13 20:05:30". this 
>>>> shouldn't be an issue if we compare using sql, but that's not always the 
>>>> case, for example in
>>>>
>>>> https://github.com/netdisco/netdisco/blob/92bc49f27444ff201cc2178132979e0d89c08850/lib/App/Netdisco/DB/Result/Admin.pm#L80-L82
>>>> we use "$args->{foreign_alias}.last_defer" => { '>', \'(LOCALTIMESTAMP - 
>>>> ?::interval)' },
>>>>
>>>>
>>>>
>>>> but in other parts 
>>>> https://github.com/netdisco/netdisco/blob/92bc49f27444ff201cc2178132979e0d89c08850/lib/App/Netdisco/DB/ResultSet/Admin.pm#L61-L63
>>>> we cast those timestamps:           entered_stamp => \"to_char(entered, 
>>>> 'YYYY-MM-DD HH24:MI')",
>>>>
>>>>
>>>> also think i saw 1 or 2 time comparisons done in perl.
>>>>
>>>>
>>>> and for some reason sometimes the high precision timer has a leading 0 for 
>>>> microseconds:
>>>> no leading 0 -> "35119"  ---  macsuck  | 2018-12-13 12:55:20.35119
>>>> leading 0 -> "058855 " ---- arpnip   | 2018-12-13 12:05:12.058855
>>>>
>>>>
>>>> // nick
>>>>
>>>> From: Pavel Skovajsa [mailto:[email protected]]
>>>> Sent: Monday, October 14, 2019 22:09
>>>> To: Oliver Gorwits <[email protected]>
>>>> Cc: Tobias Gerlach <[email protected]>; 
>>>> [email protected]
>>>> Subject: Re: [Netdisco] Arpwalk, Macsuck or Discover not successful on 
>>>> some devices
>>>>
>>>> bump
>>>>
>>>> On Mon, Oct 7, 2019 at 6:39 PM Pavel Skovajsa <[email protected]> 
>>>> wrote:
>>>> Hello,
>>>>
>>>> I think we have the same issue - the jobs haven't finished since the 
>>>> upgrade to 2.44.0. Last week I deleted all the jobs (using the web frontend 
>>>> trash bin icon). Poller performance only shows nbtstat jobs completed; 
>>>> nothing else has appeared there since the update. The output for the slow 
>>>> queries and running queries is at https://pastebin.com/LPtEDD6h.
>>>>
>>>> On the other hand, it looks like polling is actually working, judging by 
>>>> the output below:
>>>> query
>>>> result
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '2 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '1 hours'));
>>>> 1809
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '3 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '2 hours'));
>>>> 1856
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '4 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '3 hours'));
>>>> 1815
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '5 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '4 hours'));
>>>> 1932
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '6 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '5 hours'));
>>>> 712
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '7 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '6 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '8 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '7 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '9 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '8 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '10 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '9 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_discover 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '11 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '10 hours'));
>>>> 0
>>>>
>>>>
>>>>
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '2 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '1 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '3 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '2 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '4 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '3 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '5 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '4 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '6 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '5 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '7 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '6 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '8 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '7 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '9 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '8 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '10 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '9 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '11 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '10 hours'));
>>>> 1066
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '12 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '11hours'));
>>>> 2021
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '13 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '12 hours'));
>>>> 1957
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '14 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '13 hours'));
>>>> 1913
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '15 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '14 hours'));
>>>> 1972
>>>> select count(name) from device where vendor = 'cisco' AND (last_macsuck 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '16 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '15 hours'));
>>>> 237
>>>>
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '2 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '1 hours'));
>>>> 2010
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '3 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '2 hours'));
>>>> 2271
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '4 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '3 hours'));
>>>> 1301
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '5 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '4 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '6 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '5 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '7 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '6 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '8 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '7 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '9 hours') AND  (CURRENT_TIMESTAMP - 
>>>> interval '8 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '10 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '9 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '11 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '10 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '12 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '11hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '13 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '12 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '14 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '13 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '15 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '14 hours'));
>>>> 0
>>>> select count(name) from device where vendor = 'cisco' AND (last_arpnip 
>>>> BETWEEN (CURRENT_TIMESTAMP - interval '16 hours') AND  (CURRENT_TIMESTAMP 
>>>> - interval '15 hours'));
>>>> 0
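
The per-hour counts above repeat one query per window; the same histogram can be built in a single pass. A minimal Python sketch, assuming the last_discover, last_macsuck or last_arpnip values have already been fetched as datetime objects. hourly_histogram is a hypothetical helper, not part of Netdisco. Unlike BETWEEN, which includes both endpoints, each bucket here is half-open, so a stamp exactly on an hour boundary is counted once.

```python
from collections import Counter
from datetime import timedelta


def hourly_histogram(stamps, now, max_hours):
    """Count timestamps by age in whole hours.

    Element k of the result is the number of stamps whose age relative
    to `now` is at least k hours and less than k+1 hours. None values,
    future stamps, and stamps older than max_hours are ignored.
    """
    counts = Counter()
    one_hour = timedelta(hours=1)
    for stamp in stamps:
        if stamp is None:
            continue  # device never polled for this action
        age = now - stamp
        if age < timedelta(0):
            continue  # stamp lies in the future
        bucket = age // one_hour  # timedelta // timedelta yields an int
        if bucket < max_hours:
            counts[bucket] += 1
    return [counts[k] for k in range(max_hours)]
```

Calling it once per column with max_hours=16 would give the three series shown above, which makes the gaps easier to line up against each other.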
>>>>
>>>> netdisco=> select action,status,count (job) from admin group by 
>>>> action,status;
>>>>    action    |          status           | count
>>>> -------------+---------------------------+--------
>>>>  arpnip      | done                      |  51663
>>>>  arpnip      | queued                    |    248
>>>>  arpwalk     | done                      |      6
>>>>  discover    | done                      |  36793
>>>>  discover    | error                     |   6635
>>>>  discover    | info                      |      5
>>>>  discover    | queued                    |   4076
>>>>  discoverall | done                      |      3
>>>>  expire      | done                      |      4
>>>>  macsuck     | done                      |  46684
>>>>  macsuck     | queued                    |    299
>>>>  macsuck     | queued-mdnetdisco         |      2
>>>>  macwalk     | done                      |      4
>>>>  nbtstat     | done                      | 105197
>>>>  nbtstat     | queued                    |    303
>>>>  nbtwalk     | done                      |      9
>>>> (16 rows)
>>>>
>>>> -pavel
>>>>
>>>> On Sat, Oct 5, 2019 at 1:17 PM Oliver Gorwits <[email protected]> wrote:
>>>> It would be interesting to know what the backend server process table 
>>>> shows when the workers are not progressing. Output of ps should show what 
>>>> the workers are doing.
>>>>
>>>> Also, here is a way to show active queries against a database and at the 
>>>> same time I wonder if running this will show some blockage somewhere:
>>>>
>>>> -- show running queries (pre 9.2)
>>>> SELECT procpid, age(clock_timestamp(), query_start), usename, current_query
>>>> FROM pg_stat_activity
>>>> WHERE current_query != '<IDLE>' AND current_query NOT ILIKE 
>>>> '%pg_stat_activity%'
>>>> ORDER BY query_start desc;
>>>>
>>>> -- show running queries (9.2 and later)
>>>> SELECT pid, age(clock_timestamp(), query_start), usename, query
>>>> FROM pg_stat_activity
>>>> WHERE state != 'idle' AND query NOT ILIKE '%pg_stat_activity%'
>>>> ORDER BY query_start desc;
>>>>
>>>> -- long running (9.2 to 9.5; from 9.6 the waiting column is replaced by
>>>> -- wait_event_type and wait_event)
>>>> SELECT now() - query_start as "runtime", usename, datname, waiting, state, 
>>>> query
>>>>   FROM  pg_stat_activity
>>>>   WHERE now() - query_start > '2 minutes'::interval
>>>>  ORDER BY runtime DESC;
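
A note for newer servers such as the 11.6 instance mentioned in this thread: pg_stat_activity lost its waiting column in PostgreSQL 9.6, where wait_event_type and wait_event took its place, so the long-running query has to be adjusted there. A minimal Python sketch that picks the column list from the integer reported by SHOW server_version_num; long_running_sql is a hypothetical helper, and the extra filter on state is my own assumption to keep idle sessions out of the result.

```python
def long_running_sql(server_version_num):
    """Return a long-running-queries statement suited to the server version.

    server_version_num is the integer from `SHOW server_version_num`,
    for example 90400 for 9.4.0 or 110006 for 11.6.
    """
    if server_version_num < 90200:
        # Before 9.2 the view exposes procpid/current_query and no state column.
        raise ValueError("pre-9.2 servers need the procpid/current_query form")
    if server_version_num >= 90600:
        wait_cols = "wait_event_type, wait_event"
    else:
        wait_cols = "waiting"
    return (
        "SELECT now() - query_start AS runtime, usename, datname, "
        f"{wait_cols}, state, query\n"
        "  FROM pg_stat_activity\n"
        " WHERE state <> 'idle'\n"
        "   AND now() - query_start > interval '2 minutes'\n"
        " ORDER BY runtime DESC;"
    )
```

Sessions in state "idle in transaction" are deliberately left in the output, since a transaction that is held open for a long time is one of the things worth spotting here.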
>>>>



--- End Message ---