Send netdisco-users mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/netdisco-users
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of netdisco-users digest..."
Today's Topics:

   1. Re: scheduled jobs stop (Nick Nauwelaerts)
--- Begin Message ---
i've seen similar, but not exactly the same, issues when installing the same 
perl module via the system package manager & via cpanm, in combination with 
env PATH ordering, or after simply forgetting to use localenv/cpanm in one step 
of the process.

i'll try to break it down into parts that make sense. any of these can be the 
root or part of your problem (or completely unrelated ;)  ).
rated from most likely (aka, i also ran into these at some point, sometimes 
more than once) to more exotic:

*** somehow variables from another shell/user/whatever got into netdisco's 
environment
we suggest running netdisco under its own user and with a minimal shell 
profile to avoid strange interactions with other settings. also, when 
changing to the netdisco user make sure to use "su  - whateveruser". the dash 
is critical since it will spawn a login shell and thus give you a clean slate. 
(if you need to use sudo then "sudo su - whateveruser", not elegant but works.)
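
A rough way to check for the "clean slate" described above is to list the Perl-related variables in the current environment. This is only a sketch: the function name perl_vars_in_env is made up for illustration, and it assumes that a fresh login shell for the netdisco user should carry no PERL* variables until localenv is applied.

```shell
#!/bin/sh
# Hypothetical helper: list Perl-related variable names found in an
# environment dump read from stdin (one NAME=value pair per line).
# Prints nothing when no PERL* variables are set.
perl_vars_in_env() {
    # Keep only lines whose name starts with PERL, drop the value,
    # and sort in the C locale so the order is predictable.
    grep -E '^PERL[A-Z0-9_]*=' | cut -d= -f1 | LC_ALL=C sort
}

# Usage, right after "su - whateveruser":
#   env | perl_vars_in_env
```

If this prints names such as PERL5LIB in a shell where localenv has not been used, something in the shell profile or the calling environment is leaking in.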

*** used different users during install
whatever you do, don't try to install stuff for netdisco as the root user or 
any other user. in the best case this will result in bad permissions on files 
and strange errors, in the worst case it can install stuff under different 
locations, and good luck figuring out what's used where.

guideline:
  * this should be done as root:
    https://metacpan.org/pod/App::Netdisco#Dependencies
  * from this part on you should only be doing things as the netdisco user:
    https://metacpan.org/pod/App::Netdisco#Installation
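
One way to spot files that were installed as the wrong user is to look for anything under the netdisco home that the netdisco user does not own. The sketch below uses plain find; the function name not_owned_by is hypothetical, and the directory and user name in the usage line are assumptions about a default install.

```shell
#!/bin/sh
# Hypothetical helper: print every path under DIR that is not owned by
# USER (a user name or a numeric uid). No output means the ownership
# is consistent.
not_owned_by() {
    dir=$1
    user=$2
    find "$dir" ! -user "$user" -print
}

# Usage (paths and user name are assumptions):
#   not_owned_by "$HOME/perl5" netdisco
```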


*** something went wrong with localenv or your path during the install
either use the full path to the netdisco commands and cpanm, or make sure you 
add the directory they live in at the start of your PATH env. with the 
documented install method, local::lib should get installed to keep things clean. 
don't forget to run perl scripts with localenv! see the next point for a crappy 
workaround.

*** did something outside of the local::lib environment
so you changed to the netdisco user, but forgot to add localenv before your 
command. often seen when running cpanm, since most operating systems provide 
their own version via the package manager. i removed the system cpanm on my 
systems, but before that i just added the following to my .zshrc or .bashrc:

~/bin/localenv bash

(or zsh. i use zsh for my regular user & bash for netdisco for some reason).
it's not elegant, but since we recommend using a separate user it shouldn't 
matter that much. this way your env gets set so local::lib can do its magic.

regular env vars for my netdisco test account:

testdisc@linux002:~> env | grep -i perl
MANPATH=/home/testdisc/perl5/man:/usr/local/man:/usr/share/man
(and even that isn't needed for netdisco , just for my convenience)

env vars when using localenv:
testdisc@linux002:~> env | grep -i perl
PERL_MB_OPT=--install_base "/home/testdisc/perl5"
PERL_MM_OPT=INSTALL_BASE=/home/testdisc/perl5
PERL_LOCAL_LIB_ROOT=/home/testdisc/perl5
PERL5LIB=/home/testdisc/perl5/lib/perl5
MANPATH=/home/testdisc/perl5/man:/home/testdisc/perl5/man:/usr/local/man:/usr/share/man
PATH=/home/testdisc/perl5/bin:/home/testdisc/bin:/usr/local/bin:/usr/bin:/bin

-> bottomline: localenv is a critical piece, don't run without it
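
To make the difference between the two env listings above checkable, here is a small sketch that tests whether a PERL5LIB value contains the local::lib directory. The function name has_local_lib is made up for illustration; the only thing taken from the listings is that localenv puts <root>/lib/perl5 into PERL5LIB.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the PERL5LIB value in $1
# contains the local::lib directory $2/lib/perl5, fail otherwise.
has_local_lib() {
    perl5lib=$1
    root=$2
    # Wrap the list in colons so the first and last entries match too.
    case ":$perl5lib:" in
        *":$root/lib/perl5:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_local_lib "$PERL5LIB" "$HOME/perl5" || echo "not inside localenv"
```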


*** conflicting os packages vs local::lib ones
this should seldom be an issue if you follow the above advice. it sometimes 
comes into play with perl xs stuff, which needs compiled c code or libs. i would 
recommend installing as little as possible via the os package manager, if 
that's at all possible. my server only runs netdisco & a few random things, 
so i end up with just under 20 perl-related os packages. local::lib / localenv 
should keep this from mattering too much, since it makes perl prefer your local 
libraries, but hardcoded links or falling back to os perl modules can still 
happen.

the one exception is the perl snmp module, since it has to match the rest of 
net-snmp. it's possible to install this as netdisco, and that is documented 
in our wiki, but it's not recommended.
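
To see which copy of a module perl actually picked up, you can print its entry in %INC and check whether the path sits under the local::lib root. The sketch below only does the path comparison; the function name module_origin is hypothetical, and the local root in the usage line is an assumption about a default install.

```shell
#!/bin/sh
# Hypothetical helper: print "local" when the module file path in $1
# lives under the local::lib root in $2, otherwise print "system".
module_origin() {
    path=$1
    root=$2
    case "$path" in
        "$root"/*) echo local ;;
        *) echo system ;;
    esac
}

# Usage: ask perl where it loaded MCE::Queue from, then classify it.
#   p=$(~/bin/localenv perl -MMCE::Queue -e 'print $INC{"MCE/Queue.pm"}')
#   module_origin "$p" "$HOME/perl5"
```

A "system" answer for a module you expected to come from cpanm is a hint that perl fell back to an os package.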


*** perl module install failure
since we now know that even with local::lib / localenv perl can fall back to 
os packages, the final issue could be a cpanm install issue.

xs modules can be ... interesting. i would suggest having a look in 
~/.cpanm/work/*/build.log & checking for compile or install failures. in most 
cases these are due to missing dev packages for their c libraries.
while i don't run redhat, for opensuse the following dev packages will be 
required: 
https://github.com/netdisco/netdisco/wiki/windows-10,-intellij-idea-ultimate-and-wsl#wsl-setup
(yes, it's a guide for windows + wsl, but the same packages are also required 
on opensuse 15.1)
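
A quick first pass over the cpanm build logs could look like the sketch below. The function name failed_builds is hypothetical and the search pattern is a guess at what compile and install failures tend to look like, so expect some false positives and read the matching logs by hand.

```shell
#!/bin/sh
# Hypothetical helper: list the build.log files under WORK_DIR (one
# subdirectory per cpanm run) that mention an error or a failure.
failed_builds() {
    work_dir=$1
    # -l: print only the names of matching files
    # -i: ignore case; -E: extended regex
    grep -l -i -E 'error|fail' "$work_dir"/*/build.log 2>/dev/null
}

# Usage:
#   failed_builds "$HOME/.cpanm/work"
```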


*** exotic failures or stuff i just didn't think of
won't go into much detail here, but sometimes distros want to "improve" the 
packages they provide. debian's rng fubar and red hat 7's gcc 2.96 come to mind 
(yes, old examples, but got bitten by 'em at $oldjob). our install method 
should avoid most of this. (yes, arguments can be made pro & con this method, 
but that's how we test)
or you could just have had bad luck that something actually got broken in an 
upstream perl module.

other exotic failures could include strange locale settings in your env or db, 
or even more obscure stuff. i think i once saw that "pid_xxxx is not numeric" 
error when trying something in cygwin, following the above steps resolved it 
for me.






other options are using perlbrew (local::lib still uses the os perl version, 
perlbrew is completely self-contained & works great on all systems i tested) or 
netdisco-docker, but those are for another time.


good luck!

// nick



From: Oliver Gorwits [mailto:[email protected]]
Sent: Thursday, January 30, 2020 20:32
To: Ricardo Stella <[email protected]>
Cc: [email protected]
Subject: Re: [Netdisco] scheduled jobs stop

I was looking to see if the issue is related to an upstream library change, 
rather than in Netdisco.

Mainly because I'm scratching my head trying to work out what would cause this, 
and I can't yet reproduce it.

On Wed, 29 Jan 2020 at 16:00, Ricardo Stella 
<[email protected]<mailto:[email protected]>> wrote:

Almost there...

[netdisco@netdisco ~]$ ~/bin/localenv perl -MSereal\ 999 -e 1
Sereal version 999 required--this is only version 4.007.
BEGIN failed--compilation aborted.
[netdisco@netdisco ~]$ ~/bin/localenv perl -MMCE::Queue\ 999 -e 1
MCE::Queue version 999 required--this is only version 1.865.
BEGIN failed--compilation aborted.
[netdisco@netdisco ~]$ ~/bin/localenv cpanm Sereal MCE
Sereal is up to date. (4.007)
MCE is up to date. (1.865)

I assume we are trying to delete them and force download?



On Wed, Jan 29, 2020 at 10:52 AM Oliver Gorwits 
<[email protected]<mailto:[email protected]>> wrote:
Sorry, my apologies, yes you would need to add "~/bin/localenv" to the start of 
all those commands, I believe



On Wed, 29 Jan 2020 at 15:17, Ricardo Stella 
<[email protected]<mailto:[email protected]>> wrote:

Running as the netdisco user, I'm getting:

Can't locate Sereal.pm in @INC (@INC contains: /usr/local/lib64/perl5 
/usr/local/share/perl5 /usr/lib64/perl5/vendor_perl 
/usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .).
BEGIN failed--compilation aborted.

Does it need --local-lib ~/perl5 or ~/bin/localenv first? And --notest?




On Wed, Jan 29, 2020 at 9:47 AM Oliver Gorwits 
<[email protected]<mailto:[email protected]>> wrote:
Hi Ricardo

Please can you also run:
perl -MSereal\ 999 -e 1
perl -MMCE::Queue\ 999 -e 1

Then run
cpanm Sereal MCE

and then let us know if the problem is still there?

thanks,
oliver.

On Wed, 29 Jan 2020 at 14:15, Ricardo Stella 
<[email protected]<mailto:[email protected]>> wrote:

Well, it's definitely a bug with the latest versions.  I upgraded the original 
instance I had which was running fine under 2.040006 since March of last year. 
This one also is exhibiting the same issues with jobs queued since 5:30pm 
yesterday.

Error logs on that instance since last restart yesterday afternoon are:

[7901] 2020-01-28 16:03:03  warn App::Netdisco 2.044011 backend
Argument "" isn't numeric in read at 
/home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1439, <$__ANONIO__> line 1.
Sereal: Error: Bad Sereal header: Not a valid Sereal document. at offset 1 of 
input at srl_decoder.c line 580 at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm 
line 1445, <$__ANONIO__> line 1.
Argument "=M-srl^D\0A,{App::Netdisco::Backend::Job(*^Ofstatusfqueu..." isn't 
numeric in int at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1484, 
<$__ANONIO__> line 1753.
Argument "=M-srl^D\0A,{App::Netdisco::Backend::Job(*^Ofstatusfqueu..." isn't 
numeric in int at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1484, 
<$__ANONIO__> line 15984.
Argument "" isn't numeric in read at 
/home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1439, <$__ANONIO__> line 1.
Can't call method "status" without a package or object reference at 
/home/netdisco/perl5/lib/perl5/App/Netdisco/Backend/Role/Poller.pm line 38, 
<$__ANONIO__> line 1.
Argument "" isn't numeric in read at 
/home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1439, <$__ANONIO__> line 1.
Sereal: Error: Bad Sereal header: Not a valid Sereal document. at offset 1 of 
input at srl_decoder.c line 580 at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm 
line 1445, <$__ANONIO__> line 1.



On Tue, Jan 28, 2020 at 11:18 AM Ricardo Stella 
<[email protected]<mailto:[email protected]>> wrote:

And just noticed that there's a newer version out there. Updated the new 
instance (including wiping the perl5 directory) and right after I started it, I 
got an error message. The old one was also updated but it's not giving me any 
errors so far.

[8849] 2020-01-28 16:13:41  warn App::Netdisco 2.044011 backend
Argument "" isn't numeric in read at 
/home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1439, <__ANONIO__> line 1.
Sereal: Error: Bad Sereal header: Not a valid Sereal document. at offset 1 of 
input at srl_decoder.c line 580 at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm 
line 1445, <__ANONIO__> line 1.
Argument "=M-srl^D\0A,{App::Netdisco::Backend::Job(*^Ofstatusfqueu..." isn't 
numeric in int at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1484, 
<__ANONIO__> line 32.




On Tue, Jan 28, 2020 at 9:56 AM Ricardo Stella 
<[email protected]<mailto:[email protected]>> wrote:

Same here...

backend status thinks it's running but jobs are queued since last night and not 
running. Here are the errors since last restart yesterday:

[24657] 2020-01-27 16:00:58  warn App::Netdisco 2.044009 backend
Argument "" isn't numeric in read at 
/home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1439, <__ANONIO__> line 1.
Sereal: Error: Bad Sereal header: Not a valid Sereal document. at offset 1 of 
input at srl_decoder.c line 580 at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm 
line 1445, <__ANONIO__> line 1.
Argument "=M-srl^D\0A,{App::Netdisco::Backend::Job(*^Ofstatusfqueu..." isn't 
numeric in int at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1484, 
<__ANONIO__> line 27785.
Argument "" isn't numeric in read at 
/home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1439, <__ANONIO__> line 1.
Sereal: Error: Bad Sereal header: Not a valid Sereal document. at offset 1 of 
input at srl_decoder.c line 580 at /home/netdisco/perl5/lib/perl5/MCE/Queue.pm 
line 1445, <__ANONIO__> line 1.

We also did a DB dump and restored it on a new instance, so I'm not sure if this 
is related.

I restarted our older instance. I will update Netdisco (running 2.040006) and 
see if there are any issues on this instance. Worst case, I'll redo the dump.


On Tue, Jan 28, 2020 at 4:33 AM 
<[email protected]<mailto:[email protected]>> wrote:
Hi Ricardo
Sorry, I forgot to sign my email.
By the way, I'm Marco.

It stopped again yesterday after a few hours.
    ...
    [5754] 2020-01-27 17:06:59 debug -> run worker main/wirelessnodes/100
    [5754] 2020-01-27 17:06:59  info pol (3): wrapping up macsuck job(22425208) 
- status done at Mon Jan 27 18:06:59 2020
    [5750] 2020-01-27 17:06:59 debug  [172.17.119.6] macsuck - port 1:43 vlan 
unknown : 1 nodes
    Argument "PID_5754" isn't numeric in abs at 
/home/netdisco/perl5/lib/perl5/MCE/Core/Manager.pm line 206, <__ANONIO__> line 
32948.
    Can't call method "_mce_m_pending" on an undefined value at 
/home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 679, <__ANONIO__> line 32949.

I activated debug. It seems that some scheduled jobs (macsuck, discoverall 
etc.) cause the error "Argument "PID_####" isn't numeric", which leaves the 
netdisco-backend child as a zombie:
    ps aux | grep netd
    netdisco  3428  0.0  0.3  22840 15848 ?        S    gen27   2:05 
netdisco-backend
    netdisco  3429  0.0  0.0      0     0 ?        Z    gen27   0:15 [nd2: 
master] <defunct>
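
    The defunct child in the listing above can be picked out automatically. This is a minimal sketch that assumes the usual "ps aux" column layout, where the PID is column 2 and the process state is column 8; the function name zombie_pids is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "ps aux" output on stdin and print the PID
# of every process whose state starts with Z (zombie / defunct).
zombie_pids() {
    awk '$8 ~ /^Z/ { print $2 }'
}

# Usage:
#   ps aux | grep netd | zombie_pids
```

    Any PID printed for a netdisco process suggests the backend is in the stuck state described in this thread and needs a restart.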

I can't say if it is caused by my new setup/configuration or something else

Marco

> On 27 January 2020 at 17:03, Ricardo Stella 
> <[email protected]<mailto:[email protected]>> wrote:
>
>
> Also happening here. I also had exported the DB in order to install on a new 
> VM with new OS. Had a couple of problems that I posted but had this same 
> error on the logs.
> Noticed all jobs queued for a couple of days and nothing running.
> Last message on logs was:
> Argument "" isn't numeric in read at 
> /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1439, <__ANONIO__> line 1.
> Sereal: Error: Bad Sereal header: Not a valid Sereal document. at offset 1 of 
> input at srl_decoder.c line 580 at 
> /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 1445, <__ANONIO__> line 1.
>
> Restarting it seems to get the jobs running again.
>
>
> On Mon, Jan 27, 2020 at 10:54 AM marco via netdisco-users < 
> [email protected]<mailto:[email protected]>>
>  wrote:
> > Hi there
> >
> > I set up a new ND2 host on debian buster some weeks ago
> > for experimental purposes
> > I have had another ND2 host up and running for years
> >
> > Software        Version
> > App::Netdisco   2.44.4
> > SNMP::Info      3.70
> > DB Schema       61
> > PostgreSQL      12.00.1
> > Perl            5.28.1
> >
> > I restored the db from the other ND2
> > and copied deployment.yml
> > It worked
> >
> > But I noticed that it stops running the scheduled jobs after some time 
> > (days)
> > I had to restart netdisco-backend
> >
> >
> > here is some info I collected
> >
> >     from netdisco-backend.log
> >     ...
> >     [392] 2020-01-24 15:15:18 debug mgr (2): getting potential jobs for 1 
> > workers
> >     [2700] 2020-01-24 15:15:18 debug  [172.17.185.50] arpnip - processed 
> > 373 ARP Cache entries
> >     [2700] 2020-01-24 15:15:18 debug  [172.17.185.50] arpnip - processed 0 
> > IPv6 Neighbor Cache entries
> >     [2700] 2020-01-24 15:15:18  info pol (3): wrapping up arpnip 
> > job(22423168) - status done at Fri Jan 24 16:15:18 2020
> >     [392] 2020-01-24 15:15:18 debug getsome: cancelled 0E0 duplicate(s) of 
> > job 22423235
> >     [392] 2020-01-24 15:15:18  info mgr (2): job 22423235 booked out for 
> > this processing node
> >     Argument "PID_2700" isn't numeric in read at 
> > /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 477, <__ANONIO__> line 31470.
> >     Sereal: Error: Bad Sereal header: Not a valid Sereal document. at 
> > offset 1 of input at srl_decoder.c line 580 at 
> > /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 480, <__ANONIO__> line 31470.
> >
> >     root@deb-netdisco:~# systemctl status netdisco-backend.service
> >     ● netdisco-backend.service - Netdisco Backend Service
> >     Loaded: loaded (/etc/systemd/system/netdisco-backend.service; enabled; 
> > vendor preset: enabled)
> >     Active: active (running) since Fri 2020-01-24 09:53:03 CET; 3 days ago
> >     Process: 110 ExecStart=/home/netdisco/bin/netdisco-backend start 
> > (code=exited, status=0/SUCCESS)
> >     Main PID: 216 (netdisco-backen)
> >         Tasks: 2 (limit: 4915)
> >     Memory: 143.0M
> >     CGroup: /system.slice/netdisco-backend.service
> >             └─216 netdisco-backend
> >
> >     gen 24 09:53:02 deb-netdisco systemd[1]: Starting Netdisco Backend 
> > Service...
> >     gen 24 09:53:03 deb-netdisco netdisco-backend[110]: Netdisco Backend    
> >                                           [Started]
> >     gen 24 09:53:03 deb-netdisco netdisco-backend[110]: config watcher: 
> > watching /home/netdisco/environments for updates.
> >     gen 24 09:53:03 deb-netdisco systemd[1]: Started Netdisco Backend 
> > Service.
> >     gen 24 10:01:48 deb-netdisco netdisco-backend[110]: -- 
> > /home/netdisco/environments/deployment.yml updated.
> >     gen 24 10:01:48 deb-netdisco netdisco-backend[110]: config watcher: 
> > sending TERM to the server (pid:217)...
> >
> >     root@deb-netdisco:~# ps aux | grep netd
> >     netdisco   216  0.0  0.3  22840 16008 ?        S    gen24   6:19 
> > netdisco-backend
> >     netdisco   281  0.0  0.3  20744 13680 ?        S    gen24   0:00 perl 
> > /home/netdisco/bin/netdisco-web start
> >     netdisco   282  0.0  0.3  22152 16696 ?        S    gen24   0:47 
> > starman master --disable-keepalive --user 1001 --group 1001 
> > /home/netdisco/perl5/bin/netdisco-web-fg
> >     netdisco   372  0.0  0.0      0     0 ?        Z    gen24   0:16 [nd2: 
> > master] <defunct>
> >     netdisco   373  0.0  2.7 135148 117200 ?       S    gen24   0:06 
> > starman worker --disable-keepalive --user 1001 --group 1001 
> > /home/netdisco/perl5/bin/netdisco-web-fg
> >     netdisco   374  0.0  2.8 136000 118000 ?       S    gen24   0:06 
> > starman worker --disable-keepalive --user 1001 --group 1001 
> > /home/netdisco/perl5/bin/netdisco-web-fg
> >     netdisco   375  0.0  2.7 133744 115940 ?       S    gen24   0:06 
> > starman worker --disable-keepalive --user 1001 --group 1001 
> > /home/netdisco/perl5/bin/netdisco-web-fg
> >     netdisco   376  0.0  2.8 137420 119504 ?       S    gen24   0:06 
> > starman worker --disable-keepalive --user 1001 --group 1001 
> > /home/netdisco/perl5/bin/netdisco-web-fg
> >     netdisco   377  0.0  2.7 133792 115996 ?       S    gen24   0:05 
> > starman worker --disable-keepalive --user 1001 --group 1001 
> > /home/netdisco/perl5/bin/netdisco-web-fg
> >     root      3405  0.0  0.0   6096   824 pts/0    S+   10:59   0:00 grep 
> > netd
> >
> >
> > after stop and start
> >     root@deb-netdisco:~# systemctl start netdisco-backend.service
> >
> > it seems to work again
> >     [392] 2020-01-24 15:15:18  info mgr (2): job 22423235 booked out for 
> > this processing node
> >     Argument "PID_2700" isn't numeric in read at 
> > /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 477, <__ANONIO__> line 31470.
> >     Sereal: Error: Bad Sereal header: Not a valid Sereal document. at 
> > offset 1 of input at srl_decoder.c line 580 at 
> > /home/netdisco/perl5/lib/perl5/MCE/Queue.pm line 480, <__ANONIO__> line 31470.
> >     [3429] 2020-01-27 10:10:08  warn App::Netdisco 2.044004 backend
> >     [3429] 2020-01-27 10:10:08  info resolving backend hostname...
> >     [3433] 2020-01-27 10:10:08  info applying role Scheduler to worker 1
> >     [3436] 2020-01-27 10:10:08  info applying role Poller to worker 4
> >     ...
> >
> >
> > _______________________________________________
> > Netdisco mailing list
> > [email protected]<mailto:[email protected]>
> > https://sourceforge.net/p/netdisco/mailman/netdisco-users/
>
> --
> °((( = (( ===°°° ((( ================================================



--- End Message ---