Hello Gerlof. Am Samstag, 12. November 2016, 14:18:02 CET schrieb Gerlof Langeveld: > Hi Martin, > > Thanks for the system call traces! > I analyzed them in the meantime and have some questions (see end of the > mail).
Answers below: > On 11/09/2016 09:37 AM, Martin Steigerwald wrote: > > Hi Gerlof. > > > > Very busy week, so feel free to ask for further information, but please > > understand that I may only report back next week. > > > > Am Samstag, 29. Oktober 2016, 10:26:38 CET schrieben Sie: > >>> Hi Marc, hi Gerlof (on CC). > >>> > >>> Am Mittwoch, 26. Oktober 2016, 10:44:30 CEST schrieb Marc Haber: > >>>> On Wed, Oct 26, 2016 at 10:31:54AM +0200, Martin Steigerwald wrote: > >>>>> merkaba:~> systemctl status atopacct > >>>>> ● atopacct.service - Atop process accounting daemon > >>>>> > >>>>> Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; > >>>>> vendor > >>>>> preset: enabled) Active: active (running) since Mi 2016-10-26 > >>>>> 10:21:29 > >>>>> CEST; 5min ago> > >>>>> > >>>>> Docs: man:atopacctd(8) > >>>>> > >>>>> Process: 31802 ExecStart=/usr/sbin/atopacctd (code=exited, > >>>>> status=0/SUCCESS)> > >>>>> > >>>>> Main PID: 31804 (atopacctd) > >>>>> > >>>>> Tasks: 1 (limit: 4915) > >>>>> > >>>>> CGroup: /system.slice/atopacct.service > >>>>> > >>>>> └─31804 /usr/sbin/atopacctd > >>>>> > >>>>> Okt 26 10:21:29 merkaba atopacctd[31804]: Version: 2.2-4 - 2016/10/14 > >>>>> 23:17:04 <[email protected]> Okt 26 10:21:29 merkaba > >>>>> systemd[1]: Starting Atop process accounting daemon... Okt 26 10:21:29 > >>>>> merkaba atopacctd[31804]: accounting to /run/pacct_source Okt 26 > >>>>> 10:21:29 > >>>>> merkaba systemd[1]: Started Atop process accounting daemon. > >>>>> > >>>>> > >>>>> Yet no procacct warning in atop. > >>>> > >>>> I'm running out of ideas. Can you strace atop and see where it fails? > >>> > >>> After your answering your question below I think it is still atopacctd > >>> having> > >>> > >>> issues: > >>>>> But accounting seems to work: > >>>>> > >>>>> merkaba:~> find /run/pacct_source -ls > >>>>> > >>>>> 1297627 44 -rw------- 1 root root 44928 Okt 26 > >>>>> 10:28 > >>>>> /run/pacct_source > >>>> > >>>> How about the files in /run/pacct_shadow.d/? These are the ones an > >>>> atop process tries to read. > >>>> > >>>> I have: > >>>> -rw-r--r-- 1 root root 21184 Oct 26 00:46 0000000000.paf > >>>> -rw-r--r-- 1 root root 7 Oct 26 00:00 current > >>>> > >>>> and the paf file is growing. > >>> > >>> There is the issue: > >>> > >>> merkaba:~> ls -l /run/pacct_source > >>> -rw------- 1 root root 1174656 Okt 27 22:46 /run/pacct_source > >>> merkaba:~> ls -l /run/pacct_shadow.d > >>> insgesamt 4 > >>> -rw-r--r-- 1 root root 0 Okt 27 06:53 0000000000.paf > >>> -rw-r--r-- 1 root root 7 Okt 27 06:53 current > >>> > >>> The pacct_source file grows, but the paf file does not. > >>> > >>> merkaba:~> ps aux | grep "[a]top" > >>> root 1573 0.0 0.0 4184 84 ? S< 06:53 0:00 > >>> /usr/sbin/ > >>> atopacctd > >>> root 1904 0.0 0.0 27500 10492 ? S<Ls 06:53 0:26 > >>> /usr/bin/atop -a -R -w /var/log/atop/atop_20161027 600 > >>> > >>> merkaba:~> LANG=C systemctl status atopacct > >>> * atopacct.service - Atop process accounting daemon > >>> > >>> Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; > >>> vendor > >>> > >>> preset: enabled) > >>> > >>> Active: active (running) since Thu 2016-10-27 06:53:31 CEST; 15h > >>> ago > >>> > >>> Docs: man:atopacctd(8) > >>> > >>> Main PID: 1573 (atopacctd) > >>> > >>> Tasks: 1 (limit: 4915) > >>> > >>> CGroup: /system.slice/atopacct.service > >>> > >>> `-1573 /usr/sbin/atopacctd > >>> > >>> Oct 27 06:53:31 merkaba systemd[1]: Starting Atop process accounting > >>> daemon... Oct 27 06:53:31 merkaba atopacctd[1573]: Version: 2.2-4 - > >>> 2016/10/14 23:17:04 <gerlof[…]> > >>> Oct 27 06:53:31 merkaba atopacctd[1573]: accounting to /run/pacct_source > >>> Oct 27 06:53:31 merkaba systemd[1]: Started Atop process accounting > >>> daemon. > >>> > >>> > >>> Hmmm, but atopacctd actually accounts to /run/pacct_source? Is that a > >>> configuration mismatch? > >>> > >>> Okay, from strace atop opens the paf file and so… reads from a zero byte > >>> file. > >>> > >>> 25046 open("/run/pacct_shadow.d/current", O_RDONLY) = 3 > >>> 25046 open("/run/pacct_shadow.d/0000000000.paf", O_RDONLY) = 3 > >>> > >>> I attach the complete output of strace -e file -f -o /tmp/atop.strace > >>> atop. > >>> > >>> from manpage of atopacctd: > >>> The directory /run is used as the default topdirectory. Below > >>> this top-directory, the source file pacct_source is created to > >>> which the kernel writes the process accounting records. > >>> Furthermore, the subdirectory pacct_shadow.d is created as a > >>> 'container' for the shadow files. Apart from the shadow files, > >>> also the file current is maintained in this subdirectory, con‐ > >>> taining the sequence number of the current (newest) shadow file > >>> and the maximum number of records that will be written in each > >>> shadow file. > >>> > >>> I don´t fully understand the mechanism atopacctd uses here from above > >>> description, but it for some reason file pacct_source definately grows, > >>> but > >>> 00000….paf remains empty. > > > > […] > > > >> It would be of great help if you could do the following: > >> > >> 1) Stop atopacctd and stop all atop incarnations (clean situation). > >> > >> 2) Start atopacctd again (via systemctl). > >> > >> 3) Start 'strace -tt -f -o atopacct.strace -p <pid_of_atopacctd> > >> > >> 4) Manually start command 'strace -tt -f -o atop.strace atop 1 1 > > >> /tmp/atop.out' > >> > >> You could verify if the file /tmp/atop.out really contains the > >> > >> indication 'no procacct' > >> > >> to be sure that the problem is caught. > >> > >> 5) Stop the 'strace' command on atopacctd. > >> > >> Please send me the files atopacct.strace, atop.strace and /tmp/atop.out. > > > > 1) merkaba:~> ps aux | grep "[a]top" > > merkaba:~#1> > > > > 2 und 3) merkaba:~#1> systemctl start atopacct ; strace -tt -f -o /tmp/ > > atop.strace -p $( pidof /usr/sbin/atopacctd ) > > Job for atopacct.service failed because of unavailable resources or > > another > > system error. > > See "systemctl status atopacct.service" and "journalctl -xe" for details. > > strace: option requires an argument -- 'p' > > Try 'strace -h' for more information. > > merkaba:~#1> systemctl status atopacct > > ● atopacct.service - Atop process accounting daemon > > > > Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; vendor > > > > preset: ena > > > > Active: failed (Result: resources) since Wed 2016-11-09 09:25:10 CET; > > 14s > > > > ago > > > > Docs: man:atopacctd(8) > > > > Process: 13973 ExecStart=/usr/sbin/atopacctd (code=exited, > > status=0/SUCCESS)> > > Main PID: 1050 (code=killed, signal=KILL) > > > > Nov 09 09:25:10 merkaba systemd[1]: Starting Atop process accounting > > daemon... Nov 09 09:25:10 merkaba atopacctd[13973]: /run/pacct_shadow.d: > > File exists Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: PID > > file /run/ atopacctd.pid not readable (yet?) after start: No such file or > > directory Nov 09 09:25:10 merkaba systemd[1]: Failed to start Atop > > process accounting daemon. > > Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: Unit entered failed > > state. > > Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: Failed with result > > 'resources' > > > > (I report this as one another bug when I get the chance, Mark, this again > > is the /run/pacct_shadow.d: File exists issue I stumbled across before) > > > > merkaba:~#3> rm -r /run/pacct_shadow.d > > merkaba:~> systemctl stop atopacct > > merkaba:~> ps aux | grep "[a]top" > > merkaba:~#1> systemctl start atopacct ; strace -tt -f -o /tmp/atop.strace > > -p $ ( pidof /usr/sbin/atopacctd ) > > strace: Process 14074 attached > > > > 4) merkaba:~> strace -tt -f -o /tmp/atop.strace atop 1 1 >/tmp/atop.out > > > > Waited till some seconds after completion > > > > 5) merkaba:~> systemctl start atopacct ; strace -tt -f -o > > /tmp/atopacct.strace -p $( pidof /usr/sbin/atopacctd ) > > strace: Process 14233 attached > > ^Cstrace: Process 14233 detached > > > > > > Files attached. atopacct.strace is quite small. The strace misses the > > startup of atopacctd even tough I directly start if after the systemctl > > start. > > > > Thank you, > > Observations ATOP system call trace: > ------------------------------------ > > When atop starts, the following steps are taken: > > 1) Check if process accounting is controlled by the atopacct > daemon (verify the presence of a semaphore for this). > > 2) Read the first process accounting record in the current > shadow file to determine the type of accounting record. > > 3) If the current shadow file is empty (in case that the first > atop incarnation is started), for a child process which > immediately terminates to force an accounting record to > be written to the current shadow file. > Afterwards repeat step 2). > > 4) If the current shadow file is still empty after > forking/terminating a child process, apparently process > accounting does not work properly. > > This is the reason why atop indicates "no procacct"!! > > > Observations ATOPACCTD system call trace: > ------------------------------------ > > When atopacctd runs, the following steps are taken: > > 1) Wait for a message from the netlink socket (neutral > state). > > 2) When a process terminates, atopacctd receives > a message from the netlink socket releasing atopacctd > from its wait state. > > 3) Read the new process accounting record(s) from the pacct_source file > and write them > to the shadow file if at least one atop process is active (which is > the case here). > > 4) Continue with step 1. > > The system call trace shows that atopacctd does NOT receive > a message from the netlink socket at the moment that atop > forks/terminates a child process. It remains pending in step 1. > Therefore atop thinks that process accounting is not activated. > > So the question is why atopacctd is not informed by the kernel > via the netlink socket about the termination of a process? > And why does it only happen on this test system (I never encounter > these problems on my own system, neither do a lot of other atop users)? > > 1) Is the netlink interface configured in this kernel? > It is one of the options you can select while configuring the kernel. I tested this with standard Debian kernel 4.8.0-1-amd64 (x86_64) as self- compiled 4.9-rc4 has severe regressions in Intel DRM driver. I am not sure which option to look for. Is it CONFIG_NETLINK_DIAG? It is enabled in the Debian kernel and my self-compiled kernels. > 2) Can you anyhow see that the pacct_source ever grows on this system? > If so, the process accounting mechanism and the netlink socket > anyhow works. Yes, it grows: merkaba:/run> for (( I=1 ; I<=5 ; I=I+1 )); do ls -l pacct_source ; sleep 1; done -rw------- 1 root root 299008 Nov 14 18:33 pacct_source -rw------- 1 root root 299136 Nov 14 18:33 pacct_source -rw------- 1 root root 299264 Nov 14 18:33 pacct_source -rw------- 1 root root 299392 Nov 14 18:34 pacct_source -rw------- 1 root root 299520 Nov 14 18:34 pacct_source > 3) Is the psacct package installed on this system? > Sometimes this package disturbs the process accounting mechanism, > because > it switches on process accounting by itself. It seem that I do not have this at all in Debian: merkaba:~> locate psacct merkaba:~#1> merkaba:~> dpkg -l | grep psacct merkaba:~#1> merkaba:~#1> apt-cache search psacct merkaba:~> apt-cache search process accounting | sort chiark-utils-bin - chiark system administration utilities cups-tea4cups - Swiss Army's knife of advanced CUPS administrators flow-tools - collects and processes NetFlow data gnucash-docs - Documentation for gnucash, a personal finance tracking program munge - authentication service to create and validate credentials python-gnucash - Gnucash interface for Python sql-ledger - Web based double-entry accounting program (Interestingly it doesn´t find the atopacctd package either, but thats more an issue of the description.) Thanks, -- Martin

