Hi Martin,
Thanks for the system call traces!
I analyzed them in the meantime and have some questions (see end of the
mail).
Regards,
Gerlof
On 11/09/2016 09:37 AM, Martin Steigerwald wrote:
Hi Gerlof.
Very busy week, so feel free to ask for further information, but please
understand that I may only report back next week.
Am Samstag, 29. Oktober 2016, 10:26:38 CET schrieben Sie:
Hi Marc, hi Gerlof (on CC).
Am Mittwoch, 26. Oktober 2016, 10:44:30 CEST schrieb Marc Haber:
On Wed, Oct 26, 2016 at 10:31:54AM +0200, Martin Steigerwald wrote:
merkaba:~> systemctl status atopacct
● atopacct.service - Atop process accounting daemon
Loaded: loaded (/lib/systemd/system/atopacct.service; enabled;
vendor
preset: enabled) Active: active (running) since Mi 2016-10-26
10:21:29
CEST; 5min ago>
Docs: man:atopacctd(8)
Process: 31802 ExecStart=/usr/sbin/atopacctd (code=exited,
status=0/SUCCESS)>
Main PID: 31804 (atopacctd)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/atopacct.service
└─31804 /usr/sbin/atopacctd
Okt 26 10:21:29 merkaba atopacctd[31804]: Version: 2.2-4 - 2016/10/14
23:17:04 <[email protected]> Okt 26 10:21:29 merkaba
systemd[1]: Starting Atop process accounting daemon... Okt 26 10:21:29
merkaba atopacctd[31804]: accounting to /run/pacct_source Okt 26
10:21:29
merkaba systemd[1]: Started Atop process accounting daemon.
Yet no procacct warning in atop.
I'm running out of ideas. Can you strace atop and see where it fails?
After your answering your question below I think it is still atopacctd
having>
issues:
But accounting seems to work:
merkaba:~> find /run/pacct_source -ls
1297627 44 -rw------- 1 root root 44928 Okt 26 10:28
/run/pacct_source
How about the files in /run/pacct_shadow.d/? These are the ones an
atop process tries to read.
I have:
-rw-r--r-- 1 root root 21184 Oct 26 00:46 0000000000.paf
-rw-r--r-- 1 root root 7 Oct 26 00:00 current
and the paf file is growing.
There is the issue:
merkaba:~> ls -l /run/pacct_source
-rw------- 1 root root 1174656 Okt 27 22:46 /run/pacct_source
merkaba:~> ls -l /run/pacct_shadow.d
insgesamt 4
-rw-r--r-- 1 root root 0 Okt 27 06:53 0000000000.paf
-rw-r--r-- 1 root root 7 Okt 27 06:53 current
The pacct_source file grows, but the paf file does not.
merkaba:~> ps aux | grep "[a]top"
root 1573 0.0 0.0 4184 84 ? S< 06:53 0:00
/usr/sbin/
atopacctd
root 1904 0.0 0.0 27500 10492 ? S<Ls 06:53 0:26
/usr/bin/atop -a -R -w /var/log/atop/atop_20161027 600
merkaba:~> LANG=C systemctl status atopacct
* atopacct.service - Atop process accounting daemon
Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; vendor
preset: enabled)
Active: active (running) since Thu 2016-10-27 06:53:31 CEST; 15h ago
Docs: man:atopacctd(8)
Main PID: 1573 (atopacctd)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/atopacct.service
`-1573 /usr/sbin/atopacctd
Oct 27 06:53:31 merkaba systemd[1]: Starting Atop process accounting
daemon... Oct 27 06:53:31 merkaba atopacctd[1573]: Version: 2.2-4 -
2016/10/14 23:17:04 <gerlof[…]>
Oct 27 06:53:31 merkaba atopacctd[1573]: accounting to /run/pacct_source
Oct 27 06:53:31 merkaba systemd[1]: Started Atop process accounting
daemon.
Hmmm, but atopacctd actually accounts to /run/pacct_source? Is that a
configuration mismatch?
Okay, from strace atop opens the paf file and so… reads from a zero byte
file.
25046 open("/run/pacct_shadow.d/current", O_RDONLY) = 3
25046 open("/run/pacct_shadow.d/0000000000.paf", O_RDONLY) = 3
I attach the complete output of strace -e file -f -o /tmp/atop.strace
atop.
from manpage of atopacctd:
The directory /run is used as the default topdirectory. Below
this top-directory, the source file pacct_source is created to
which the kernel writes the process accounting records.
Furthermore, the subdirectory pacct_shadow.d is created as a
'container' for the shadow files. Apart from the shadow files,
also the file current is maintained in this subdirectory, con‐
taining the sequence number of the current (newest) shadow file
and the maximum number of records that will be written in each
shadow file.
I don´t fully understand the mechanism atopacctd uses here from above
description, but it for some reason file pacct_source definately grows,
but
00000….paf remains empty.
[…]
It would be of great help if you could do the following:
1) Stop atopacctd and stop all atop incarnations (clean situation).
2) Start atopacctd again (via systemctl).
3) Start 'strace -tt -f -o atopacct.strace -p <pid_of_atopacctd>
4) Manually start command 'strace -tt -f -o atop.strace atop 1 1 >
/tmp/atop.out'
You could verify if the file /tmp/atop.out really contains the
indication 'no procacct'
to be sure that the problem is caught.
5) Stop the 'strace' command on atopacctd.
Please send me the files atopacct.strace, atop.strace and /tmp/atop.out.
1) merkaba:~> ps aux | grep "[a]top"
merkaba:~#1>
2 und 3) merkaba:~#1> systemctl start atopacct ; strace -tt -f -o /tmp/
atop.strace -p $( pidof /usr/sbin/atopacctd )
Job for atopacct.service failed because of unavailable resources or another
system error.
See "systemctl status atopacct.service" and "journalctl -xe" for details.
strace: option requires an argument -- 'p'
Try 'strace -h' for more information.
merkaba:~#1> systemctl status atopacct
● atopacct.service - Atop process accounting daemon
Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; vendor
preset: ena
Active: failed (Result: resources) since Wed 2016-11-09 09:25:10 CET; 14s
ago
Docs: man:atopacctd(8)
Process: 13973 ExecStart=/usr/sbin/atopacctd (code=exited, status=0/SUCCESS)
Main PID: 1050 (code=killed, signal=KILL)
Nov 09 09:25:10 merkaba systemd[1]: Starting Atop process accounting daemon...
Nov 09 09:25:10 merkaba atopacctd[13973]: /run/pacct_shadow.d: File exists
Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: PID file /run/
atopacctd.pid not readable (yet?) after start: No such file or directory
Nov 09 09:25:10 merkaba systemd[1]: Failed to start Atop process accounting
daemon.
Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: Unit entered failed
state.
Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: Failed with result
'resources'
(I report this as one another bug when I get the chance, Mark, this again is
the /run/pacct_shadow.d: File exists issue I stumbled across before)
merkaba:~#3> rm -r /run/pacct_shadow.d
merkaba:~> systemctl stop atopacct
merkaba:~> ps aux | grep "[a]top"
merkaba:~#1> systemctl start atopacct ; strace -tt -f -o /tmp/atop.strace -p $
( pidof /usr/sbin/atopacctd )
strace: Process 14074 attached
4) merkaba:~> strace -tt -f -o /tmp/atop.strace atop 1 1 >/tmp/atop.out
Waited till some seconds after completion
5) merkaba:~> systemctl start atopacct ; strace -tt -f -o /tmp/atopacct.strace
-p $( pidof /usr/sbin/atopacctd )
strace: Process 14233 attached
^Cstrace: Process 14233 detached
Files attached. atopacct.strace is quite small. The strace misses the startup
of atopacctd even tough I directly start if after the systemctl start.
Thank you,
Observations ATOP system call trace:
------------------------------------
When atop starts, the following steps are taken:
1) Check if process accounting is controlled by the atopacct
daemon (verify the presence of a semaphore for this).
2) Read the first process accounting record in the current
shadow file to determine the type of accounting record.
3) If the current shadow file is empty (in case that the first
atop incarnation is started), for a child process which
immediately terminates to force an accounting record to
be written to the current shadow file.
Afterwards repeat step 2).
4) If the current shadow file is still empty after
forking/terminating a child process, apparently process
accounting does not work properly.
This is the reason why atop indicates "no procacct"!!
Observations ATOPACCTD system call trace:
------------------------------------
When atopacctd runs, the following steps are taken:
1) Wait for a message from the netlink socket (neutral
state).
2) When a process terminates, atopacctd receives
a message from the netlink socket releasing atopacctd
from its wait state.
3) Read the new process accounting record(s) from the pacct_source file
and write them
to the shadow file if at least one atop process is active (which is
the case here).
4) Continue with step 1.
The system call trace shows that atopacctd does NOT receive
a message from the netlink socket at the moment that atop
forks/terminates a child process. It remains pending in step 1.
Therefore atop thinks that process accounting is not activated.
So the question is why atopacctd is not informed by the kernel
via the netlink socket about the termination of a process?
And why does it only happen on this test system (I never encounter
these problems on my own system, neither do a lot of other atop users)?
1) Is the netlink interface configured in this kernel?
It is one of the options you can select while configuring the kernel.
2) Can you anyhow see that the pacct_source ever grows on this system?
If so, the process accounting mechanism and the netlink socket
anyhow works.
3) Is the psacct package installed on this system?
Sometimes this package disturbs the process accounting mechanism,
because
it switches on process accounting by itself.