Hi Martin,

Thanks for the system call traces!
I analyzed them in the meantime and have some questions (see end of the mail).

Regards,
Gerlof

On 11/09/2016 09:37 AM, Martin Steigerwald wrote:
Hi Gerlof.

Very busy week, so feel free to ask for further information, but please
understand that I may only report back next week.

Am Samstag, 29. Oktober 2016, 10:26:38 CET schrieben Sie:
Hi Marc, hi Gerlof (on CC).

Am Mittwoch, 26. Oktober 2016, 10:44:30 CEST schrieb Marc Haber:
On Wed, Oct 26, 2016 at 10:31:54AM +0200, Martin Steigerwald wrote:
merkaba:~> systemctl status atopacct
● atopacct.service - Atop process accounting daemon

     Loaded: loaded (/lib/systemd/system/atopacct.service; enabled;
     vendor
     preset: enabled) Active: active (running) since Mi 2016-10-26
     10:21:29
     CEST; 5min ago>
Docs: man:atopacctd(8) Process: 31802 ExecStart=/usr/sbin/atopacctd (code=exited,
    status=0/SUCCESS)>
Main PID: 31804 (atopacctd) Tasks: 1 (limit: 4915) CGroup: /system.slice/atopacct.service └─31804 /usr/sbin/atopacctd

Okt 26 10:21:29 merkaba atopacctd[31804]: Version: 2.2-4 - 2016/10/14
23:17:04  <[email protected]> Okt 26 10:21:29 merkaba
systemd[1]: Starting Atop process accounting daemon... Okt 26 10:21:29
merkaba atopacctd[31804]: accounting to /run/pacct_source Okt 26
10:21:29
merkaba systemd[1]: Started Atop process accounting daemon.


Yet no procacct warning in atop.
I'm running out of ideas. Can you strace atop and see where it fails?
After your answering your question below I think it is still atopacctd
having>
issues:
But accounting seems to work:

merkaba:~> find /run/pacct_source -ls

    1297627     44 -rw-------   1 root     root        44928 Okt 26 10:28
    /run/pacct_source
How about the files in /run/pacct_shadow.d/? These are the ones an
atop process tries to read.

I have:
-rw-r--r--  1 root root 21184 Oct 26 00:46 0000000000.paf
-rw-r--r--  1 root root     7 Oct 26 00:00 current

and the paf file is growing.
There is the issue:

merkaba:~> ls -l /run/pacct_source
-rw------- 1 root root 1174656 Okt 27 22:46 /run/pacct_source
merkaba:~> ls -l /run/pacct_shadow.d
insgesamt 4
-rw-r--r-- 1 root root 0 Okt 27 06:53 0000000000.paf
-rw-r--r-- 1 root root 7 Okt 27 06:53 current

The pacct_source file grows, but the paf file does not.

merkaba:~> ps aux | grep "[a]top"
root      1573  0.0  0.0   4184    84 ?        S<   06:53   0:00
/usr/sbin/
atopacctd
root      1904  0.0  0.0  27500 10492 ?        S<Ls 06:53   0:26
/usr/bin/atop -a -R -w /var/log/atop/atop_20161027 600

merkaba:~> LANG=C systemctl status atopacct
* atopacct.service - Atop process accounting daemon

     Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; vendor

preset: enabled)

     Active: active (running) since Thu 2016-10-27 06:53:31 CEST; 15h ago
Docs: man:atopacctd(8) Main PID: 1573 (atopacctd) Tasks: 1 (limit: 4915) CGroup: /system.slice/atopacct.service `-1573 /usr/sbin/atopacctd

Oct 27 06:53:31 merkaba systemd[1]: Starting Atop process accounting
daemon... Oct 27 06:53:31 merkaba atopacctd[1573]: Version: 2.2-4 -
2016/10/14 23:17:04 <gerlof[…]>
Oct 27 06:53:31 merkaba atopacctd[1573]: accounting to /run/pacct_source
Oct 27 06:53:31 merkaba systemd[1]: Started Atop process accounting
daemon.


Hmmm, but atopacctd actually accounts to /run/pacct_source? Is that a
configuration mismatch?

Okay, from strace atop opens the paf file and so… reads from a zero byte
file.

25046 open("/run/pacct_shadow.d/current", O_RDONLY) = 3
25046 open("/run/pacct_shadow.d/0000000000.paf", O_RDONLY) = 3

I attach the complete output of strace -e file -f -o /tmp/atop.strace
atop.

from manpage of atopacctd:
         The directory /run is used as the default topdirectory.   Below
         this  top-directory, the source file pacct_source is created to
         which the kernel writes the process accounting records.
         Furthermore, the subdirectory pacct_shadow.d is  created  as  a
         'container' for the shadow files.  Apart from the shadow files,
         also the file current is maintained in this subdirectory,  con‐
         taining the sequence number of the current (newest) shadow file
         and the maximum number of records that will be written in  each
         shadow file.

I don´t fully understand the mechanism atopacctd uses here from above
description, but it for some reason file pacct_source definately grows,
but
00000….paf remains empty.
[…]
It would be of great help if you could do the following:

1) Stop atopacctd and stop all atop incarnations (clean situation).

2) Start atopacctd again (via systemctl).

3) Start 'strace -tt -f -o atopacct.strace -p <pid_of_atopacctd>

4) Manually start command 'strace -tt -f -o atop.strace  atop 1 1 >
/tmp/atop.out'
      You could verify if the file /tmp/atop.out really contains the
indication 'no procacct'
      to be sure that the problem is caught.

5) Stop the 'strace' command on atopacctd.

Please send me the files atopacct.strace, atop.strace and /tmp/atop.out.
1) merkaba:~> ps aux |  grep "[a]top"
merkaba:~#1>

2 und 3) merkaba:~#1> systemctl start atopacct ; strace -tt -f -o /tmp/
atop.strace -p $( pidof /usr/sbin/atopacctd )
Job for atopacct.service failed because of unavailable resources or another
system error.
See "systemctl status atopacct.service" and "journalctl -xe" for details.
strace: option requires an argument -- 'p'
Try 'strace -h' for more information.
merkaba:~#1> systemctl status atopacct
● atopacct.service - Atop process accounting daemon
    Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; vendor
preset: ena
    Active: failed (Result: resources) since Wed 2016-11-09 09:25:10 CET; 14s
ago
      Docs: man:atopacctd(8)
   Process: 13973 ExecStart=/usr/sbin/atopacctd (code=exited, status=0/SUCCESS)
  Main PID: 1050 (code=killed, signal=KILL)

Nov 09 09:25:10 merkaba systemd[1]: Starting Atop process accounting daemon...
Nov 09 09:25:10 merkaba atopacctd[13973]: /run/pacct_shadow.d: File exists
Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: PID file /run/
atopacctd.pid not readable (yet?) after start: No such file or directory
Nov 09 09:25:10 merkaba systemd[1]: Failed to start Atop process accounting
daemon.
Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: Unit entered failed
state.
Nov 09 09:25:10 merkaba systemd[1]: atopacct.service: Failed with result
'resources'

(I report this as one another bug when I get the chance, Mark, this again is
the /run/pacct_shadow.d: File exists issue I stumbled across before)

merkaba:~#3> rm -r /run/pacct_shadow.d
merkaba:~> systemctl stop atopacct
merkaba:~> ps aux |  grep "[a]top"
merkaba:~#1> systemctl start atopacct ; strace -tt -f -o /tmp/atop.strace -p $
( pidof /usr/sbin/atopacctd )
strace: Process 14074 attached

4) merkaba:~> strace -tt -f -o /tmp/atop.strace  atop 1 1 >/tmp/atop.out

Waited till some seconds after completion

5) merkaba:~> systemctl start atopacct ; strace -tt -f -o /tmp/atopacct.strace
-p $( pidof /usr/sbin/atopacctd )
strace: Process 14233 attached
^Cstrace: Process 14233 detached


Files attached. atopacct.strace is quite small. The strace misses the startup
of atopacctd even tough I directly start if after the systemctl start.

Thank you,

Observations ATOP system call trace:
------------------------------------

When atop starts, the following steps are taken:

1) Check if process accounting is controlled by the atopacct
    daemon (verify the presence of a semaphore for this).

2) Read the first process accounting record in the current
    shadow file to determine the type of accounting record.

3) If the current shadow file is empty (in case that the first
    atop incarnation is started), for a child process which
    immediately terminates to force an accounting record to
    be written to the current shadow file.
    Afterwards repeat step 2).

4) If the current shadow file is still empty after
    forking/terminating a child process, apparently process
    accounting does not work properly.

    This is the reason why atop indicates "no procacct"!!


Observations ATOPACCTD system call trace:
------------------------------------

When atopacctd runs, the following steps are taken:

1) Wait for a message from the netlink socket (neutral
    state).

2) When a process terminates, atopacctd receives
    a message from the netlink socket releasing atopacctd
    from its wait state.

3) Read the new process accounting record(s) from the pacct_source file and write them to the shadow file if at least one atop process is active (which is the case here).

4) Continue with step 1.

The system call trace shows that atopacctd does NOT receive
a message from the netlink socket at the moment that atop
forks/terminates a child process. It remains pending in step 1.
Therefore atop thinks that process accounting is not activated.

So the question is why atopacctd is not informed by the kernel
via the netlink socket about the termination of a process?
And why does it only happen on this test system (I never encounter
these problems on my own system, neither do a lot of other atop users)?

1) Is the netlink interface configured in this kernel?
    It is one of the options you can select while configuring the kernel.

2) Can you anyhow see that the pacct_source ever grows on this system?
If so, the process accounting mechanism and the netlink socket anyhow works.

3) Is the psacct package installed on this system?
Sometimes this package disturbs the process accounting mechanism, because
    it switches on process accounting by itself.

Reply via email to