Hi Martin,
Hi Marc, hi Gerlof (on CC).
Am Mittwoch, 26. Oktober 2016, 10:44:30 CEST schrieb Marc Haber:
On Wed, Oct 26, 2016 at 10:31:54AM +0200, Martin Steigerwald wrote:
merkaba:~> systemctl status atopacct
● atopacct.service - Atop process accounting daemon
Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; vendor
preset: enabled) Active: active (running) since Mi 2016-10-26 10:21:29
CEST; 5min ago>
Docs: man:atopacctd(8)
Process: 31802 ExecStart=/usr/sbin/atopacctd (code=exited,
status=0/SUCCESS)>
Main PID: 31804 (atopacctd)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/atopacct.service
└─31804 /usr/sbin/atopacctd
Okt 26 10:21:29 merkaba atopacctd[31804]: Version: 2.2-4 - 2016/10/14
23:17:04 <[email protected]> Okt 26 10:21:29 merkaba
systemd[1]: Starting Atop process accounting daemon... Okt 26 10:21:29
merkaba atopacctd[31804]: accounting to /run/pacct_source Okt 26 10:21:29
merkaba systemd[1]: Started Atop process accounting daemon.
Yet no procacct warning in atop.
I'm running out of ideas. Can you strace atop and see where it fails?
After your answering your question below I think it is still atopacctd having
issues:
But accounting seems to work:
merkaba:~> find /run/pacct_source -ls
1297627 44 -rw------- 1 root root 44928 Okt 26 10:28
/run/pacct_source
How about the files in /run/pacct_shadow.d/? These are the ones an
atop process tries to read.
I have:
-rw-r--r-- 1 root root 21184 Oct 26 00:46 0000000000.paf
-rw-r--r-- 1 root root 7 Oct 26 00:00 current
and the paf file is growing.
There is the issue:
merkaba:~> ls -l /run/pacct_source
-rw------- 1 root root 1174656 Okt 27 22:46 /run/pacct_source
merkaba:~> ls -l /run/pacct_shadow.d
insgesamt 4
-rw-r--r-- 1 root root 0 Okt 27 06:53 0000000000.paf
-rw-r--r-- 1 root root 7 Okt 27 06:53 current
The pacct_source file grows, but the paf file does not.
merkaba:~> ps aux | grep "[a]top"
root 1573 0.0 0.0 4184 84 ? S< 06:53 0:00 /usr/sbin/
atopacctd
root 1904 0.0 0.0 27500 10492 ? S<Ls 06:53 0:26 /usr/bin/atop
-a -R -w /var/log/atop/atop_20161027 600
merkaba:~> LANG=C systemctl status atopacct
* atopacct.service - Atop process accounting daemon
Loaded: loaded (/lib/systemd/system/atopacct.service; enabled; vendor
preset: enabled)
Active: active (running) since Thu 2016-10-27 06:53:31 CEST; 15h ago
Docs: man:atopacctd(8)
Main PID: 1573 (atopacctd)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/atopacct.service
`-1573 /usr/sbin/atopacctd
Oct 27 06:53:31 merkaba systemd[1]: Starting Atop process accounting daemon...
Oct 27 06:53:31 merkaba atopacctd[1573]: Version: 2.2-4 - 2016/10/14 23:17:04
<gerlof[…]>
Oct 27 06:53:31 merkaba atopacctd[1573]: accounting to /run/pacct_source
Oct 27 06:53:31 merkaba systemd[1]: Started Atop process accounting daemon.
Hmmm, but atopacctd actually accounts to /run/pacct_source? Is that a
configuration mismatch?
Okay, from strace atop opens the paf file and so… reads from a zero byte file.
25046 open("/run/pacct_shadow.d/current", O_RDONLY) = 3
25046 open("/run/pacct_shadow.d/0000000000.paf", O_RDONLY) = 3
I attach the complete output of strace -e file -f -o /tmp/atop.strace atop.
from manpage of atopacctd:
The directory /run is used as the default topdirectory. Below
this top-directory, the source file pacct_source is created to
which the kernel writes the process accounting records.
Furthermore, the subdirectory pacct_shadow.d is created as a
'container' for the shadow files. Apart from the shadow files,
also the file current is maintained in this subdirectory, con‐
taining the sequence number of the current (newest) shadow file
and the maximum number of records that will be written in each
shadow file.
I don´t fully understand the mechanism atopacctd uses here from above
description, but it for some reason file pacct_source definately grows, but
00000….paf remains empty.
Thanks for the investigations so far!
Atopacctd once creates the file paccct_source and switches on accounting
in the kernel
in order to to write the account records to that file. This mechanism
remains active all the time.
Furthermore, atopacctd once creates the directory pacct_shadow.d with
the file 'current'
(containing the sequence number of the current shadow file) and the
first shadow file 000000000.paf
If no incarnations of atop are running (the current number of atop
incarnations is determined via a semaphore),
the accounting records are still read by atopacctd from the source file,
but not transferred to a shadow file
because nobody will read the shadow file. Only if at least one atop
incarnation is running, account records
will be transferred to the current shadow file.
As such, the situation that the source file grows while the file
000000000.paf remains empty is normal
when no atop incarnations are active or when atopacctd for some reason
assumes that no atop incarnations
are active (that's the question here)....
Unfortunately, the strace of atop only contained the open calls to the
files but no further system calls.
Besides, I think that the strace of atop and the strace of atopacctd
were of two different sessions
(according to the PIDs atop was started before atopacctd was started).
It would be of great help if you could do the following:
1) Stop atopacctd and stop all atop incarnations (clean situation).
2) Start atopacctd again (via systemctl).
3) Start 'strace -tt -f -o atopacct.strace -p <pid_of_atopacctd>
4) Manually start command 'strace -tt -f -o atop.strace atop 1 1 >
/tmp/atop.out'
You could verify if the file /tmp/atop.out really contains the
indication 'no procacct'
to be sure that the problem is caught.
5) Stop the 'strace' command on atopacctd.
Please send me the files atopacct.strace, atop.strace and /tmp/atop.out.
Best regards,
Gerlof
PS. In the meantime I found the reason that atopacctd does not react on
SIGTERM.
I prepared a solution for that issue.
Interesting, atopacctd seems doing nothing, not even responding to being
stopped:
merkaba:~#1> ps aux | grep atopacct
root 1573 0.0 0.0 4184 84 ? S< 06:53 0:00 /usr/sbin/
atopacctd
[…]
merkaba:~> strace -e file -f -p1573
strace: Process 1573 attached
^Cstrace: Process 1573 detached
merkaba:~#5> systemctl stop atopacct
[… hangs …]
It didn´t respond to SIGTERM in a reasonable time. I stopped it with SIGKILL.
Now, I start it on the command line:
merkaba:~> strace -f atopacctd
[…]
25459 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=25461, si_uid=0}
---
25461 recvfrom(5, 0x7ffed921cf60, 16000, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN
(Resource temporarily unavailable)
25461 read(0, <unfinished ...>
25459 rt_sigreturn({mask=[]} <unfinished ...>
25461 <... read resumed> "", 16000) = 0
25461 nanosleep({0, 12000000}, <unfinished ...>
25459 <... rt_sigreturn resumed> ) = -1 EINTR (Interrupted system call)
25459 exit_group(0) = ?
25459 +++ exited with 0 +++
25461 <... nanosleep resumed> NULL) = 0
25461 read(0, "\0\3\4\210\0\0\0\0\0\0\0\0\0\0\0\0sc\0\0qc\0\0Gm\22X
\0\0\200@"..., 16000) = 64
25461 semctl(131073, 0, GETVAL, 0) = 100
25461 recvfrom(5,
Here it hangs, waiting on a socket (also attached).
merkaba:~> pidof atopacctd
25461
merkaba:~> ls -l /proc/25461/fd
insgesamt 0
lr-x------ 1 root root 64 Okt 27 23:12 0 -> /run/pacct_source
l-wx------ 1 root root 64 Okt 27 23:12 1 -> /run/pacct_shadow.d/0000000000.paf
l-wx------ 1 root root 64 Okt 27 23:12 3 -> /run/pacct_shadow.d/current
lrwx------ 1 root root 64 Okt 27 23:10 4 -> socket:[534941]
lrwx------ 1 root root 64 Okt 27 23:10 5 -> socket:[534942]
Still nothing:
merkaba:~> ls -l /run/pacct_shadow.d
insgesamt 4
-rw-r--r-- 1 root root 0 Okt 27 22:59 0000000000.paf
-rw-r--r-- 1 root root 7 Okt 27 22:59 current
Again it doesn´t respond to SIGTERM anymore, just sitting there.
For minutes nothing happened anymore.
This is a ThinkPad T520:
martin@merkaba:~> phoronix-test-suite system-info
Phoronix Test Suite v5.2.1
System Information
Hardware:
Processor: Intel Core i5-2520M @ 3.20GHz (4 Cores), Motherboard: LENOVO
42433WG, Chipset: Intel 2nd Generation Core Family DRAM, Memory: 16384MB,
Disk: 300GB INTEL SSDSA2CW30 + 480GB Crucial_CT480M50, Graphics: Intel 2nd
Generation Core Family IGP, Audio: Conexant CX20590, Monitor: P24T-7 LED,
Network: Intel 82579LM Gigabit Connection + Intel Centrino Advanced-N 6205
Software:
OS: Debian unstable, Kernel: 4.7.0-1-amd64 (x86_64), Desktop: KDE Frameworks
5, Display Server: X Server 1.18.4, Display Driver: modesetting 1.18.4,
OpenGL: 3.3 Mesa 12.0.3, Compiler: GCC 6.2.0 20161019, File-System: btrfs,
Screen Resolution: 3840x1080
Thanks,