Re: [systemd-devel] Systemd units complains about cgroup with 5.15.x kernel

2024-02-01 Thread Thierry Bultel

Dear Lennart,

thanks for the tips.

The distro is Buildroot, which compiles systemd with
"-Ddefault-hierarchy=unified".

Should I conclude that this kernel has incomplete cgroup v2 support?
(How can I check that?)
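
One way to check, as a minimal sketch: a kernel built with cgroup v2 support lists "cgroup2" in /proc/filesystems, and on a unified-hierarchy boot /sys/fs/cgroup is itself a cgroup2 mount. The helper names below are illustrative only, and "stat -f -c %T" assumes GNU coreutils (a BusyBox stat may or may not offer it).

```shell
#!/bin/sh
# Succeed if the given filesystem list (default: /proc/filesystems)
# contains "cgroup2", i.e. the running kernel can mount cgroup v2.
has_cgroup2() {
    grep -qw cgroup2 "${1:-/proc/filesystems}"
}

# Print the filesystem type mounted at the given path
# (default: /sys/fs/cgroup). "cgroup2fs" means the unified hierarchy
# is mounted there; "tmpfs" points to a legacy or hybrid layout.
cgroup_mount_type() {
    stat -f -c %T "${1:-/sys/fs/cgroup}"
}
```

On the target, "has_cgroup2 && echo supported" followed by "cgroup_mount_type" should show whether the kernel side and the mounted layout agree with the unified default.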

I would need to clean up the log before pasting it into a mail, but what I
can see is that some services start (NetworkManager) and others do not
(sshd), and I cannot tell what makes a given service fail or succeed.

I will try recompiling with "-Ddefault-hierarchy=hybrid" today to check
whether that makes a difference.

Best regards
Thierry


On 01/02/2024 at 22:27, Lennart Poettering wrote:

On Do, 01.02.24 16:30, Thierry Bultel (thierry.bul...@linatsea.fr) wrote:


Hi,

I am using systemd v255,
and currently using a kernel vendor branch :

g...@github.com:varigit/linux-imx.git
lf-5.15.y_var01
imx_v7_defconfig

I had no issue with the older 5.4 kernel.

I have verified that the kernel has the following options:

CONFIG_DEVTMPFS=y
CONFIG_CGROUPS=y
CONFIG_INOTIFY_USER=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EPOLL=y
CONFIG_UNIX=y
CONFIG_SYSFS=y
CONFIG_PROC_FS=y
CONFIG_FHANDLE=y

CONFIG_NET_NS=y

CONFIG_SYSFS_DEPRECATED is not set

CONFIG_AUTOFS_FS=y
CONFIG_AUTOFS4_FS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y

--->

systemd is failing to start some units:

systemd[1]: wpa_supplicant.service: Failed to create cgroup
/system.slice/wpa_supplicant.service: No such file or directory
and also:
(agetty)[217]: serial-getty@ttymxc0.service: Failed to attach to cgroup
/system.slice/system-serial\x2dgetty.slice/serial-getty@ttymxc0.service: No
medium found

... and I do not have a serial console.

I am currently digging into systemd code to find out what is possibly wrong
.. but if anyone gets a clue, I would appreciate !

Educated guess: you have no cgroup v2 or so?

Would it make sense to provide logs, or to use strace to check what
precisely fails?

Ask your distro for help?

Lennart

--
Lennart Poettering, Berlin


--
www.linatsea.fr

Re: [systemd-devel] Systemd units complains about cgroup with 5.15.x kernel

2024-02-01 Thread Lennart Poettering
On Do, 01.02.24 16:30, Thierry Bultel (thierry.bul...@linatsea.fr) wrote:

> Hi,
>
> I am using systemd v255,
> and currently using a kernel vendor branch :
>
> g...@github.com:varigit/linux-imx.git
> lf-5.15.y_var01
> imx_v7_defconfig
>
> I had no issue with the older 5.4 kernel.
>
> I have verified that the kernel has the following options:
>
> CONFIG_DEVTMPFS=y
> CONFIG_CGROUPS=y
> CONFIG_INOTIFY_USER=y
> CONFIG_SIGNALFD=y
> CONFIG_TIMERFD=y
> CONFIG_EPOLL=y
> CONFIG_UNIX=y
> CONFIG_SYSFS=y
> CONFIG_PROC_FS=y
> CONFIG_FHANDLE=y
>
> CONFIG_NET_NS=y
>
> CONFIG_SYSFS_DEPRECATED is not set
>
> CONFIG_AUTOFS_FS=y
> CONFIG_AUTOFS4_FS=y
> CONFIG_TMPFS_POSIX_ACL=y
> CONFIG_TMPFS_XATTR=y
>
> --->
>
> systemd is failing to start some units:
>
> systemd[1]: wpa_supplicant.service: Failed to create cgroup
> /system.slice/wpa_supplicant.service: No such file or directory
> and also;
>  (agetty)[217]: serial-getty@ttymxc0.service: Failed to attach to cgroup
> /system.slice/system-serial\x2dgetty.slice/serial-getty@ttymxc0.service: No
> medium found
>
> ... and I do not have a serial console.
>
> I am currently digging into systemd code to find out what is possibly wrong
> .. but if anyone gets a clue, I would appreciate !

Educated guess: you have no cgroup v2 or so?

Would it make sense to provide logs, or to use strace to check what
precisely fails?

Ask your distro for help?

Lennart

--
Lennart Poettering, Berlin


[systemd-devel] Systemd units complains about cgroup with 5.15.x kernel

2024-02-01 Thread Thierry Bultel

Hi,

I am using systemd v255,
and am currently using a vendor kernel branch:

g...@github.com:varigit/linux-imx.git
lf-5.15.y_var01
imx_v7_defconfig

I had no issue with the older 5.4 kernel.

I have verified that the kernel has the following options:

CONFIG_DEVTMPFS=y
CONFIG_CGROUPS=y
CONFIG_INOTIFY_USER=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EPOLL=y
CONFIG_UNIX=y
CONFIG_SYSFS=y
CONFIG_PROC_FS=y
CONFIG_FHANDLE=y

CONFIG_NET_NS=y

CONFIG_SYSFS_DEPRECATED is not set

CONFIG_AUTOFS_FS=y
CONFIG_AUTOFS4_FS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
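
A list like the one above can be re-checked mechanically against the kernel configuration. This is a minimal sketch: the function name is illustrative, and it counts only "=y" as enabled, so an option built as a module ("=m") is reported as missing.

```shell
#!/bin/sh
# Print every option from the argument list that is not set to "y"
# in the given kernel config file.
# Usage: missing_kconfig <config-file> OPTION...
missing_kconfig() {
    kcfg=$1
    shift
    for opt in "$@"; do
        # -x matches whole lines only, so "# CONFIG_X is not set" never matches
        grep -qx "${opt}=y" "$kcfg" || echo "$opt"
    done
}
```

For example, "missing_kconfig .config CONFIG_CGROUPS CONFIG_FHANDLE CONFIG_NET_NS" in the kernel build directory prints nothing when all three are enabled. On a running target the same file is available as /proc/config.gz only if the kernel was built with CONFIG_IKCONFIG_PROC.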

--->

systemd is failing to start some units:

systemd[1]: wpa_supplicant.service: Failed to create cgroup /system.slice/wpa_supplicant.service: No such file or directory

and also:

(agetty)[217]: serial-getty@ttymxc0.service: Failed to attach to cgroup /system.slice/system-serial\x2dgetty.slice/serial-getty@ttymxc0.service: No medium found


... and I do not have a serial console.

I am currently digging into the systemd code to find out what might be
wrong, but if anyone has a clue, I would appreciate it!


Thanks !
Thierry



Re: [systemd-devel] Empty journal files consume space

2024-02-01 Thread Steve Traylen



On 01/02/2024 14:48, Steve Traylen wrote:

On 01/02/2024 13:45, Andrei Borzenkov wrote:

On Thu, Feb 1, 2024 at 3:25 PM Steve Traylen wrote:

Hi,

I'm trying to understand why I am only retaining just a couple of days
of logs when I would like to have more.

The system journalctl head of the logs is only  today:
Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]: Data hash table
of /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal has
a fill level at 75.0 (174765 of 233016 items, 58720256 file size, 335
bytes per hash table item), suggesting rotation.
Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]:
/var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal:
Journal header limits reached or header out-of-date, rotating.


# journalctl --disk-usage
Archived and active journals take up 8.1G in the file system.

Reality is  system journal is tiny:

# du -sh system.journal
17M system.journal

However we do have many

# ls -l user-*journal | wc -l
1044

and indeed

# du -sh /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0
8.2G    /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0

The vast majority of these user journals are empty and offline

# file user-*journal | awk '{print $4, $5}' | sort | uniq -c
  940 empty, offline
  102 offline
  2 online


These user journals are all 8.0M in size.

So I think I have two questions:

1) Why am I losing old logs sooner than I would like, and what limit is
"fill level at 75.0 (174765 of 233016 items"?

You did not provide any evidence that logs are lost. Archived
(offline) logs are processed and searched by journalctl so the oldest
available log is the oldest archive file, not the current online file.

The limit is the fill grade of the hash table in the individual log
file. It is hard coded and unrelated to the limits configured in the
journald.conf. It may affect how long logs are kept if you configured
retention by the number of log files.

Thanks for the reply.

There are no archive files I believe:

# ls /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/*system*
/var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/system.journal

The archive files would be alongside the live file I believe.

I just tried an explicit "journalctl --rotate", which logs:

Feb 01 14:36:33 nodeX.example.ch systemd-journald[658]: System Journal (/var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e) is 8.0G, max 3.0G, 0B free.
Feb 01 14:36:40 nodeX.example.ch systemd-journald[658]: Received client request to rotate journal, rotating.
Feb 01 14:36:40 nodeX.example.ch systemd-journald[658]: Deleted empty archived journal /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/user-1234@537a18390e124dd6b4cf41a69ef5780d--.journal (3.5M).
Feb 01 14:36:40 lxplus978.cern.ch systemd-journald[658]: Deleted empty archived journal /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/user-1235@d7d23966c1454001a714ee5aef039c60--.journal (3.5M).


So now I think I understand: at rotation I am already over the configured
maximum of 3 GB, so perhaps no archive is kept. On another node, where
fewer users have ever logged in, I do have archives of the system log and
a longer history. Those 940 "empty, offline" user journals consume the
space while providing no particular value.
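
The "is 8.0G, max 3.0G" comparison from the log can be reproduced by hand. This is a minimal sketch, assuming that the allocated size reported by "du -sk" is close enough to what journald accounts for; the function name is illustrative.

```shell
#!/bin/sh
# Print "over" if the directory occupies more KiB on disk than the
# given limit, otherwise "ok".
# Usage: journal_usage_state <dir> <limit-in-KiB>
journal_usage_state() {
    used=$(du -sk "$1" | cut -f1)
    if [ "$used" -gt "$2" ]; then
        echo over
    else
        echo ok
    fi
}
```

With SystemMaxUse = 3G the limit is 3 * 1024 * 1024 = 3145728 KiB, so the call would be journal_usage_state "/var/log/journal/$(cat /etc/machine-id)" 3145728. If it prints "over", "journalctl --vacuum-size=3G" is one way to ask for archived journal files to be removed until the total drops below that size.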


No other indication that rotation may not have worked.


2) Is there a safe mechanism to delete those empty offline user journals?



Just delete them.


I wrote a tiny script to delete them:

shopt -s extglob    # the +(...) pattern below needs bash extended globbing
for FILE in /var/log/journal/"$(cat /etc/machine-id)"/user-+([0-9]*).journal; do
    # only remove journals that file(1) describes as empty and offline
    if [ "$(file --brief "$FILE")" = 'Journal file empty, offline' ]; then
        rm -f "$FILE"
        echo "$(basename "$FILE") was empty and offline so removed"
    fi
done

It works perfectly. Unfortunately, about 20 seconds later journald (I
presume) re-creates them all, despite the vast majority of users having
no current processes on the nodes.





Thanks.

Steve.

Version and configuration:

systemd-252-18.el9 - RHEL9 with a configuration of:

[Journal]
Storage = persistent
SplitMode = uid
SystemMaxUse = 3G
SystemKeepFree = 10G
MaxRetentionSec = 1year

# df -h /
Filesystem  Size  Used Avail Use% Mounted on
/dev/vda1    80G   65G   16G  81% /




Re: [systemd-devel] Empty journal files consume space

2024-02-01 Thread Steve Traylen

On 01/02/2024 13:45, Andrei Borzenkov wrote:


On Thu, Feb 1, 2024 at 3:25 PM Steve Traylen  wrote:

Hi,

I'm trying to understand why I am only retaining just a couple of days
of logs when I would like to have more.

The system journalctl head of the logs is only  today:
Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]: Data hash table
of /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal has
a fill level at 75.0 (174765 of 233016 items, 58720256 file size, 335
bytes per hash table item), suggesting rotation.
Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]:
/var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal:
Journal header limits reached or header out-of-date, rotating.


# journalctl --disk-usage
Archived and active journals take up 8.1G in the file system.

Reality is  system journal is tiny:

# du -sh system.journal
17M system.journal

However we do have many

# ls -l user-*journal | wc -l
1044

and indeed

# du -sh /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0
8.2G    /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0

The vast majority of these user journals are empty and offline

# file user-*journal | awk '{print $4, $5}' | sort | uniq -c
  940 empty, offline
  102 offline
  2 online


These user journals are all 8.0M in size.

So I think I have two questions:

1) Why am I losing old logs sooner than I would like, and what limit is
"fill level at 75.0 (174765 of 233016 items"?

You did not provide any evidence that logs are lost. Archived
(offline) logs are processed and searched by journalctl so the oldest
available log is the oldest archive file, not the current online file.

The limit is the fill grade of the hash table in the individual log
file. It is hard coded and unrelated to the limits configured in the
journald.conf. It may affect how long logs are kept if you configured
retention by the number of log files.

Thanks for the reply.

There are no archive files I believe:

# ls /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/*system*
/var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/system.journal

The archive files would be alongside the live file I believe.

I just tried an explicit "journalctl --rotate", which logs:

Feb 01 14:36:33 nodeX.example.ch systemd-journald[658]: System Journal (/var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e) is 8.0G, max 3.0G, 0B free.
Feb 01 14:36:40 nodeX.example.ch systemd-journald[658]: Received client request to rotate journal, rotating.
Feb 01 14:36:40 nodeX.example.ch systemd-journald[658]: Deleted empty archived journal /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/user-1234@537a18390e124dd6b4cf41a69ef5780d--.journal (3.5M).
Feb 01 14:36:40 lxplus978.cern.ch systemd-journald[658]: Deleted empty archived journal /var/log/journal/514fed82c54d4a89b9f7f8f33eca1c8e/user-1235@d7d23966c1454001a714ee5aef039c60--.journal (3.5M).


So now I think I understand: at rotation I am already over the configured
maximum of 3 GB, so perhaps no archive is kept. On another node, where
fewer users have ever logged in, I do have archives of the system log and
a longer history. Those 940 "empty, offline" user journals consume the
space while providing no particular value.


No other indication that rotation may not have worked.



2) Is there a safe mechanism to delete those empty offline user journals?


Just delete them.


Thanks.

Steve.

Version and configuration:

systemd-252-18.el9 - RHEL9 with a configuration of:

[Journal]
Storage = persistent
SplitMode = uid
SystemMaxUse = 3G
SystemKeepFree = 10G
MaxRetentionSec = 1year

# df -h /
Filesystem  Size  Used Avail Use% Mounted on
/dev/vda1    80G   65G   16G  81% /




Re: [systemd-devel] Empty journal files consume space

2024-02-01 Thread Andrei Borzenkov
On Thu, Feb 1, 2024 at 3:25 PM Steve Traylen  wrote:
>
> Hi,
>
> I'm trying to understand why I am only retaining just a couple of days
> of logs when I would like to have more.
>
> The system journalctl head of the logs is only  today:
> Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]: Data hash table
> of /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal has
> a fill level at 75.0 (174765 of 233016 items, 58720256 file size, 335
> bytes per hash table item), suggesting rotation.
> Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]:
> /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal:
> Journal header limits reached or header out-of-date, rotating.
>
>
> # journalctl --disk-usage
> Archived and active journals take up 8.1G in the file system.
>
> Reality is  system journal is tiny:
>
> # du -sh system.journal
> 17M system.journal
>
> However we do have many
>
> # ls -l user-*journal | wc -l
> 1044
>
> and indeed
>
> # du -sh /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0
> 8.2G    /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0
>
> The vast majority of these user journals are empty and offline
>
> # file user-*journal | awk '{print $4, $5}' | sort | uniq -c
>  940 empty, offline
>  102 offline
>  2 online
>
>
> These user journals are all 8.0M in size.
>
> So I think I have two questions:
>
> 1) Why am I losing old logs sooner than I would like, and what limit is
> "fill level at 75.0 (174765 of 233016 items"?

You did not provide any evidence that logs are lost. Archived
(offline) logs are processed and searched by journalctl so the oldest
available log is the oldest archive file, not the current online file.

The limit is the fill level of the hash table in the individual journal
file. It is hard-coded and unrelated to the limits configured in
journald.conf. It may affect how long logs are kept if you configured
retention by the number of log files.
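
The numbers in the quoted log line can be checked with a little arithmetic. This is a minimal sketch; the function names are illustrative, and treating "bytes per hash table item" as file size divided by item count is an assumption that happens to match the logged values.

```shell
#!/bin/sh
# Fill level of the hash table in percent, with one decimal place.
# Usage: fill_level <items-used> <items-total>
fill_level() {
    awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}

# File size divided by the number of items, rounded down.
# Usage: bytes_per_item <file-size-bytes> <items-used>
bytes_per_item() {
    echo $(( $1 / $2 ))
}

fill_level 174765 233016          # prints 75.0
bytes_per_item 58720256 174765    # prints 335
```

So the rotation in that log line was suggested because the table was three quarters full, at a file size of 58720256 bytes (56 MiB), not because a size limit from journald.conf was reached.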

> 2) Is there a safe mechanism to delete those empty offline user journals?
>

Just delete them.

> Thanks.
>
> Steve.
>
> Version and configuration:
>
> systemd-252-18.el9 - RHEL9 with a configuration of:
>
> [Journal]
> Storage = persistent
> SplitMode = uid
> SystemMaxUse = 3G
> SystemKeepFree = 10G
> MaxRetentionSec = 1year
>
> # df -h /
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/vda1    80G   65G   16G  81% /
>
>


[systemd-devel] Empty journal files consume space

2024-02-01 Thread Steve Traylen

Hi,

I'm trying to understand why I am only retaining just a couple of days 
of logs when I would like to have more.


The head of the system journal only goes back to today:
Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]: Data hash table of /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal has a fill level at 75.0 (174765 of 233016 items, 58720256 file size, 335 bytes per hash table item), suggesting rotation.
Feb 01 10:47:14 nodeX.example.ch systemd-journald[722]: /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0/system.journal: Journal header limits reached or header out-of-date, rotating.



# journalctl --disk-usage
Archived and active journals take up 8.1G in the file system.

In reality the system journal is tiny:

# du -sh system.journal
17M system.journal

However we do have many

# ls -l user-*journal | wc -l
1044

and indeed

# du -sh /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0
8.2G    /var/log/journal/c33ef6d0ada04ec4abc79c567a7d94b0

The vast majority of these user journals are empty and offline

# file user-*journal | awk '{print $4, $5}' | sort | uniq -c
    940 empty, offline
    102 offline
    2 online


These user journals are all 8.0M in size.

So I think I have two questions:

1) Why am I losing old logs sooner than I would like, and what limit is
"fill level at 75.0 (174765 of 233016 items"?

2) Is there a safe mechanism to delete those empty offline user journals?

Thanks.

Steve.

Version and configuration:

systemd-252-18.el9 - RHEL9 with a configuration of:

[Journal]
Storage = persistent
SplitMode = uid
SystemMaxUse = 3G
SystemKeepFree = 10G
MaxRetentionSec = 1year

# df -h /
Filesystem  Size  Used Avail Use% Mounted on
/dev/vda1    80G   65G   16G  81% /