Re: [systemd-devel] setting cpulimit/iolimit on mysql thread not entire process
On Tue, Nov 28, 2023 at 08:35:29AM +0200, Mantas Mikulėnas wrote:
> On Tue, Nov 28, 2023 at 8:27 AM jai wrote:
> > I am able to set cpulimit, iolimit, etc for a process using its pid
> > through cgroups v2. But for some threads of a single mysql process,
> > how can I achieve that?
>
> You cannot; 1) the limits are per-cgroup and the entire service is a
> single cgroup; 2) the threads are created by mysqld, not by systemd,
> and systemd does not monitor and move service processes across cgroups
> once the service is already running; 3) afaik, in cgroups v2 it isn't
> even allowed for threads of a single process to straddle multiple
> cgroups anymore.
>
> I'm not a DBA, but I've heard that one common way to handle this would
> be to create a separate MySQL instance (probably on a separate machine,
> even) that would replicate all the data, for the heavy users to query.
> (Or the other way around: main instance for the heavy updates ⇒ replica
> for regular queries.)

Generally, heavy analytical queries should be on a replica. The reason is
that analytical queries are less likely to need the very latest data,
whereas transactions probably do.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
Re: [systemd-devel] setting cpulimit/iolimit on mysql thread not entire process
On Tue, Nov 28, 2023 at 8:27 AM jai wrote:
> I am able to set cpulimit, iolimit, etc for a process using its pid
> through cgroups v2. But for some threads of a single mysql process, how
> can I achieve that?

You cannot; 1) the limits are per-cgroup and the entire service is a
single cgroup; 2) the threads are created by mysqld, not by systemd, and
systemd does not monitor and move service processes across cgroups once
the service is already running; 3) afaik, in cgroups v2 it isn't even
allowed for threads of a single process to straddle multiple cgroups
anymore.

I'm not a DBA, but I've heard that one common way to handle this would be
to create a separate MySQL instance (probably on a separate machine, even)
that would replicate all the data, for the heavy users to query. (Or the
other way around: main instance for the heavy updates ⇒ replica for
regular queries.)

-- 
Mantas Mikulėnas
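[Editor's note: since, as explained above, the limits can only apply to the service's cgroup as a whole, the systemd-level equivalent would be a drop-in on the service unit. A sketch — the unit name, file name, and values are illustrative, not taken from this thread:

```ini
# /etc/systemd/system/mysql.service.d/50-limits.conf (hypothetical drop-in)
# Caps the whole mysqld service cgroup; per-thread limits are not possible.
[Service]
CPUQuota=200%
IOWeight=100
MemoryMax=4G
```

Running `systemctl daemon-reload` and restarting the service would apply it.]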
[systemd-devel] systemd-networkd code design documentation?
Hi,

As I start looking at the code, is there any design documentation for
developers that describes systemd-networkd? Specifically, I'm looking for
an overview of the data flow when an IPv6 Router Advertisement is
received: where it is processed and where it generates the reply.

I'm slowly building a picture of this flow, but if someone has already
been down this path and is willing to share, then it will save me some
time.

Thanks,
Matt.
Re: [systemd-devel] networkd 249.11 fails to create ip6gre and vti6 tunnels
Kernel and systemd changes aside, I kind of want to say that you need to
specify an interface for the link-local endpoint to be bound to – just as
with regular sockets. If the tunnel were device-bound and not independent,
that would happen by default.

It also seems weird that the tunnel has endpoints with different scopes; I
think I've seen routers reject such packets with a "Scope Mismatch" error.

I would try building systemd from Git source; if I remember correctly,
systemd-networkd can be run directly from the build directory, making it
possible to `git bisect` down to the change that fixed this.

On Mon, Nov 27, 2023, 19:38 Danilo Egea Gondolfo <
danilo.egea.gondo...@gmail.com> wrote:
> Hello,
>
> I'm looking for help to understand an issue we are observing on Ubuntu
> 22.04.
>
> networkd is failing with "netdev could not be created: Invalid argument"
> when I try to create either an ip6gre or vti6 device.
>
> We believe this problem started when we pulled this change [1] into the
> kernel 5.15. The problem also happens with the most recent upstream
> kernel, so it's not an issue introduced by Ubuntu.
>
> The problem doesn't happen on recent versions of systemd, but we'd like
> to fix it on systemd 249 (used by Ubuntu 22.04).
>
> How to reproduce the problem (tested on Ubuntu 22.04 (jammy) with
> systemd 249.11-0ubuntu3.11 and kernel 5.15.0-89-generic):
>
> --- /etc/systemd/network/tun0.netdev ---
> [NetDev]
> Name=tun0
> Kind=ip6gre
>
> [Tunnel]
> Independent=true
> Local=fe80::1
> Remote=2001:dead:beef::2
> --
>
> --- /etc/systemd/network/tun0.network ---
> [Match]
> Name=tun0
>
> [Network]
> LinkLocalAddressing=ipv6
> ConfigureWithoutCarrier=yes
> --
>
> After restarting networkd I see this in the logs:
> tun0: netdev could not be created: Invalid argument
> tun0: netdev removed
>
> If we boot a kernel that doesn't have [1], the interface tun0 is created.
>
> Here is the full log with debug enabled:
> https://paste.ubuntu.com/p/dPbPxgRThW/
>
> As I said, the problem seems to be fixed already in systemd, but I'm
> looking for help to understand what changes fixed it.
> The theory is that the netlink attributes used to configure the tunnel
> local/remote IPs might be wrong.
>
> This problem is documented here:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2037667
>
> Thanks in advance.
>
> [1] -
> https://github.com/torvalds/linux/commit/b0ad3c179059089d809b477a1d445c1183a7b8fe
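[Editor's note: the device-bound variant Mantas suggests would look roughly like this in networkd terms — a sketch, untested; `eth0` and the file layout are assumptions, not from the report:

```ini
# /etc/systemd/network/tun0.netdev (device-bound variant, hypothetical)
[NetDev]
Name=tun0
Kind=ip6gre

[Tunnel]
# No Independent=true: the tunnel attaches to whichever link's .network
# references it, so the fe80:: local endpoint resolves via that device.
Local=fe80::1
Remote=2001:dead:beef::2

# /etc/systemd/network/eth0.network (underlying link, hypothetical)
[Match]
Name=eth0

[Network]
Tunnel=tun0
```

With this layout the tunnel is created on eth0 rather than as a standalone netdev.]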
Re: [systemd-devel] How to properly wait for udev?
On Mon, Nov 27, 2023 at 9:29 AM Lennart Poettering wrote:
> If they conceptually should be considered block device equivalents, we
> might want to extend the udev logic to such UBI devices too. Patches
> welcome.

Why doesn't udev flock() every device it is probing? Or, asked
differently, why is this feature opt-in instead of opt-out?

-- 
Thanks,
//richard
[systemd-devel] networkd 249.11 fails to create ip6gre and vti6 tunnels
Hello,

I'm looking for help to understand an issue we are observing on Ubuntu
22.04.

networkd is failing with "netdev could not be created: Invalid argument"
when I try to create either an ip6gre or vti6 device.

We believe this problem started when we pulled this change [1] into the
kernel 5.15. The problem also happens with the most recent upstream
kernel, so it's not an issue introduced by Ubuntu.

The problem doesn't happen on recent versions of systemd, but we'd like to
fix it on systemd 249 (used by Ubuntu 22.04).

How to reproduce the problem (tested on Ubuntu 22.04 (jammy) with systemd
249.11-0ubuntu3.11 and kernel 5.15.0-89-generic):

--- /etc/systemd/network/tun0.netdev ---
[NetDev]
Name=tun0
Kind=ip6gre

[Tunnel]
Independent=true
Local=fe80::1
Remote=2001:dead:beef::2
--

--- /etc/systemd/network/tun0.network ---
[Match]
Name=tun0

[Network]
LinkLocalAddressing=ipv6
ConfigureWithoutCarrier=yes
--

After restarting networkd I see this in the logs:
tun0: netdev could not be created: Invalid argument
tun0: netdev removed

If we boot a kernel that doesn't have [1], the interface tun0 is created.

Here is the full log with debug enabled:
https://paste.ubuntu.com/p/dPbPxgRThW/

As I said, the problem seems to be fixed already in systemd, but I'm
looking for help to understand what changes fixed it.
The theory is that the netlink attributes used to configure the tunnel
local/remote IPs might be wrong.

This problem is documented here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2037667

Thanks in advance.

[1] -
https://github.com/torvalds/linux/commit/b0ad3c179059089d809b477a1d445c1183a7b8fe
Re: [systemd-devel] How to properly wait for udev?
On Mon, Nov 27, 2023 at 9:29 AM Lennart Poettering wrote:
> On So, 26.11.23 00:39, Richard Weinberger (richard.weinber...@gmail.com)
> wrote:
>
> > Hello!
> >
> > After upgrading my main test worker to a recent distribution, the UBI
> > test suite [0] fails at various places with -EBUSY.
> > The reason is that these tests create and remove UBI volumes rapidly.
> > A typical test sequence is as follows:
> > 1. creation of /dev/ubi0_0
> > 2. some exclusive operation, such as atomic update or volume resize
> >    on /dev/ubi0_0
> > 3. removal of /dev/ubi0_0
> >
> > Both steps 2 and 3 can fail with -EBUSY because the udev worker still
> > holds a file descriptor to /dev/ubi0_0.
>
> Hmm, I have no experience with UBI, but are you sure we open that? Why
> would we? Are such devices analyzed by blkid? We generally don't open
> device nodes unless we have a reason to, such as doing blkid on it or
> so.

I think it came via commit dbbf424c8b77 ("rules: ubi mtd - add link to
named partitions (#6750)").

Here is the bpftrace output of a failed mkvol_basic run. The test created
a new volume and tried to delete it via ioctl(). Right after creating the
volume, udev started inspecting it, and mkvol_basic was unable to delete
it because the delete operation needs exclusive ownership.

mkvol_basic(530): open() = /dev/ubi0
mkvol_basic(530): ioctl(cmd: 1074032385)
(udev-worker)(531): open UBI volume 0 = 0x96644533ac80
mkvol_basic(530): open UBI volume 0 = 0xfff0
mkvol_basic(530): failed ioctl() = -16
(udev-worker)(531): closing UBI volume 0x96644533ac80

> What precisely fails for you? The open()? Or some operation on the
> opened fd?

All of that. It depends on the test. Basically, every test assumes that
it has full ownership of a volume it has created.

> > FWIW, the problem can also get triggered using UBI's shell utilities
> > if the system is fast enough, e.g.
> > # ubimkvol -N testv -S 50 -n 0 /dev/ubi0 && ubirmvol -n 0 /dev/ubi0
> > Volume ID 0, size 50 LEBs (793600 bytes, 775.0 KiB), LEB size 15872
> > bytes (15.5 KiB), dynamic, name "testv", alignment 1
> > ubirmvol: error!: cannot UBI remove volume
> >           error 16 (Device or resource busy)
> >
> > Instead of adding a retry loop around -EBUSY, I believe the best
> > solution is to add code to wait for udev.
> > For example, having a udev barrier in ubi_mkvol() and ubi_rmvol() [1]
> > seems like a good idea to me.
>
> For block devices we implement this:
> https://systemd.io/BLOCK_DEVICE_LOCKING
>
> I understand UBI aren't block devices though?

Exactly, UBI volumes are character devices, just like MTDs.

> If they conceptually should be considered block device equivalents, we
> might want to extend the udev logic to such UBI devices too. Patches
> welcome.
>
> We provide "udevadm lock" to lock a block device according to this
> scheme from shell scripts.
>
> > What function from libsystemd do you suggest for waiting until udev
> > is done with rule processing?
> > My naive approach, using udev_queue_is_empty() and
> > sd_device_get_is_initialized(), does not resolve all failures so far.
> > Firstly, udev_queue_is_empty() doesn't seem to be exported by
> > libsystemd. I have open-coded it as:
> > static int udev_queue_is_empty(void) {
> >     return access("/run/udev/queue", F_OK) < 0 ?
> >         (errno == ENOENT ? true : -errno) : false;
> > }
>
> This doesn't really work. udev might still process the device in the
> background.

I see.

-- 
Thanks,
//richard
Re: [systemd-devel] WSL Ubuntu creates XDG_RUNTIME_DIR with incorrect permissions
On Mon, Nov 27, 2023 at 1:06 AM Thomas Larsen Wessel wrote:
> > WSL does not use systemd by default.
>
> According to this article, systemd has been the default on WSL Ubuntu
> since June 2023: https://learn.microsoft.com/en-us/windows/wsl/systemd
>
> "Systemd is now the default for the current version of Ubuntu that will
> be installed using the wsl --install command default."
>
> Also, when I look in /var/log/auth.log, there are many lines with
> systemd, e.g.:
>
> Nov 25 22:30:14 ELCON45223 systemd-logind[155]: New session 6 of user velle.
> Nov 25 22:30:14 ELCON45223 systemd: pam_unix(systemd-user:session):
> session opened for user velle(uid=1000) by (uid=0)
>
> Could someone please help me understand exactly which part creates this
> XDG_RUNTIME_DIR folder?

/run/user/$UID for the "console" session (the one you get when starting a
WSL instance) is created by WSL before systemd. Adding "ls -l /run/user"
to user-runtime-dir@1000.service ExecStartPre:

Nov 27 12:34:22 tumbleweed unknown: WSL (2) ERROR: WaitForBootProcess:3237: /sbin/init failed to start within 1
Nov 27 12:34:22 tumbleweed unknown: ms
Nov 27 12:34:22 tumbleweed unknown: WSL (2): Creating login session for andrei ...
Nov 27 12:34:22 tumbleweed systemd[1]: Created slice User Slice of UID 1000.
Nov 27 12:34:22 tumbleweed systemd[1]: Starting User Runtime Directory /run/user/1000...
Nov 27 12:34:22 tumbleweed ls[520]: total 0
Nov 27 12:34:22 tumbleweed ls[520]: drwxr-xr-x 4 andrei users 120 Nov 27 12:34 1000
Nov 27 12:34:22 tumbleweed systemd-logind[160]: New session 11 of user andrei.
Nov 27 12:34:22 tumbleweed systemd[1]: Finished User Runtime Directory /run/user/1000.

So logind invokes user-runtime-dir@1000.service, but it sees the existing
directory and does nothing. I would suggest asking this question on WSL
support channels.

> Is it part of the systemd repo or not? And if the answer is (or may be)
> different between Ubuntu and WSL Ubuntu, I would be happy if you share
> what you know about any of those cases :) Right now, I barely know
> where to report this issue.
>
> On Sun, Nov 26, 2023 at 10:07 AM Andrei Borzenkov wrote:
> > On 26.11.2023 02:39, Thomas Larsen Wessel wrote:
> > > I set up WSL on Windows 10 and created an instance from the default
> > > Ubuntu 22.04 image.
> > >
> > > I ran some (non-GUI) software that somehow relies on Qt, and
> > > apparently Qt does some checks on the XDG environment, so I got the
> > > following.
> > >
> > > Warning: QStandardPaths: wrong permissions on runtime directory
> > > /run/user/1000/, 0755 instead of 0700
> > >
> > > And yes, all the user folders are set to 755, including much of
> > > their content, which violates the XDG Base Directory Specification.
> > > (screenshot: https://i.imgur.com/ISn3ebh.png)
> > >
> > > As far as I can understand, it's some part of systemd that creates
> > > this folder. So is this an issue with systemd?
> >
> > WSL does not use systemd by default.
> >
> > > The validate_runtime_directory in pam_systemd already does a number
> > > of checks on XDG_RUNTIME_DIR. How about also checking if the
> > > permissions are correct/valid?
> > >
> > > Sincerely, Thomas
Re: [systemd-devel] Performance issues after migrating to systemd
It would be great to start with `systemd-analyze blame` and
`systemd-analyze critical-chain` to see what's going on during boot and
point out the time hog(s).

On 11/27/23 07:16, hari.prasat...@microchip.com wrote:
> Hello All,
>
> We recently migrated our Yocto project distribution for our embedded
> Linux based system to systemd from SysVinit. We have our graphics
> launcher application known as EGT, which is public with its own repo:
> https://github.com/linux4sam/egt
>
> We are facing performance issues after migrating to systemd. We have a
> set of benchmarking applications whose scores have come down with this
> migration; in particular, the startup is too slow. We are trying to use
> profiling tools like prof to see what's happening under the hood, but
> any pointers on what might be going wrong or areas to check would be
> useful.
>
> The main launcher service is located at
> https://github.com/linux4sam/meta-atmel/blob/kirkstone/recipes-egt/apps/egt-launcher_1.3.bb
>
> Any leads would be helpful.
>
> Regards,
> Hari
Re: [systemd-devel] How to properly wait for udev?
On Mon, Nov 27, 2023 at 10:30 AM Lennart Poettering wrote:
> On So, 26.11.23 00:39, Richard Weinberger (richard.weinber...@gmail.com)
> wrote:
>
> > Hello!
> >
> > After upgrading my main test worker to a recent distribution, the UBI
> > test suite [0] fails at various places with -EBUSY.
> > The reason is that these tests create and remove UBI volumes rapidly.
> > A typical test sequence is as follows:
> > 1. creation of /dev/ubi0_0
> > 2. some exclusive operation, such as atomic update or volume resize
> >    on /dev/ubi0_0
> > 3. removal of /dev/ubi0_0
> >
> > Both steps 2 and 3 can fail with -EBUSY because the udev worker still
> > holds a file descriptor to /dev/ubi0_0.
>
> Hmm, I have no experience with UBI, but are you sure we open that? Why
> would we? Are such devices analyzed by blkid? We generally don't open
> device nodes unless we have a reason to, such as doing blkid on it or
> so.

blkid and 60-persistent-storage indeed analyze ubi devices, it seems.

-- 
Mantas Mikulėnas
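[Editor's note: if the probing itself is unwanted for these devices, the persistent-storage rules expose an opt-out flag that is checked at the top of 60-persistent-storage.rules. A sketch — the rule file name is hypothetical, and skipping the rules also forfeits the by-name symlinks they create, so this is a workaround, not a fix:

```ini
# /etc/udev/rules.d/59-skip-ubi-probe.rules (hypothetical file name)
# Set the opt-out flag before 60-persistent-storage.rules runs, so UBI
# devices are not opened and probed by those rules.
SUBSYSTEM=="ubi", ENV{UDEV_DISABLE_PERSISTENT_STORAGE_RULES_FLAG}="1"
```
]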
Re: [systemd-devel] How to properly wait for udev?
On So, 26.11.23 00:39, Richard Weinberger (richard.weinber...@gmail.com)
wrote:

> Hello!
>
> After upgrading my main test worker to a recent distribution, the UBI
> test suite [0] fails at various places with -EBUSY.
> The reason is that these tests create and remove UBI volumes rapidly.
> A typical test sequence is as follows:
> 1. creation of /dev/ubi0_0
> 2. some exclusive operation, such as atomic update or volume resize on
>    /dev/ubi0_0
> 3. removal of /dev/ubi0_0
>
> Both steps 2 and 3 can fail with -EBUSY because the udev worker still
> holds a file descriptor to /dev/ubi0_0.

Hmm, I have no experience with UBI, but are you sure we open that? Why
would we? Are such devices analyzed by blkid? We generally don't open
device nodes unless we have a reason to, such as doing blkid on it or so.

What precisely fails for you? The open()? Or some operation on the
opened fd?

> FWIW, the problem can also get triggered using UBI's shell utilities
> if the system is fast enough, e.g.
> # ubimkvol -N testv -S 50 -n 0 /dev/ubi0 && ubirmvol -n 0 /dev/ubi0
> Volume ID 0, size 50 LEBs (793600 bytes, 775.0 KiB), LEB size 15872
> bytes (15.5 KiB), dynamic, name "testv", alignment 1
> ubirmvol: error!: cannot UBI remove volume
>           error 16 (Device or resource busy)
>
> Instead of adding a retry loop around -EBUSY, I believe the best
> solution is to add code to wait for udev.
> For example, having a udev barrier in ubi_mkvol() and ubi_rmvol() [1]
> seems like a good idea to me.

For block devices we implement this:
https://systemd.io/BLOCK_DEVICE_LOCKING

I understand UBI aren't block devices though?

If they conceptually should be considered block device equivalents, we
might want to extend the udev logic to such UBI devices too. Patches
welcome.

We provide "udevadm lock" to lock a block device according to this
scheme from shell scripts.

> What function from libsystemd do you suggest for waiting until udev is
> done with rule processing?
> My naive approach, using udev_queue_is_empty() and
> sd_device_get_is_initialized(), does not resolve all failures so far.
> Firstly, udev_queue_is_empty() doesn't seem to be exported by
> libsystemd. I have open-coded it as:
> static int udev_queue_is_empty(void) {
>     return access("/run/udev/queue", F_OK) < 0 ?
>         (errno == ENOENT ? true : -errno) : false;
> }

This doesn't really work. udev might still process the device in the
background.

Lennart

-- 
Lennart Poettering, Berlin