from:"Michal Koutný"

Re: [systemd-devel] ExecReload and dynamic arguments (start control process with arguments)

2024-04-02 Thread Michal Koutný

Hello.

On Mon, Mar 04, 2024 at 06:47:42PM +0400, Vadim Nevorotin  
wrote:
> One of them - how to support different "reload" modes? We need to start
> some control process in the context of service (like ExecReload does), but
> this process supports some arguments and we need to pass them in time of
> 'systemctl reload' call. Previously we have commands like:
> 
> /etc/init.d/ourservice reload|soft-reload|hard-reload [--someargs]

I assume the command + args must be somehow delivered to the service
process (e.g. written to a socket). (DM)

> All doing different actions, manipulating service main process/workers and
> pidfiles, How can I do it with systemd? Did I miss something in the
> documentation?

It seems to me you're focusing too much on the reload aspect whereas it
is a generic communication with the service (because of the
parametrization).

Simply use the delivery mechanism above (DM), and if it should result in
the replacement of the main service process, send MAINPID=<pid> via
sd_notify().
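
A minimal sketch of what such a delivery mechanism could look like from the
caller's side. The helper name send_control, the control path and the
one-line "action plus arguments" protocol are assumptions for illustration;
systemd does not provide any of them. Whether the manager accepts a MAINPID=
message also depends on the unit's NotifyAccess= setting.

```shell
# Hypothetical control helper: write one command line (action plus
# arguments) to the channel the service listens on.
# $1 is the control channel (a FIFO or similar), the rest is the command.
send_control() {
    channel=$1
    shift
    printf '%s\n' "$*" > "$channel"
}

# Example invocations (path and actions are illustrative only):
#   send_control /run/ourservice/control soft-reload --someargs
#   send_control /run/ourservice/control hard-reload
#
# If handling the command replaces the main process, the service side
# would announce the new PID, e.g. from a shell wrapper:
#   systemd-notify "MAINPID=$new_pid"
```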

HTH,
Michal

signature.asc
Description: PGP signature

Re: [systemd-devel] Forking service behind socket and service.

2024-04-02 Thread Michal Koutný

Hello.

On Wed, Mar 27, 2024 at 07:35:19AM +, Steve Traylen  
wrote:
> In particular I want the socket to close once the fork happens.
> If the service is Type=forking things do work but socket is persisted
> - that's not great for thing doing the original submission. It expects
> the socket to be short lived.

Why? To only have a single instance of the service?
Could you then do me.socket and me.service with StandardInput=socket?
(I guess not because the legacy won't accept.)

What about .socket:MaxConnections=1 then?

Thanks,
Michal

Re: [systemd-devel] Handle device node timeout?

2024-01-30 Thread Michal Koutný

On Tue, Jan 16, 2024 at 04:06:46PM +0200, Mikko Rapeli 
 wrote:
> I have services which depend on a specific device node. How can I run
> some recovery actions when the default 90s timeout for finding this
> device is hit?

(Not sure if it is the best practice to do a plain-text fallback in
case the encrypted device setup fails.)

> OnFailure= doesn't work as the service is not even started.

Compare it to how emergency.target is implemented.
You could hook into the primary target's OnFailure= and start another target
with an alternative device.

> Fix is to remove Encrypt=tpm2 from systemd-repart config to generate plain
> ext4 rootfs. Running the recovery scripts manually in emergency console
> works, but I can't figure out how to trigger this recovery automatically.

You could let emergency.target pull in your recovery. (But as I pondered
above a separate target may be a better approach than overloading
emergency.)

HTH,
Michal

Re: [systemd-devel] What are Abandon() and AbandonScope() used for?

2024-01-30 Thread Michal Koutný

Hello.

On Tue, Jan 23, 2024 at 04:40:35PM +0100, Felip Moll  wrote:
> Can somebody give me an insight of what these methods really do?
> The documentation is pretty vague:
> https://www.freedesktop.org/wiki/Software/systemd/dbus/

Those would most probably pair with the
org.freedesktop.systemd1.Scope.RequestStop DBus signal -- an orderly
way for systemd to stop a scope.

I.e. there can be a "controller" of the scope that would respond to such a
signal with some graceful termination of the scope. Lifetimes of the
scope and controller may be independent and when controller terminates,
it can mark the scope as abandoned (instead of killing its processes or
disappearing without notice).

That's the theory; in practice, logind acts as the controller of its session
scopes.

> When should I use this? For example I am creating a scope which will
> allocate some pids into their cgroups. The scope is created from a daemon
> run from a service unit. The service will eventually be shut down and I
> want the scope to remain if it has pids in it. Should the service "abandon"
> the scope?

That depends whether you want some extra stopping procedure for your
scope besides regular SIGTERM logic.

HTH,
Michal

Re: [systemd-devel] umount fails on system with huge (2TiB) buff/cache

2024-01-30 Thread Michal Koutný

Hello.

On Fri, Jan 26, 2024 at 12:13:34PM +, Holger Kiehl  
wrote:
...
> Note it states 'no limit' and one can see after some minutes it says
> it umounted /mnt/u2:
...
> Confused here since it stated on serial console output
> 
>[  OK  ] Unmounted /mnt/u2.

Any chance your mount unit has LazyUnmount=yes?

> The only way I can get the system to reboot properly is when sending the
> following command before doing the reboot:
> 
>echo 1 > /proc/sys/vm/drop_caches

How long does this take BTW? (Around those 10 minutes?)

> Is it possible to tell systemd-shutdown to wait longer or are there
> some other parameters I need to tune?

systemd-shutdown uses the sum of values from
/proc/meminfo:{NFS_Unstable,Writeback,Dirty} to determine whether the sync
progresses. Something in the block/FS layer may have got stuck if the sum
doesn't visibly decrease.
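
To watch the same counters by hand before rebooting, something along these
lines could be used. It is only a sketch of the idea: the function name is
made up here and this is not the code systemd-shutdown runs.

```shell
# Sum the NFS_Unstable, Writeback and Dirty counters (in kB) from a
# meminfo-style file; defaults to /proc/meminfo.
pending_writeback_kb() {
    awk '$1 == "NFS_Unstable:" || $1 == "Writeback:" || $1 == "Dirty:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}

# Sample the value a few times to see whether it shrinks, e.g.:
#   while sleep 5; do pending_writeback_kb; done
```

If the number stays flat while a sync is in flight, that would point at the
stuck writeback mentioned above.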

Regards,
Michal

Re: [systemd-devel] setting cpulimit/iolimit on mysql thread not entire process

2023-12-20 Thread Michal Koutný

Hello.

On Tue, Nov 28, 2023 at 08:35:29AM +0200, Mantas Mikulėnas  
wrote:
> 1) the limits are per-cgroup and the entire service is a single cgroup;

They could create their own service unit for the DB with Delegate=cpu,io and
create a subtree manually.

> 2) the threads are created by mysqld, 

That^^ Picking random threads out of the service and enforcing control
on them without considering their internal dependencies is asking
for^W^W creating hard-to-debug troubles.

> 3) afaik, in cgroups v2 it isn't even allowed for threads of a single
> process to straddle multiple cgroups anymore.

It depends on the enabled controllers: in threaded subtrees, controllers
that support it have thread granularity (cpu does, io does not).

> I'm not a DBA but I've heard that one common way to handle this would be to
> create a separate MySQL instance (probably on a separate machine, even)
> that would replicate all the data, for the heavy users to query.

That sounds like a much more sensible approach.

SCNR,
Michal

Re: [systemd-devel] Restart SystemD service when Memory Usage in More than a threshold

2023-09-18 Thread Michal Koutný

Hello Ahmad.

On Sat, Sep 16, 2023 at 09:29:07PM +0600, Ahmad Ismail  
wrote:
> The file in ~/.config/autostart which autostart the service is:
 ^^^

> So, I came up with a systemd service which will restart nemo-desktop when
> memory usage is 100MB.

Is the bug you mention a memory leak? (A restart at arbitrary moment
doesn't sound like the best user experience.)

Or is it another cause? Nevertheless, you may want to report it to
the respective upstream or upgrade to its latest version.

> sudo tee /etc/systemd/system/nemo-desktop-bug-workaround.service << END
^^

> [Service]
> User=ismail
> Group=ismail
...
> ExecStart=/usr/bin/nemo-desktop
...
> What am I doing wrong here?

Maybe you conflate a system service (running with changed User=/Group=)
and a user instance service (spawned under user@$UID.service), and some
necessary settings are missing in nemo-desktop's environment (I'm not
familiar with that particular program, hence a guess only).

HTH,
Michal

Re: [systemd-devel] oomd wake-up frequency

2023-08-25 Thread Michal Koutný

Hello.

On Tue, Aug 22, 2023 at 01:59:52PM -0700, Christian Hergert 
 wrote:
> The primary thing I see showing up when profiling an idle system is oomd. My
> casual reading through the code would lead me to believe it's waking up a
> CPU every .15 seconds.

That coincides with swap monitoring timer.

> Is there a way we could have this wake up less? My goal here is to iron out
> all the little things which are causing energy drain when idle.

Do you have any "Swap Monitored CGroups:" in output of `oomctl dump`?

I think the loop's event source could be disabled when no cgroups
require swap monitoring [1] (and enabled lazily when such are
configured). 

Not sure whether/how much SWAP_INTERVAL_USEC could be increased to
retain responsiveness.

HTH,
Michal

[1] https://github.com/systemd/systemd/blob/1925f829ab17cee7d65cc8c350d8281f8f41588e/src/oom/oomd-manager.c#L375

Re: [systemd-devel] coredumpctl: matching by e.g. env var?

2023-03-24 Thread Michal Koutný

On Wed, Mar 15, 2023 at 09:43:37AM +0100, Stephan Bergmann 
 wrote:
> Any thoughts?

Luca's idea of temporary units may work for your runtime
differentiation.

Although, I was thinking of another use of such a "tagging" mechanism
-- systemd-coredump could skip processing of certain coredumps to save
(CPU dumping) time and (storage) space, e.g. for test-suites that
massively crash but you are not immediately interested in (all) the
dumps.


Michal


Re: [systemd-devel] Cannot mount /sys/kernel/debug in nspawn container

2023-03-24 Thread Michal Koutný

Hello.

On Sun, Mar 19, 2023 at 07:16:56PM +, Martin  wrote:
> Any idea, what I might missing?

Permissions?
(Increasing logging verbosity may give you some hints. [1][2])

> PS: My ultimate goal is to run bpftrace in the container. Seems to be
> slightly tricky.

I wouldn't call it a _contain_er then. You may find [3] interesting
though.

HTH,
Michal

Runtime switch:
[1] systemd-analyze set-log-level debug
[2] SYSTEMD_LOG_LEVEL=debug systemd-nspawn ...
[3] https://lpc.events/event/16/contributions/1237/


Re: [systemd-devel] Smooth upgrades for socket activated services

2023-03-01 Thread Michal Koutný

Hello Mike.

On Mon, Feb 20, 2023 at 11:05:41AM +0100, Mike Hearn  
wrote:
> 2. Is it possible to run two versions of a service unit at once such
> that the old version finishes handling connections and then shuts
> down, whilst new connections are being handled by the new version?

This is a recurring topic, tracked in [1]. I hope to make some progress
there soon.

Feel free to add your ideas there,
Michal

[1] https://github.com/systemd/systemd/issues/10228


Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-12 Thread Michal Koutný

On Thu, Jan 12, 2023 at 03:31:25PM +, Lewis Gaul  
wrote:
> Could you suggest commands to run to do this?

# systemd-analyze set-log-level debug
# logger MARK-BEGIN
# ...whatever restart commands
# ...wait for the failure
# logger MARK-END
# systemd-analyze set-log-level info
# journalctl -b | sed -n '/MARK-BEGIN/,/MARK-END/p'

> Should we be suspicious of the host systemd version and/or the fact that
> the host is in 'legacy' mode while the container (based on the systemd
> version being higher) is in 'hybrid' mode? Maybe we should try telling the
> container systemd to run in 'legacy' mode somehow?

I'd be wary of the legacy@host and {hybrid,unified}@container combo.
Also the old versions on the host could mean that the cgroup setup may
be buggy.
(I only have capacity to look into the recent code but the debug logs
above may show something obvious.)

Ideally, you should tell both host and container to run in the unified
mode ;-)

Michal


Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-12 Thread Michal Koutný

Hello.

On Tue, Jan 10, 2023 at 03:28:04PM +, Lewis Gaul  
wrote:
> I can confirm that the container has permissions since executing a 'mkdir'
> in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
> container succeeds after the restart, so I have no idea why systemd is not
> creating the 'init.scope/' dir.

It looks like it could also be a race/deferred impact from host's systemd.

> I notice that inside the container's systemd cgroup mount
> 'system.slice/' does exist, but 'user.slice/' also does not (both
> exist on normal boot). Is there any way I can find systemd logs that
> might indicate why the cgroup dir creation is failing?

I'd suggest looking at debug level logs from the host's systemd around
the time of the container restart.


> I could raise this with the podman team, but it seems more in the systemd
> area given it's a systemd warning and I would expect systemd to be creating
> this cgroup dir?

What is the host's systemd version and cgroup mode
(legacy,hybrid,unified)? (I'm not sure what the distros in your original
message referred to.)
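
One way to answer the cgroup mode question on a given machine is to look at
the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU
stat (for -f -c %T) and the usual mount layout; the function name is made up
for this example.

```shell
# Guess the cgroup setup from the filesystem types under /sys/fs/cgroup.
# $1 optionally overrides the cgroup root.
cgroup_mode() {
    root=${1:-/sys/fs/cgroup}
    case $(stat -fc %T "$root" 2>/dev/null) in
        cgroup2fs)
            echo unified ;;
        tmpfs)
            # v1 layout; hybrid additionally mounts v2 at .../unified
            if [ "$(stat -fc %T "$root/unified" 2>/dev/null)" = cgroup2fs ]; then
                echo hybrid
            else
                echo legacy
            fi ;;
        *)
            echo unknown ;;
    esac
}

# Run it both on the host and inside the container:
#   cgroup_mode
```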


Thanks,
Michal


Re: [systemd-devel] systemctl hangs with 249.7 systemd in yocto Honister release

2023-01-04 Thread Michal Koutný

On Wed, Jan 04, 2023 at 07:13:59PM +0800, Heyi Guo  
wrote:
> Jan 04 16:10:57 ali2600 systemd[1]: Caught , dumped core as pid 7516.
> Jan 04 16:10:57 ali2600 systemd[1]: Freezing execution.
> Jan 04 16:10:57 ali2600 phosphor-dump-manager[7536]: Failed to list units:
> Transport endpoint is not connected
> 
> Is it the reason for systemctl fails to work? For the log says "systemd
> freezing execution".

Yes, see the line above, there's SIGSEGV in PID 1.
(Given the other SIGSEGVs, it looks like a common cause across different
processes, e.g. screwed libc update or similar.)

Also, based on the same line, you may be able to extract the coredump
from /var/lib/systemd/coredump (depends on coredump.conf:Storage=) and
figure out more.

Michal

Re: [systemd-devel] systemctl hangs with 249.7 systemd in yocto Honister release

2023-01-04 Thread Michal Koutný

On Wed, Jan 04, 2023 at 04:51:22PM +0800, Heyi Guo  
wrote:
> The issue happened again, but the /proc/1/stack and
> /proc/$pid_of_dbus-broker/stack are both empty on our platform.

(You reported previously the version was v249 (which is behind the last
two upstream versions, so it may be a good idea to raise the issue with
your distro.))

> I checked kernel config and confirmed that  CONFIG_STACKTRACE is enabled:
> 
> zcat /proc/config.gz | grep CONFIG_STACKTRACE
> CONFIG_STACKTRACE_SUPPORT=y
> # CONFIG_STACKTRACE_BUILD_ID is not set
> CONFIG_STACKTRACE=y
> 
> Is there any other config that is missing?

I don't think so (the file wouldn't be present otherwise).

If there are no kernel stacks, the tasks execute in userspace and given
the indefinite stuckage, they're likely looping somewhere (or you must
have been unlucky to miss a syscall), which should manifest in their CPU
consumption.

The userspace stack may be of interest then, e.g.
`gdb -ex "bt" --batch -p 1`

(for PID 1; debuginfo for the involved binaries must be present to obtain
useful info).

Michal

Re: [systemd-devel] systemd-cgtop doesn't show Input/Output

2022-12-05 Thread Michal Koutný

Hello.

On Mon, Dec 05, 2022 at 09:38:18AM +0300, Vladimir Mokrozub 
 wrote:
> $ systemctl --version
> systemd 245 (245.4-4ubuntu3.19)
> +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP
> +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN
> +PCRE2 default-hierarchy=hybrid
   ^^
Unless you override this on kernel cmdline, it means (blk)io controller
is in v1 mode.

> systemd-cgtop always has "-" in both Input/s and Output/s columns. There
> are no spikes, even under a high disk load.
> I was testing it with "dd if=/dev/sda of=/dev/null". Here's the output:
> 
> Control Group   Tasks   %CPU   Memory  Input/s Output/s
> /                 214  101.5     3.7G        -        -
> user.slice         15   99.6     2.9G        -        -
> system.slice       97    0.4    95.4M        -        -

1) It won't have proper hierarchical behavior (thus no values for .slice
   units, cgtop defaults to depth of 3 thus you may not see the active
   leaves),
2) it won't charge writeback IO properly (just FYI, it's not relevant
   to your example).

If you can, I'd suggest switching to the unified mode if you want
hierarchical IO accounting.

HTH,
Michal


Re: [systemd-devel] Issues with parallelised early boot

2022-12-01 Thread Michal Koutný

Hello Naïm.

On Sat, Nov 26, 2022 at 01:59:53PM +0100, Naïm Favier  
wrote:
> When using systemd as PID 1 in the initrd, there is no sequencing between
> loading kernel modules (systemd-modules-load.service) and starting udev
> (systemd-udevd.service). I load my graphics driver (amdgpu) with
> systemd-modules-load, which takes about three seconds, so it finishes
> loading after udev has started and picked up the initial events, and while
> the LUKS passphrase prompt is waiting for my input.

Perhaps a slightly different angle -- is the graphics driver necessary
to mount the main root FS? (IIUC, you can enter the passphrase even
without it? Then you could build a smaller initrd and load the driver
later (when visual artifacts won't be hopefully distracting))

Michal

Re: [systemd-devel] Error while trying to boot kernel

2022-12-01 Thread Michal Koutný

Hi.

On Sat, Nov 26, 2022 at 11:37:09PM +0200, Nikolay Borisov 
 wrote:
> I'm booting it inside qemu but I use ubuntu to create the initrd, as can be
> seen from the attached log everything works up until switch root has to
> happen and then I get a SIGTERM and booting dies.

Why do you conclude it's SIGTERM?

> [9.047894] Kernel panic - not syncing: Attempted to kill init! exitcode=0x7f00

This looks like PID 1 exited of its own volition with exit value 127.
I am not able to attribute it to anything in systemd.

Is the re-execed binary (/[usr/]lib/systemd) systemd at all? (Perhaps it
failed to find an executable at some stage, hence 127)
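
For reference, this is how the exitcode=0x7f00 from the panic line splits
into "signal" and "exit value", assuming the usual wait-status layout (low
7 bits: terminating signal, next byte: exit status). The function name is
just for this sketch.

```shell
# Decode a wait-status style value into either the exit status or the
# terminating signal.
decode_wait_status() {
    status=$(( $1 ))
    sig=$(( status & 0x7f ))
    code=$(( (status >> 8) & 0xff ))
    if [ "$sig" -eq 0 ]; then
        echo "exited with status $code"
    else
        echo "killed by signal $sig"
    fi
}

decode_wait_status 0x7f00
```

A process killed by SIGTERM would instead carry 15 in the low bits, which is
not what the log shows.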

HTH,
Michal

Re: [systemd-devel] systemctl hangs with 249.7 systemd in yocto Honister release

2022-12-01 Thread Michal Koutný

Hello Heyi.

On Tue, Nov 29, 2022 at 12:44:12PM +0800, Heyi Guo  
wrote:
> Is there any known issue which will cause this problem? Or do you have any
> suggestion on how to debug?

As written in the report, it looks like dbus-daemon or PID1 itself not
responding. Some insights may be obtained by looking at /proc/1/stack
and /proc/$pid_of_dbus/stack (as root).

HTH,
Michal

Re: [systemd-devel] systemctl, unclear error msg/warning, "Refusing to accept PID outside of service control group..."

2022-11-03 Thread Michal Koutný

Hi.

On Thu, Oct 27, 2022 at 10:46:38PM +0200, rop  wrote:
>"New main PID x does not belong to service, and PID file is not
> owned by root. Refusing."
> And when trying to examine the pid-file, it wasn't even created.

That'd suggest that the service has a misconfigured PIDFile= path (not
pointing to where the daemon actually writes its PID).

> Where can I find an explanation of these messages?
> 
> What exactly is deemed "unsafe"?

Perhaps,
https://github.com/systemd/systemd/issues/8085#issuecomment-363008993

> And what to do about it?

Update the application/its unit file :-p

Michal



Re: [systemd-devel] Best practise for creating sockets without a corresponding service

2022-11-03 Thread Michal Koutný

Hello.

On Fri, Oct 28, 2022 at 12:39:01PM +0200, Simon Mullis  
wrote:
> Step 0
> - service_data_gen => creates N outputs
> 
> Step 1
> - service1@.service => N instances are created but don't actually need
> to do anything.
> - service1@.socket => N sockets are created which are the target FIFOs
> for the output of - service_data_gen above.

What processes the data flowing into step1@.socket (==service1@.socket)?

> I don't need any service to actually run in step1, I would like
> systemd to manage the sockets and the dependencies (as it is for the
> rest of the chain).

So why don't you just shift the rest of the pipeline/numbering?
step1@.socket would trigger step1@.service and it'd do job that your
current step2@.service does.

And you'd initiate your pipeline with

eval systemctl start step1@{1..${cpu_cores}}.socket

and trigger by writes into the sockets from the single
service_data_gen.
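
The eval in the command above is there because bash performs brace expansion
before it expands ${cpu_cores}. A sketch of an alternative that avoids eval
by generating the instance names in a loop (the function name is made up;
the step1@ prefix follows the naming used above):

```shell
# Print step1@1.socket ... step1@N.socket, one per line.
socket_units() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'step1@%s.socket\n' "$i"
        i=$(( i + 1 ))
    done
}

# Start one socket per CPU core:
#   systemctl start $(socket_units "$(nproc)")
```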

> What is the best practise for an ExecStart= entry to act as a dummy,
> where no service is actually required?  At the moment I am using:
> 
> ExecStart=/usr/bin/sh -c "sleep infininty" in the service template for
> service1@.service

ExecStart=/bin/true
RemainAfterExit=true

is slightly better in terms of system resource use.

> I think the crux of this is entirely related to the use of instance
> templates and linking one unconnected single parent service to many
> child services (and sockets).

FTR, not related to templates in particular, as systemd.socket(5) says:
"For each socket unit, a matching service unit must exist".

HTH,
Michal

Re: [systemd-devel] Q: A way to log peak memory footprint of deactivating units?

2022-08-18 Thread Michal Koutný

Hello.

On Sat, Aug 13, 2022 at 04:39:50AM -0500, Russell Haley 
 wrote:
> Since systemd logs the total CPU time used when the unit deactivates, I
> wonder if there's a way to make it log the peak memory footprint too,
> kind of like the time command's "maxresident". The unit does have
> MemoryAccounting=yes.
> 
> It turned out that it was simple to write a shell script loop to wait
> until packagekit was activated and sample the memory usage before the
> timeout expired. (About 228 MiB.)  However, I am still interested to
> know if there might be a better/more general method.
> 
> P.S. Since I had to re-send this mail with the correct From: address, I
> looked into it and apparently kernel 5.19 added a memory.peak to cgroups
> v2, so I think it very recently become possible to have an elegant
> implementation of this.

You need a kernel with memory.peak (sampling is unreliable) and then,
quick n' dirty:

> MemoryAccounting=yes
> ExecStopPost=/usr/bin/cat /sys/fs/cgroup/system.slice/%n/memory.peak

Output goes to journald by default:
> Aug 19 00:13:29 machine cat[7433]: 2924544
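
memory.peak is reported in bytes. If a figure comparable to the sampled
"about 228 MiB" is preferred, a small conversion helper could be used; the
function name is hypothetical and the sketch relies only on awk.

```shell
# Convert a byte count (as read from memory.peak) to MiB, one decimal.
bytes_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

# For the journal line above:
#   bytes_to_mib 2924544
```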

If you wanted to have this reported by default, PRs for memory.peak
support are welcome :-)


Michal

Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-22 Thread Michal Koutný

On Fri, Jul 22, 2022 at 06:14:11PM +0530, Ani A  wrote:
> Found the issue, posting here to close this thread (and possibly help
> someone who might land in this situation!)

Thanks for sharing.

> The daemon which had issues with rate-limit, was invoking some
> `systemctl stop/start ` commands in its initialization! (probably this has
> some unwanted side effects?)

Timing comes to my mind as something that could affect that.

> If I eliminate that, then the rate-limit on the main daemon works fine! :)

Yeah, better use explicit dependencies (Wants=/After=) instead of such a
call-back.

Michal

Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-14 Thread Michal Koutný

Hello.

On Thu, Jul 14, 2022 at 09:29:37PM +0530, Ani A  wrote:
> StartLimitIntervalUSec=5min 20s   
> StartLimitBurst=5
> StartLimitAction=none
> 
> The time is sufficient for 5 restarts, but still daemon keeps restarting!
> 
> Scheduled restart job, restart counter is at 6

If the 5 restarts fit into the 320 seconds, then the start rate limit
won't be active. You write it's sufficient, so that sounds to me that
your rate limit is too high to affect the real service. Therefore, I'd
suggest decreasing StartLimitBurst= or prolonging StartLimitIntervalSec=
(so that the limit rate is _lower_ than the pathological fatal restart rate).

> Also, how to get rid of this:
> 
>Unknown serialization key: ref-gid
> 
> ?

The upstream is typically concerned about last two systemd versions, so
unless this happens with v251 or v250, I don't know, I'm sorry.

Michal

Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-12 Thread Michal Koutný

On Tue, Jul 12, 2022 at 03:36:55PM +0530, Ani A  wrote:
> Demo services work fine, the actual service is quite heavy and takes
> time to startup.
> 
> > you may not reach the sufficient fail rate for start limit to kick
> I didn't get this part.

I meant that your values might have corresponded to too high (re)start
rate and the real service is slower, i.e. below that limit.

> Say the daemon takes 60s to startup and crash and I set the
> StartLimitIntervalSec=320 This should be sufficient time for 5
> restarts (?)

That gives roughly 320s/5 ~ 64s per (re)start. So I'd say this is
borderline, whether the limit throttles the service starts or not.
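
The arithmetic can be spelled out as below. This is a sketch under the
assumption that the limiter counts starts in a fixed window of
StartLimitIntervalSec= and refuses the start after StartLimitBurst= of them;
"cycle" is the time from one start to the next (run time until the crash
plus RestartSec=). Function names are made up for the example.

```shell
# Longest start-to-start cycle (in seconds) for which more than BURST
# starts still fall within INTERVAL seconds.
max_cycle_for_limit() {
    interval=$1
    burst=$2
    echo $(( interval / burst ))
}

# Succeeds if start number BURST+1 would fall within INTERVAL seconds
# of the first one, given CYCLE seconds between consecutive starts.
limit_can_trigger() {
    interval=$1
    burst=$2
    cycle=$3
    [ $(( burst * cycle )) -le "$interval" ]
}

max_cycle_for_limit 320 5   # the values discussed above
```

With a 60 s cycle the limit would just about be reached; once restart delays
push the cycle past 64 s, it would never trigger under this model.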

You can try whether rate limit works for your real service by setting
some very long StartLimitIntervalSec= (and then calibrating more
precisely).

systemctl show $UNIT | grep -E "StartLimit.*|InactiveExitTimestamp|ActiveEnterTimestamp"

may give you some insight into the timings (but internal ratelimiting
parameters are not available).

> Thanks, I didn't know about systemd-coredump, do I have to install
> this separately?
> I do not see coredump.conf or systemd-coredump service running on my host!
> (Ubuntu 18.04)

Not sure about that distro (and that age). You will ultimately know if
coredump is configured by reading
/proc/sys/kernel/core_pattern
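
A small sketch of how that file could be interpreted: a pattern starting
with "|" pipes the core to a helper program, anything else is a plain file
name template. The function name and the three labels are assumptions made
for this example.

```shell
# Classify a core_pattern string; reads the live value when no argument
# is given.
coredump_handler() {
    pattern=${1-$(cat /proc/sys/kernel/core_pattern)}
    case $pattern in
        "|"*systemd-coredump*) echo "systemd-coredump" ;;
        "|"*)                  echo "other pipe handler" ;;
        *)                     echo "plain core files" ;;
    esac
}

# On the affected host:
#   coredump_handler
```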

> Also, I would be more interested to get the rate-limiting to work
> rather than daemon respawning indefinitely.

Fair enough (just wanted to point out that start limiting won't prevent
coredump size accumulation).

Michal

Re: [systemd-devel] Regarding service rate limiting (systemd 237)

2022-07-12 Thread Michal Koutný

Hi.

On Mon, Jul 11, 2022 at 06:26:44PM +0530, Ani A  wrote:
> but somehow only with the services that I am trying to rate-limit
> (C,unix daemons), it doesn't work! :(

Does your service crash later than the demo service terminates?
(I.e. you may not reach the sufficient fail rate for start limit to kick
in.)

> I just want to make sure that the disk is not filled with core files
> (the daemon dumps pretty huge core files), hence [trying] to
> limit it to 5 restarts, but it keeps restarting forever :(

I'd suggest using systemd-coredump and e.g. MaxUse= (see
coredump.conf).

Also note that restart limiting would only limit the growth of disk
consumption due to core file accumulation, but the total size would be
unbounded (without a removal process).

HTH,
Michal

Re: [systemd-devel] Unable to check 'effective' cgroup limits

2022-06-09 Thread Michal Koutný

Hello.

On Thu, Jun 09, 2022 at 11:40:02AM +0100, Lewis Gaul  
wrote:
> [Disclaimer: cross posting from
> https://github.com/containers/podman/discussions/14538]
> 
> Apologies that this is more of a Linux cgroup question than specific to
> systemd, but I was wondering if someone here might be able to enlighten
> me...

Yes, this is most suitable for cgro...@vger.kernel.org. (Feel free to
continue there.)

> Two questions:
> 
>- Why on cgroups v1 do the cpuset controller's
>cpuset.effective_{cpus,mems} seem to simply not work?

It's how it evolved, and instead of changing the accustomed behavior,
there's a whole different v2.

> Didn't expect this to fail - shouldn't it automatically impose a stricter
> limit on any child cgroups? Do I need to manually update all child cgroups
> first?

The v1 API simply doesn't implement the hierarchical configuration well
(such that ancestors can always override descendants).

> But can't relax the child's cgroup restriction (i.e. need awareness of CPU
> restrictions already imposed above - how are you supposed to check this in
> a private cgroup namespace?).

Binary^WExhaustive search?

> Memory/Hugetlb effective limits
> [...]
> There is a memory.limit_in_bytes file, but no
> memory.effective_limit_in_bytes to reflect parent cgroup restrictions.
> 
> Similarly on cgroups v2:
> [...]
> I guess you could traverse up the cgroup hierarchy to find the smallest
> limit being imposed... But this isn't possible inside a private cgroup
> namespace. Is there any way to find the actual cgroup limit imposed?

I've actually been pondering .effective analogues for limits on v2
for this reason. The short answer is that it's not implemented.

More generally -- why would you want to know the inherited limit?
(For regular memory, there's the idea, that you watch memory.pressure
and adjust your behavior based on that instead of adapting to residue
from memory.max.)
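
Watching memory.pressure could start from something like the following
sketch. It assumes the PSI file format with "some" and "full" lines carrying
avg10=, avg60=, avg300= and total= fields; the function name and the example
cgroup path are illustrative.

```shell
# Print the avg10 value of the "some" or "full" line of a PSI file.
# $1: some|full, $2: path to a memory.pressure file
psi_avg10() {
    awk -v kind="$1" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }' "$2"
}

# For a cgroup v2 group (path is an example only):
#   psi_avg10 some /sys/fs/cgroup/mygroup/memory.pressure
```

An application could poll this and shed load when the value rises, instead
of sizing itself from a limit it cannot see.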

HTH,
Michal

Re: [systemd-devel] cgroupsv2 and realtime processes

2022-06-06 Thread Michal Koutný

On Mon, Jun 06, 2022 at 05:59:32PM +0200, Michał Zegan  
wrote:
> I assume if it would be on it would break any and all realtime
> usage...?

Most likely (you would be unable to either turn on the RT policy, migrate
the process, or enable the CPU controller, i.e. to take a step that would
lead to an invalid state).

I'm curious, what would be your use case for turning RT group
scheduling on?

Thanks,
Michal

Re: [systemd-devel] cgroupsv2 and realtime processes

2022-06-06 Thread Michal Koutný

On Mon, Jun 06, 2022 at 04:54:03PM +0200, Michał Zegan  
wrote:
> this note pointed to in the readme is quite cgroups v1 specific, I believe
> what it describes was true in v1, and v2 does not have any capability to
> control realtime processes in non root cgroups if I read correctly.

Yes. And it extends to v2 too, where there are not even userspace knobs to
configure the RT attributes.

Therefore it works [1] with CONFIG_RT_GROUP_SCHED unset, since RT
processes remain in the root cgroup (it's an implementation detail you
won't see in /proc/$pid/cgroup, which still shows the regular process
membership).
To prevent confusion -- this applies only to processes (threads) with RT
policy and from all other perspectives these processes (threads) are
still in the listed cgroup.

Does that explain what you were after?

Michal

[1] Unless your goal is to control per-cgroup RT attributes. I
understood you just wanted to be able to place RT processes into
non-root cgroups.

Re: [systemd-devel] cgroupsv2 and realtime processes

2022-06-06 Thread Michal Koutný

Hello Michał.

On Sun, Jun 05, 2022 at 03:28:23PM +0200, Michał Zegan  
wrote:
> I have kernel 5.17 on archlinux.

How is your kernel configured wrt CONFIG_RT_GROUP_SCHED?

> Is that still true?

That depends :-)

> Yet, checking /proc/(pid)/cgroup states these processes are not in a root
> cgroup, yet the cpu controller is enabled on the root cgroup
> (/sys/fs/cgroup/cgroup.subtree_control lists "cpu" as one of the controllers
> and I see the interface files in children).
> 
> Can anyone explain the situation?

With v2 and CONFIG_RT_GROUP_SCHED there's no way to assign realtime
budgets to cgroups and therefore realtime tasks cannot run in them.

With !CONFIG_RT_GROUP_SCHED, there's (internally) only the root cgroup
for realtime tasks and things apparently work.
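
To check which of the two cases applies, the kernel config can be inspected.
The sketch below reads config text on stdin, so it works with
/proc/config.gz where the kernel provides it, or with a config file shipped
by the distro; the function name is made up.

```shell
# Report whether CONFIG_RT_GROUP_SCHED is set in the kernel config
# supplied on stdin.
rt_group_sched_state() {
    if grep -q '^CONFIG_RT_GROUP_SCHED=y'; then
        echo enabled
    else
        echo disabled
    fi
}

# Typical use:
#   zcat /proc/config.gz | rt_group_sched_state
```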

See also [1].

> 
> The cgroupsv2 documentation states that cgroup cpu controller
> currently does not support realtime processes, so to enable it all
> realtime processes must be moved to root cgroup.

Will you send a docs patch with the CONFIG_RT_GROUP_SCHED reservation?
:-p

HTH,
Michal

[1] https://github.com/systemd/systemd/blob/369151c9c73b12fb7a88fc2b558499c2d4832982/README#L140

Re: [systemd-devel] Relationship between cgroup hierarchy and slice names

2022-05-10 Thread Michal Koutný

Hello.

On Tue, May 03, 2022 at 08:16:48PM -0400, Yeongjin Kwon 
 wrote:
> I'm trying to override the parent slice of a certain slice unit so I can
> reorganize the cgroup hierarchy.

I'm wondering whether that slice or its parent is one of the systemd
implicit slices ({user,user-,system,machine}.slice,...)?

Or do you just want to override a hierarchy defined by some 3rd party
units (perhaps with a purpose)?

Thanks,
Michal

Re: [systemd-devel] Starting transient services securely from other service without root

2022-04-27 Thread Michal Koutný

Hello Vašek.

On Mon, Apr 25, 2022 at 10:45:34AM +0200, Vašek Šraier  
wrote:
> TL;DR: I want to start transient system services from another system
> service via DBus. All services should have as little privileges as
> possible, definitely not root. How can I do that securely?

PolicyKit popped into my mind with this short description, which is
basically what you expand on later.
(Also I understand the "starter" and "started" are both the same user.)

> Because the workers are slightly different (e.g. command line args) and
> because it could be confusing to admins, we decided to use transient
> services so that the workers can't be started without the master
> process.

Note this may also be captured with scopes (if you decide to track the
lifecycle of workers yourself instead of leaving it to systemd). But scopes
within PID1 also require privileges, so that just re-dresses your problem.

> - User sessions
>   The master process and worker processes can also run in a user 
>   session. This directly solves problems with privileges. However, I am
>   not sure if running a user session with the semantics of a system
>   service is possible or a good idea. I also don't know if there is any
>   documentation related to user sessions without physical users.

Do you mean having all your stuff as services of user instance of
systemd?
Or putting them in proper sessions (as PAMName=foo does)?
I assume the former. It also sounds a bit strange (an unusual use of the
user instance for unusual(?) requirements).
One consequence that I see directly is that any resource assigned via
cgroups would be restricted by the single user instance for the whole
assembly of workers together. (That can be intended or not.)

(For the latter, I wrote it just for completeness, I don't think it'd be
useful in this case.)

> - Use other service managers, not systemd

Or minimize functions of your main process to just process the config
and figure out jobs so that it can run as root with anything "sensitive"
(open to external world) moved to unprivileged workers/helpers.

In the end, I think it goes along axes like:
- Is there any benefit of having the workers in individual systemd
  units? (If not, that suggests just controlling everything by the main
  process or a 3rd party supervisor.)

- Is there any privilege that is actually needed from PID1 or could a
  given user self-serve themselves? (That suggests the user instance
  services above.)

HTH,
Michal

[systemd-devel] Versioning generated files?

2022-03-31 Thread Michal Koutný

Hello.

After adding a new DBus property I noticed a failing test, check-directives,
with a handful of messages like:
> Looks like test/fuzz/fuzz-unit-file/directives.service hasn't been updated

I'm looking into tools/check-directives.sh and there's a function
generate_directives(). It feels like fulfilling this test is unnecessary
bureaucracy.

I don't know how these fuzz directives files are used but I'm positive the
generate function could be called lazily during the process and relieve humans
of refreshing them manually.

Also, what is the reason for the XML comments in man/*.xml like
> 
it just adds boilerplate. A possible list of undocumented items could also be
generated on demand.

Are there any reasons to version (and test-check) generated content?

Thanks,
Michal

Re: [systemd-devel] learning how to run systemd in a container, journal shows errors I would like to understand what they mean and why

2022-03-25 Thread Michal Koutný

Hello Masber.

On Fri, Mar 25, 2022 at 11:52:33AM +, masber masber  
wrote:
> I have a k8s cluster with docker as container runtime and am I trying
> to make systemd to work.
> I read this doc 
> https://developers.redhat.com/blog/2016/09/13/running-systemd-in-a-non-privileged-container#enter_oci_hooks
>  and I have systemd running in a container.

Note the article is almost six years old. Plenty of things were implemented
and configs changed since then.

> Mar 25 11:24:31 nid001002-cluster-1 systemd[1]: Failed to reset devices.list 
> on 
> /kubepods/burstable/podcd69d169-d610-4af7-895a-eb86ee74ed49/4caa4403b8b6d263012e95ca51357ab0bb46fb3bc7a23221115d22efb757cc9c/system.slice/etc-resolv.conf.mount:
>  Operation not permitted
> 
> I would like to ask the meaning of this message and how to solve it (if 
> possible)

This message says that the containerized systemd attempts to set some
cgroup attributes (in this case regarding device access rules via
devices controller, DeviceAllow= directive) but it fails.
Effectively it could mean your container failed to make itself more
secure but it should not affect functionality (from what you provided
here).

You say you run this in an unprivileged container, a responsible runtime
would not set up access to v1 controllers (devices is v1 only), so EPERM
is sort of expected. For the unprivileged containers, I'd suggest you
switch the host into unified cgroup mode (and consequently the container
too). That should resolve the reported problem but there may still be
something else that breaks your containerized systemd.
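
To check which cgroup setup a host or container currently runs, the
filesystem type mounted at /sys/fs/cgroup is a common indicator. A minimal
sketch, assuming GNU stat reports cgroup2fs for cgroup v2 (older coreutils
may print an unknown type instead); the helper name is made up:

```shell
# Hypothetical helper: maps the filesystem type of /sys/fs/cgroup
# (and of /sys/fs/cgroup/unified, if present) to a cgroup setup name.
cgroup_mode() {
    root_type=$1      # e.g. output of: stat -fc %T /sys/fs/cgroup
    unified_type=$2   # e.g. output of: stat -fc %T /sys/fs/cgroup/unified
    if [ "$root_type" = cgroup2fs ]; then
        echo unified
    elif [ "$unified_type" = cgroup2fs ]; then
        echo hybrid
    else
        echo legacy
    fi
}

# Example on a live system:
#   cgroup_mode "$(stat -fc %T /sys/fs/cgroup)" \
#               "$(stat -fc %T /sys/fs/cgroup/unified 2>/dev/null)"
```

If this does not print "unified" on the host, booting with
systemd.unified_cgroup_hierarchy=1 on the kernel command line is the usual
way to switch.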

HTH,
Michal

Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-16 Thread Michal Koutný

On Wed, Mar 16, 2022 at 05:06:28PM +0100, Lennart Poettering 
 wrote:
> > That owner would be a process -- bang, you created a service with
> > delegation or a scope with "keepalive" process.
> 
> can't parse this.

That was meant as a humorous proof by contradiction that delegation on
slices is unnecessary. Nvm.

> > (The above is slightly misleading) there could be an alternative of
> > something like RemainAfterExit=yes for scopes, i.e. such scopes would
> > not be stopped after last process exiting (but systemd would still be in
> > charge of cleaning the cgroup after explicit stop request and that'd
> > also mark the scope as truly stopped).
> 
> Yeah, I'd be fine with adding RemainAfterExit= to scope units

Felip, I'd happily review such a PR ;-)


> > Such a recycled scope would only be useful via
> > org.freedesktop.systemd1.Manager.AttachProcessesToUnit().
> 
> Well, if delegation is on, then people don't really have to use our
> API, they can just do that themselves.

True, in the unified mode it should be safe to do manually.
I was worried about migrating e.g. MainPID of a service into this scope
but PID1 should handle that AFAICS. Also since this has to be performed
by the privileged user (scopes are root's), the manual migration works.

Michal

Re: [systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

2022-03-15 Thread Michal Koutný

On Tue, Mar 15, 2022 at 04:35:12PM +0100, Felip Moll  wrote:
> Meaning that it would be great to have a delegated cgroup subtree without
> the need of a service or scope.
> Just an empty subtree.

It looks appealing to add Delegate= directive to slice units.
Firstly, that'd prevent the use of the slice by anything systemd.
Then some notion of owner of that subtree would have to be defined (if
only for cleanup).
That owner would be a process -- bang, you created a service with
delegation or a scope with "keepalive" process.

(The above is slightly misleading) there could be an alternative of
something like RemainAfterExit=yes for scopes, i.e. such scopes would
not be stopped after last process exiting (but systemd would still be in
charge of cleaning the cgroup after explicit stop request and that'd
also mark the scope as truly stopped).
Such a recycled scope would only be useful via
org.freedesktop.systemd1.Manager.AttachProcessesToUnit().

BTW I'm also wondering how you detect a job finishing in case the
original parent is gone (due to main service restart) and the job's main
process is reparented.

BTW 2 You didn't like having a scope for each job. Is it because of the
setup time (IOW jobs are short-lived) or persistent scopes overhead (too
many units, PID1 scalability)?

Michal

Re: [systemd-devel] From dbus notification, how to know service entered failed state and will not start without admin action

2022-03-01 Thread Michal Koutný

Hello.

On Sat, Feb 19, 2022 at 09:28:18AM +0530, Prashantkumar dhotre 
 wrote:
> I see that in OnFailure behaviour is changed and these units in OnFailure
> gets triggerd when service is failed-and-will-not-restart-automatically.
> https://lists.freedesktop.org/archives/systemd-devel/2018-June/040879.html
> In such case, does systemd also send dbus signal  whenever it triggers
> OnFailure ?

I don't think so. (Unless you can infer it from the new unit starting
job that is announced on DBus.)

Possibly you may add your cleanup via OnFailure= drop-in?

Regards,
Michal

Re: [systemd-devel] Restarting "onshot" services

2022-03-01 Thread Michal Koutný

Hello.

On Thu, Feb 24, 2022 at 03:35:13PM +0100, Ulrich Windl 
 wrote:
> Is that intentional?

It was, see

https://github.com/systemd/systemd/commit/10e72727ee

(perhaps some other commits too, this was the first that popped up on
me).

> (systemd 228 of SLES12 SP5)

The commit above is from v244.

Regards,
Michal

Re: [systemd-devel] Automatically moving forked processes in a different cgroup based on children's UID

2022-01-07 Thread Michal Koutný

Hello Wadih.

On Sat, Jan 01, 2022 at 04:41:12PM -0500, Wadih  wrote:
> Is there a way to automatically classify child processes of a process
> in a different cgroup than the spawning process with systemd based on
> the children's new UID? I know apache2-mpm-itk calls setuid() on its
> children, so we would have to somehow hook on that. 

You can summon the whole PAM machinery and include pam_systemd in the
stack which would create a new session scope for the user. (Or do it
yourself from the process via DBus call
org.freedesktop.systemd1.Manager.StartTransientUnit() that gives you
more freedom for that). (Note that to keep the service lifecycle
tracking under the name of apache2.service, the forked children should
not reparent under PID 1 so that service parent can properly track
them.)

> I'd like to have the child processes that apache2-mpm-itk spawns go
> under their respective user, e.g.
> [...]
> system.slice/apache2.service/vhosts/%UID%

That's an alternative of maintaining the (relative) (sub)hierarchy
yourself (and it doesn't require special treatment wrt apache2.service
lifecycle).
Note that for this cgroup tree you'd need to set the Delegate= directive
on apache2.service though.

> I've been able to do this with cgrulesengd and cgconfigparser for 3
> years, it's been rock solid.

I'm glad it work(s|ed) for you. The asynchronous classification via
cgrulesengd is racy and may not be always reliable (wrt resource
control). It's much better to do fork-classify-exec or
clone3(CLONE_INTO_CGROUP)-exec synchronously in the migrated task.
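
The fork-classify-exec pattern can be sketched in shell as follows. This is
only an illustration under assumptions: the wrapper name is made up, the
target directory is assumed to be a cgroup inside a delegated subtree, and
the caller is assumed to have write access to its cgroup.procs:

```shell
# Hypothetical wrapper: a fresh shell moves itself into the given cgroup
# directory first and then execs the real command, so the command starts
# already classified (no window in which it runs in the old cgroup).
run_in_cgroup() {
    cgroup_dir=$1
    shift
    # $$ inside the new sh is that shell's own PID; exec keeps the PID,
    # so the command inherits the cgroup membership written here.
    sh -c 'echo "$$" > "$1/cgroup.procs" && shift && exec "$@"' sh "$cgroup_dir" "$@"
}

# Example (the path is hypothetical, following the layout quoted above):
#   run_in_cgroup /sys/fs/cgroup/system.slice/apache2.service/vhosts/1000 some-worker
```

Because the classification happens before the exec, there is nothing for an
external daemon to race against.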

> Would the only solution for me to create a daemon which monitors for
> setuid() calls of the parent apache process, and classify the children
> as per the new setuid user? 

I'd discourage you from going down the path of cgrulesengd again. (And
cgroupify too :-p)

> Or perhaps, I think root parent processes spawning specific UID
> children is a common security practise, perhaps there should be
> something in systemd out of the box for classifying the children under
> their respective cgroups?

Yes, on the low level it's the StartTransientUnit() DBus call or its
specialized extensions for logind or machinectl.

> If my only solution is to create a daemon which monitors for setuid()
> I'll do it, although I've never done it before, not sure where I'd have
> to start. Any guidance would be great! 

A more viable way, it seems to me, is to modify apache2-mpm-itk to put
children into their respective cgroups.

HTH,
Michal

Re: [systemd-devel] systemctl start seems to hang with no status

2021-11-04 Thread Michal Koutný

Hello.

On Fri, Oct 22, 2021 at 01:28:28PM +0200, Ulrich Windl 
 wrote:
> Interestingly I wouldn't expect "Reached target System Time Synchronized.".

The deps are a bit backwards (but not the job ordering):

ntp-wait.service
Wants=time-sync.target
Before=time-sync.target

I.e. the job for time-sync.target is pulled in but it doesn't depend on
anything (the target itself has no Wants=) so it just succeeds after
ntp-wait.service job finishes (fails).

> I don't understand: Was the missing ntp.conf file blocking the service for 11
> minutes?

It's the RestartSec=11min in the ntpd.service configuration that delayed
the restart attempt.

Hopefully that explains the behavior you saw.

Michal

Re: [systemd-devel] Concurrent login / daemon-reload produces abandoned sessions

2021-07-12 Thread Michal Koutný

On Fri, Jul 09, 2021 at 11:42:22AM +0200, Nicolas Bock 
 wrote:
> I have already opened an issue [1] but it was closed. Maybe
> we can just re-open it?

FWIW, https://github.com/systemd/systemd/pull/20199

Michal


signature.asc
Description: Digital signature
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] Concurrent login / daemon-reload produces abandoned sessions

2021-07-08 Thread Michal Koutný

Hello Nicolas.

On Wed, Jul 07, 2021 at 02:12:51PM +0200, Nicolas Bock 
 wrote:
> Using systemd-248.3-1ubuntu1 on Ubuntu Impish the following
> script produces multiple abandoned sessions:
> 
>   $ for i in {1..100}; do sleep 0.2; ssh localhost sudo systemctl daemon-reload & ssh localhost sleep 1 & done
>   $ sleep 2
>   $ jobs -p | xargs --verbose --no-run-if-empty kill -KILL
>   $ systemctl | grep abandoned
>   session-174.scope loaded active abandoned Session 174 of user ubuntu
>   session-175.scope loaded active abandoned Session 175 of user ubuntu
>   session-176.scope loaded active abandoned Session 176 of user ubuntu
>   session-25.scope  loaded active abandoned Session 25 of user ubuntu
> 
> I would like to debug this behavior further and understand
> why this is happening but don't know where to look next.

It might be a bit challenging :)

> Is there any information in particular I should look at?

I assume you use hybrid or unified cgroup setup and that the abandoned
scopes are empty (no processes in their cgroups), correct?

My hypothesis is the following:

// race between scope abandonment, emptiness notification -> abandon comes first
manager_reload
  manager_clear_jobs_and_units
    unit_release_cgroup
      inotify_rm_watch(u->manager->cgroup_inotify_fd, u->cgroup_control_inotify_wd)
  [...]
  // last process terminates somewhere here but we're not watching emptiness yet
  scope_coldplug()
    // scope should be checked for emptiness here

I _think_ this could be fixed with the patch
--- a/src/core/scope.c
+++ b/src/core/scope.c
@@ -243,8 +243,8 @@ static int scope_coldplug(Unit *u) {
 if (r < 0 && r != -EEXIST)
 return r;
 }
-} else
-(void) unit_enqueue_rewatch_pids(u);
+}
+(void) unit_enqueue_rewatch_pids(u);
 }

 bus_scope_track_controller(s);

Can you file a Github issue to track this (and possibly try if this
works for you)?

Thanks,
Michal




Re: [systemd-devel] Restricting swap usage for a process managed via systemd

2021-07-08 Thread Michal Koutný

Hello Debraj.

On Thu, Jul 08, 2021 at 05:10:44PM +0530, Debraj Manna 
 wrote:
> >> Linux vrni-platform 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 
> >> UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> [...]
> GRUB_CMDLINE_LINUX="audit=1 rootdelay=180 nousb net.ifnames=0 biosdevname=0
> fsck.mode=force fsck.repair=yes ipv6.disable=1
> systemd.unified_cgroup_hierarchy=1"
> 
> Even after making these changes MemorySwapMax not taking into effect.

You also need to add swapaccount=1 to the kernel command line; swap
accounting is enabled by default only since kernel v5.8.
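
As a rough illustration of the v5.8 cut-off, the check could be scripted
like this. The helper name is made up, and it only looks at the upstream
version number; a distribution kernel may be configured differently, so
treat the result as a hint:

```shell
# Hypothetical helper: given a kernel release string (as printed by
# `uname -r`), report whether swapaccount=1 has to be passed explicitly.
needs_swapaccount() {
    release=$1
    major=${release%%.*}      # text before the first dot
    rest=${release#*.}        # text after the first dot
    minor=${rest%%[!0-9]*}    # leading digits of the remainder
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 8 ]; }; then
        echo no
    else
        echo yes
    fi
}

# Example:
#   needs_swapaccount "$(uname -r)"
```

For the 4.15.0-143-generic kernel quoted above this prints "yes".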

HTH,
Michal



Re: [systemd-devel] How to correctly use memory controls (MemoryLow) on unified hierarchy system?

2021-05-26 Thread Michal Koutný

On Fri, May 21, 2021 at 03:25:05PM +0300, Andrei Borzenkov 
 wrote:
> Is it necessary to explicitly set it on every ancestor?
It depends on which reclaim you want to be protected against.

Global memory reclaim (running out of RAM) -> set it on every ancestor.
Cgroup memory reclaim (hitting memory limit of an ancestor cgroup G) ->
set it only up to G's children.

It's explained (but not merged) with a picture here [1].

The typical case is the former and therefore typically you set
protection on all ancestors.
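
For the typical case, setting the protection along the whole chain can be
sketched as a dry run. The helper below is hypothetical; it assumes each
component of the cgroup path is a unit name (true for the usual systemd
layout) and it uses the same value everywhere, which is a simplification:

```shell
# Hypothetical helper: print (not run) one `systemctl set-property` command
# per unit along a cgroup path, so that MemoryLow= is set on the unit and
# on each of its ancestors.
memory_low_chain() {
    cgroup_path=$1   # e.g. /system.slice/foo.service
    value=$2         # e.g. 512M
    old_ifs=$IFS
    IFS=/            # split the path into its components
    set -f           # no globbing while splitting
    for unit in $cgroup_path; do
        if [ -n "$unit" ]; then
            echo "systemctl set-property $unit MemoryLow=$value"
        fi
    done
    set +f
    IFS=$old_ifs
}

# Example:
#   memory_low_chain /system.slice/foo.service 512M
```

The printed commands can be reviewed and then run by hand; the values for
the ancestors usually need to be at least as large as those of the children
they cover.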

Michal

[1] https://lore.kernel.org/lkml/20200729140537.13345-2-mkou...@suse.com/




Re: [systemd-devel] How to correctly use memory controls (MemoryLow) on unified hierarchy system?

2021-05-26 Thread Michal Koutný

On Fri, May 21, 2021 at 08:14:03PM +0300, Andrei Borzenkov 
 wrote:
> That's overkill for my purposes. This is single user system and all I am
> trying to do is to prevent swapping out Wayland composer to avoid
> waiting several minutes to unblank screen. I am fine with setting values
> once.

system.slice:MemoryLow=A
foo.service :MemoryLow=B    // e.g. the compositor

A < B
- you get protection of A bytes against global reclaim
- specifically A = 0 turns protection off

A > B
- you get protection of >=B bytes against global reclaim for foo.service
- (A-B) bytes is spread among all children of system.slice (with
  memory_recursiveprot)
- specifically B = 0 means foo.service shares the protection with all
  other services, it's not prioritized

Then there's a third relevant value C -- the typical working set size of
foo.service. You may get away with B < C.

Certainly, you will need to experiment with this to determine good
values that fit your setup.
(I'm not familiar with Wayland but if it critically depends on some
other services, you may need to protect them too.)

GL;HF,
Michal


Re: [systemd-devel] name=systemd cgroup mounts/hierarchy

2020-11-23 Thread Michal Koutný


On Thu, Nov 19, 2020 at 10:14:18PM +0300, Andrei Enshin  wrote:
> For you it might be interesting in sake of improving robustness of
> systemd in case of such invaders as kubelet+cgroupfs : )
I think the interface is clearly defined in the CGROUP_DELEGATION
document though.
I'm happy if a bug can be found in general. I'm happier when it's a well
defined and reproducible case.

> ## (1) abandoned cgroup ##
> > systemd isn't aware of it and it would clean the hierarchy according to its 
> > configuration
That was related to a controller hierarchy (which I understood the
k8s issue was about).

Below, it is a named hierarchy; there it's different again.

> systemd hasn’t deleted the unknown hierarchy, it’s still presented:
> [...]
> cgroup.procs here and in it’s child cgroup 
> 8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15 are empty.
> Seems there are no processes attached to these cgroups. Date of creation is 
> Jul 16-17.
What systemd version is it? What cgroup setup is it (legacy or hybrid)?


> ## (2) mysterious mount of systemd hierarchy ## 
> [...]
>   Seems to be cyclic mount. Questions are who, why and when did the second 
> mysterious mount?
> I have two candidates:
> - runc during container creation;
> - systemd, probably because it was confused by kubelet and it’s unexpected 
> usage of cgroups.
I don't see why/how would systemd (PID 1) do this (not sure about
nspawn). Anyway you can try tracing mounts systemwide (e.g. `perf trace
-a -e syscalls:sys_enter_mount`) to find out who does the mount.

> ## (3) suspected owner of mysterious mount is systemd-nspawn machine 
> ##
> [...]
> Let’s explore cgroups of centos75 machine:
> # ls -lah 
> /sys/fs/cgroup/systemd/machine.slice/systemd-nspawn\@centos75.service/payload/system.slice/
>  | grep sys-fs-cgroup-systemd
> 
> drwxr-xr-x.   2 root root 0 Nov  9 20:07 
> host\x2drootfs-sys-fs-cgroup-systemd-kubepods-burstable-pod7ffde41a\x2dfa85\x2d4b01\x2d8023\x2d69a4e4b50c55-8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15.mount
> 
> drwxr-xr-x.   2 root root 0 Jul 16 08:05 
> host\x2drootfs-sys-fs-cgroup-systemd.mount
> 
> drwxr-xr-x.   2 root root 0 Jul 16 08:05 
> host\x2drootfs-var-lib-machines-centos75-sys-fs-cgroup-systemd.mount
>   There are three interesting cgroups in container. First one seems to be in 
> relation with the abandoned cgroup and mysterious mount on the host.
Note those are cgroups created for .mount units (and under nested
payload's system.slice). It tells that within the container a mount
point at
> host/rootfs/sys/fs/cgroup/systemd/kubepods/burstable/pod7ffde41a/fa85/4b01/8023/69a4e4b50c55/8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15
was visible. It doesn't mean that the mount was done within the
container.

I can't tell why that was; it depends on how systemd-nspawn was instructed
to realize mounts for the container.

> Creation date is Nov  9 20:07. I’ve updated kubelet at Nov  8 12:01. 
> Сoincidence?! I don't think so.
Yes, it can be related. For instance:
- The cyclic bind mount happened,
- its visibility was propagated into the nspawn container
- and inner systemd created cgroup for the (generated) .mount unit
  (possibly after daemon-reload).

> Q1. Let me ask, what is the meaning of mount inside centos75 container?
> /system.slice/host\x2drootfs-sys-fs-cgroup-systemd-kubepods-burstable-pod7ffde41a\x2dfa85\x2d4b01\x2d8023\x2d69a4e4b50c55-8842def241fac72cb34fdce90297b632f098289270fa92ec04643837f5748c15.mount
> 
> Q2. Why the mount appeared in the container at Nov 9, 20:07 ?
Hopefully, it's answered above.

> # mind-blowing but migh be important note #
> [...]
> The node already seems to have not healthy mounts:
Is the conflicting cgroup driver used there again?

> # cat /proc/self/mountinfo |grep systemd | grep cgr
> 26 25 0:23 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 
> - cgroup cgroup 
> rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 866 865 0:23 / 
> /var/lib/rkt/pods/run/3720606d-535b-4e59-a137-ee00246a20c1/stage1/rootfs/opt/stage2/hyperkube-amd64/rootfs/sys/fs/cgroup/systemd
>  rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup 
> rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 5253 26 0:23 
> /kubepods/burstable/pod64ad01cf-5dd4-4283-abe0-8fb8f3f13dc3/4a81a28292c3250e03c27a7270cdf58a07940e462999ab3e2be51c01b3a6bf10
>  
> /sys/fs/cgroup/systemd/kubepods/burstable/pod64ad01cf-5dd4-4283-abe0-8fb8f3f13dc3/4a81a28292c3250e03c27a7270cdf58a07940e462999ab3e2be51c01b3a6bf10
>  rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup 
> rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 5251 866 0:23 
> /kubepods/burstable/pod64ad01cf-5dd4-4283-abe0-8fb8f3f13dc3/4a81a28292c3250e03c27a7270cdf58a07940e462999ab3e2be51c01b3a6bf10
>  
>

Re: [systemd-devel] name=systemd cgroup mounts/hierarchy

2020-11-19 Thread Michal Koutný

Hi.

On Wed, Nov 18, 2020 at 09:46:03PM +0300, Andrei Enshin  wrote:
> Just out of curiosity, how systemd in particular may be disrupted with
> such record in root of it’s cgroups hierarchy as /kubpods/bla/bla
> during service (de)activation?
> Or how it may disrupt the kubelet or workload running by it?
If processes from kubelet.service are migrated elsewhere, systemd may
lose the ability to associate them with the service (which may or may not
be correct, I didn't check this particular case).

In the opposite direction, if container runtime builds up a hierarchy
for a controller, systemd isn't aware of it and it would clean the
hierarchy according to its configuration (which can, for instance, be no
controllers at all) and happens during unit (de)activation. The
containers can get away with it when there are no unit changes at the
moment but that's not what you want. Furthermore, since cgroup
operations for a unit usually involve family [1], the interference may
happen even when an apparently unrelated unit changes. (This applies to the
most common "hybrid" cgroup layout.)

> Seems I missed some technical details how exact it will interfere.
There's the defined interface (delegation or DBus API) and both parties
(systemd, container runtimes) have freedom to implement cgroups as they
wish within these limits.
If they overlap though, you get undefined behavior in principle.
That's the reason to stick to this convention.

Michal

[1] This is rather an implementation detail 

https://github.com/systemd/systemd/blob/f56a9cbf9c20cd798258d3db302d51bf21458b38/src/core/cgroup.c#L2326


Re: [systemd-devel] name=systemd cgroup mounts/hierarchy

2020-11-18 Thread Michal Koutný

Thanks for the details.

On Mon, Nov 16, 2020 at 09:30:20PM +0300, Andrei Enshin  wrote:
> I see the kubelet crash with error: «Failed to start ContainerManager failed 
> to initialize top level QOS containers: root container [kubepods] doesn't 
> exist»
> details:  https://github.com/kubernetes/kubernetes/issues/95488
I skimmed the issue and noticed that your setup uses 'cgroupfs' cgroup
driver. As explained in the other messages in this thread, it conflicts
with systemd operation over the root cgroup tree.

> I can see same two mounts of named systemd hierarchy from shell on the same 
> node, simply by `$ cat /proc/self/mountinfo`
> I think kubelet is running in the «main» mount namespace which has weird 
> named systemd mount.
I assume so as well. It may be a residual inside kubelet context when
environment was prepared for a container spawned from within this
context.

> I would like to reproduce such weird mount to understand the full
> situation and make sure I can avoid it in future.
I'm afraid you may be seeing results of various races between systemd
service (de)activation and container spawnings under the "shared" root
(both of which comprise cgroup creation/removal and migrations).
There's a reason behind the cgroup subtree delegation.

So I'd say there's not much to do from systemd side now.


Michal



Re: [systemd-devel] name=systemd cgroup mounts/hierarchy

2020-11-13 Thread Michal Koutný

Hello.

On Thu, Nov 12, 2020 at 08:05:34PM +0300, Andrei Enshin  wrote:
> There are few nodes after k8s update started to have (maybe it was
> before) a problem with the following mount:
What exact problem do you see?

> It was taken from /proc/self/mountinfo
What was 'self'?

> May I ask, does systemd mount on a fly some hierarchies like this and
> if yes what logic behind it?   
systemd mounts the cgroup hierarchies at boot. What you see is likely a
bind mount of cgroup subtree into a container done by a container
runtime.

Michal



Re: [systemd-devel] [RFE] distinguish reclaimable memory in `systemctl status` output

2020-09-29 Thread Michal Koutný

On Mon, Sep 28, 2020 at 02:29:07PM -0700, Vito Caputo  
wrote:
> Is it possible to either add a reclaimable field the total memory line
> of `systemctl status` output?
> 
> Or perhaps a separate line like Memory-Reclaimable: ?
What would be the use of such an output?

> Is additional kernel memcg support required to make this possible?
I think you can get a reasonable picture by checking memory.stat of
the given memory cgroup, however, I'm not sure it can be losslessly
translated into a single value like that.

Michal



Re: [systemd-devel] Per user limit defaults in systemd.conf

2020-09-01 Thread Michal Koutný

Hello.

On Mon, Aug 31, 2020 at 05:34:15PM -0700, Joshua Miller 
 wrote:
> Is there a way to set per-user defaults for values in systemd.conf?
I don't think so. The config values are typically independent of the
running user.

> I'm looking for a way to do what's done via pam_limits per limits.conf
>  (e.g. `username   hardnofile  512`)
You may still hook into the PAM stack if you specify PAMName=... along
with the User= directive (see systemd.exec(5)).

HTH,
Michal



Re: [systemd-devel] No signal sent to stop service

2020-08-13 Thread Michal Koutný

Hello David.

On Tue, Aug 11, 2020 at 02:33:11PM +1200, David Cunningham 
 wrote:
> The problem is most likely with systemd thinking the program is stopped
> because "systemctl status" reports:
> Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Main process
> exited, code=exited, status=1/FAILURE
> Aug 10 03:57:32 myhost systemd[1]: product_routed.service: Failed with
> result 'exit-code'.
This means there is a mismatch between what the service considers its
main PID (17824) and what systemd tracks -- the tracked process
apparently terminated with failure exit code.

> 1:name=systemd:/user.slice/user-0.slice/session-623.scope
> 0::/user.slice/user-0.slice/session-623.scope
This suggests that the alleged main process (from PID file) was migrated
out of the service's cgroup into a session scope (via pam_systemd; this can
happen when the daemon switches uid by calling into PAM, such as with
su(do)) or it was started directly in the user session.

My suggestion is to check whether MainPID (next time please share full
`systemctl status` output) matches the contents of your PID file (while
the service is "stoppable" and afterwards).

Second, it's worth reviewing what happens around the time when the "Main
process exited" message appears (you can increase PID 1 verbosity
`systemd-analyze set-log-level debug` in order to rule out systemd
issue). 

One idea is that someone starts another service instance from their user
session which breaks the original instance and the new one is not
tracked by systemd.

HTH,
Michal


Re: [systemd-devel] Problem understanding output of systemd-cgtop

2020-08-10 Thread Michal Koutný

Hello.

On Mon, Aug 03, 2020 at 10:31:28AM +0200, Ulrich Windl 
 wrote:
> Why is systemd-cgtop outputting much less slices than systemd-cgls
> does? Specifically I don't see the slice for the process I'm examining
> ("system-iotwatch.slice"). systemd-cgls shows it with three services.
systemd-cgls lists (full) hierarchy maintained by systemd for process
tracking.
The available controller hierarchies can be more shallow though as
they're maintained based on configuration needs. And systemd-cgtop shows
information from the controller hierarchies.

> Also, when using "systemd-cgtop --depth 1", I get this output:
> Control Group    Tasks   %CPU   Memory  Input/s Output/s
> /                    -    1.1   564.7M        -        -
> /init.scope          1      -        -        -        -
> /system.slice       82      -        -        -        -
> /user.slice         18      -        -        -        -
> 
> Does that output mean /init.scope, /system.slice, and /user.slice all
> don't need any CPU and memory, while only / does?
You only see aggregated consumption of everything under the root*
because apparently no unit specified fine-grained CPU or memory
accounting.


> So why does / have CPU and memory usage, while all others don't?
The tasks count under the root is missing since it's not implemented in
v228; the IO may be missing because no IO is taking place at the
moment.

Regards,
Michal

*) That is usage of all units under the root slice + tasks residing in
the root cgroup (typically only kernel threads, these aren't associated
with any unit and their accounting may be special).





Re: [systemd-devel] No signal sent to stop service

2020-08-10 Thread Michal Koutný

Hi David.

On Thu, Aug 06, 2020 at 01:59:03PM +1200, David Cunningham 
 wrote:
> The systemd file is as below, and we've confirmed that the PIDFile contains
> the correct PID when the stop is attempted. Would anyone have any
> suggestions on how to debug this? Thank you in advance.
Is the given process running under the expected cgroup
(check /proc/$PID/cgroup)?

Note that the default KillMode=control-group would not necessarily kill
the PIDFile process (see systemd.kill(5)).
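
A minimal Python sketch of the /proc/$PID/cgroup check suggested above. It
assumes the usual "ID:controllers:path" line format; the unit name
foo.service and the sample text are hypothetical and only illustrate the
idea.

```python
from typing import Optional


def systemd_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the cgroup path that systemd uses for process tracking.

    Each line of /proc/<pid>/cgroup has the form "ID:controllers:path".
    On the unified hierarchy the controller field is empty ("0::/path");
    on legacy/hybrid setups systemd tracks processes in "name=systemd".
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _, controllers, path = parts
        if controllers in ("", "name=systemd"):
            return path
    return None


def in_unit(cgroup_path: Optional[str], unit: str) -> bool:
    """True if the unit name is one of the components of the cgroup path."""
    if cgroup_path is None:
        return False
    return unit in cgroup_path.strip("/").split("/")


# Hypothetical contents of /proc/<pid>/cgroup for a process that was
# started from a user session rather than by the service manager.
sample = (
    "12:pids:/user.slice/user-1000.slice/session-3.scope\n"
    "1:name=systemd:/user.slice/user-1000.slice/session-3.scope\n"
)
print(in_unit(systemd_cgroup(sample), "foo.service"))
```

If the PID from the PID file does not sit under the service's cgroup,
that would be consistent with the process having been started outside
the unit.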

HTH,
Michal



Re: [systemd-devel] [PATCH v4 2/3] nsproxy: attach to namespaces via pidfds

2020-06-24 Thread Michal Koutný

On Wed, Jun 24, 2020 at 01:54:56PM +0200, Christian Brauner 
 wrote:
> Yep, I already have a fix for this in my tree based on a previous
> report from LTP.
Perfect. (Sorry for the noise then.)

Thanks,
Michal



Re: [systemd-devel] [PATCH v4 2/3] nsproxy: attach to namespaces via pidfds

2020-06-24 Thread Michal Koutný

Hi.

On Tue, May 05, 2020 at 04:04:31PM +0200, Christian Brauner 
 wrote:
> -SYSCALL_DEFINE2(setns, int, fd, int, nstype)
> +SYSCALL_DEFINE2(setns, int, fd, int, flags)
> [...]
> - file = proc_ns_fget(fd);
> - if (IS_ERR(file))
> - return PTR_ERR(file);
> + int err = 0;
>  
> - err = -EINVAL;
> - ns = get_proc_ns(file_inode(file));
> - if (nstype && (ns->ops->type != nstype))
> + file = fget(fd);
> + if (!file)
> + return -EBADF;
> +
> + if (proc_ns_file(file)) {
> + ns = get_proc_ns(file_inode(file));
> + if (flags && (ns->ops->type != flags))
> + err = -EINVAL;
> + flags = ns->ops->type;
> + } else if (pidfd_pid(file)) {
> + err = check_setns_flags(flags);
> + } else {
> + err = -EBADF;
> + }
> + if (err)
>   goto out;
>  
> - err = prepare_nsset(ns->ops->type, &nsset);
> + err = prepare_nsset(flags, &nsset);
>   if (err)
>   goto out;
This modification changed the returned error when a valid file
descriptor is passed but it doesn't represent a namespace (nor pidfd).
The error is now EBADF although originally and per man page it
was/should be EINVAL.

A change like the one below would restore it. However, I see it may be less
consistent with other pidfd calls(?); in that case I'd suggest updating the
man page to capture this.

--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -531,7 +531,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
 	} else if (!IS_ERR(pidfd_pid(file))) {
 		err = check_setns_flags(flags);
 	} else {
-		err = -EBADF;
+		err = -EINVAL;
 	}
 	if (err)
 		goto out;

I noticed this breaks systemd self tests [1].
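
The difference can be observed from user space with a small probe. This
is only a sketch, assuming Linux and a C library that exports setns();
the helper names are made up for illustration. It passes a valid file
descriptor that is neither a namespace file nor a pidfd and reports which
errno comes back.

```python
import ctypes
import errno
import os


def setns_errno(fd, nstype=0):
    """Call setns(2) on fd.

    Returns 0 on success, the errno on failure, or None when the
    C library does not expose setns() at all.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    setns = getattr(libc, "setns", None)
    if setns is None:
        return None
    ctypes.set_errno(0)
    if setns(fd, nstype) == 0:
        return 0
    return ctypes.get_errno()


def classify(err):
    """Map the observed errno to the two behaviours discussed above."""
    if err == errno.EINVAL:
        return "EINVAL: matches the man page"
    if err == errno.EBADF:
        return "EBADF: behaviour introduced by the quoted patch"
    return "inconclusive"


if __name__ == "__main__":
    # /dev/null is a valid descriptor, but not a namespace file or a pidfd.
    fd = os.open("/dev/null", os.O_RDONLY)
    try:
        print(classify(setns_errno(fd)))
    finally:
        os.close(fd)
```

Sandboxes that filter setns() may return yet another errno, which is why
the sketch reports anything else as inconclusive.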

Regards,
Michal


[1] 
https://github.com/systemd/systemd/blob/a1ba8c5b71164665ccb53c9cec384e5eef7d3689/src/test/test-seccomp.c#L246



Re: [systemd-devel] Grouping services in systemd..

2020-06-04 Thread Michal Koutný

Hi.

(Not sure if it's still pertinent.)

On Tue, Apr 07, 2020 at 10:23:22AM +0530, nitish nagesh 
 wrote:
> In fact i did try a similar approach of assigning CPUShares to a slice.
> Basically i separated these critical services into a new slice &
> assigned a CPUShare=8192. 
> However with this i see it takes more time than before to complete the
> boot.
Note that if you didn't use the cpu controller for anything else before, you
introduced a whole new grouping of tasks (i.e. imagine all were in -.slice
previously), and that may affect timing interactions.
Furthermore, group scheduling works relative to siblings, i.e.
system-netns.slice would be prioritized against its siblings in
system.slice only.
Finally, when prioritizing certain services you may have taken CPU
time from other services that are dependencies (priority inversion).
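
The "relative to siblings" point can be made concrete with a little
arithmetic. The sketch below is a simplification that assumes every group
is busy all the time; the weights and the unit name other.service are made
up for illustration.

```python
import posixpath


def cpu_fraction(weights, path):
    """Fraction of total CPU a cgroup gets when all groups are busy.

    weights maps cgroup paths to their shares. At each level a group
    competes only with its siblings, so the overall fraction is the
    product of weight / sum(sibling weights) along the path to the root.
    """
    fraction = 1.0
    while path != "/":
        parent = posixpath.dirname(path)
        siblings = [w for p, w in weights.items()
                    if posixpath.dirname(p) == parent]
        fraction *= weights[path] / sum(siblings)
        path = parent
    return fraction


# Hypothetical hierarchy: boosting system-netns.slice only helps it
# against other.service, not against user.slice.
weights = {
    "/system.slice": 1024,
    "/user.slice": 1024,
    "/system.slice/system-netns.slice": 4096,
    "/system.slice/other.service": 1024,
}
print(cpu_fraction(weights, "/system.slice/system-netns.slice"))
```

With these numbers the boosted slice gets 4/5 of system.slice, but
system.slice itself still gets only half of the machine.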

It makes sense to prioritize services on critical path but if there's
nothing else to prioritize against, that's just the amount of work that
has to be done anyway as others pointed out.

> So by setting it to 8192 am I reducing the CPUShare and hence seeing
> an increase in time?
No, this is a special value that represents an unset CPUShares= internally.
Such units would still apply 1024 cpu.shares.

> The DefaultCPUAccounting also seems to be enabled for the system.
Aha, then my point about cpu controller regrouping is moot probably.

HTH,
Michal



Re: [systemd-devel] systemd-user-sessions.service: Failed to create cgroup /system.slice/systemd-user-sessions.service: No such file or directory

2020-06-04 Thread Michal Koutný

Hi.

Is this still relevant?

On Sun, May 10, 2020 at 08:51:09AM -0700, Nebu Pookins  
wrote:
> Specifically, the systemd-user-sessions service is failing with the
> following messages:
The cgroup hierarchy is built in-memory on each boot based on your
configuration. I'm skeptical how the outage with potential file
corruption could cause this error.
Furthermore, failure of systemd-user-sessions.service should fail the
whole boot up transaction.

I think this error message is a red herring when resolving the post-outage
issues.

(Is systemd-user-sessions.service the only service that fails like this?
What systemd version is that?
What cgroup setup do you use (e.g. hybrid vs unified)? Are there any
other programs that would modify cgroup hierarchy?)

Regards,
Michal


Re: [systemd-devel] kernel messages not making it to journal

2020-06-04 Thread Michal Koutný

Hi.

On Mon, Jun 01, 2020 at 07:11:15PM -0600, Chris Murphy 
 wrote:
> But journalctl does not show it at all. Seems like it might be a bug,
> I expect it to be recorded in the journal, not only found in dmesg.
Journald fetches dmesg messages too (see journald.conf:ReadKMsg=). It's
not clear whether you run journalctl as root or as a non-privileged user who
may not have access to the system-wide kernel messages.

If you don't see the messages in the journal as root and you can reproduce
it, I suggest you file an issue on GitHub [1].

HTH,
Michal

[1] https://github.com/systemd/systemd/issues/new?template=Bug_report.md



Re: [systemd-devel] [nspawn] Could you plz explain me, about resources

2020-03-06 Thread Michal Koutný

Hello.

On Sat, Feb 15, 2020 at 07:33:52PM +0300, Хиль  Эдуард  
wrote:
> Hi there! I am new for containers and i try systemd-nspawn . I created my 
> first container with
>  
> MemoryHigh=100M
> MemoryMax=100M
> MemorySwapMax=1M
>  
> but _inside_ container i see all host resources (for example — 1Gb RAM).
How do you check for the resources available inside the container?
Also, do you use the unified hierarchy?

> All i want — see inside containers actual limits for this container as
> done in LXC with lxcfs. Is this possible? And if not, could you please
> explain to me why?
You should see the configured limits on the root cgroup in the
container's namespace. The cgroup limits won't propagate into
system-wide interfaces such as /proc/meminfo.
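
A small sketch of the difference, to be run inside the container. It
assumes the unified hierarchy is mounted at /sys/fs/cgroup and that the
memory controller is enabled for the container's cgroup; otherwise the
memory.max file will simply not be there.

```python
from pathlib import Path


def parse_memory_max(text):
    """cgroup v2 memory.max holds a byte count or the literal "max"."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_mem_total(meminfo):
    """Return MemTotal from /proc/meminfo contents, converted to bytes."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024  # the value is given in kB
    raise ValueError("MemTotal not found")


if __name__ == "__main__":
    limit_file = Path("/sys/fs/cgroup/memory.max")
    meminfo_file = Path("/proc/meminfo")
    if limit_file.exists():
        # This reflects MemoryMax= of the container.
        print("cgroup limit:", parse_memory_max(limit_file.read_text()))
    if meminfo_file.exists():
        # This still reports the host's memory.
        print("MemTotal:", parse_mem_total(meminfo_file.read_text()))
```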

Michal

Re: [systemd-devel] Empty /sys/fs/cgroup/cpu directory post reboot

2019-09-26 Thread Michal Koutný

Hello Tarana.

On Wed, Sep 11, 2019 at 12:17:51PM +, " TARANA, YASHASHVI " 
 wrote:
> I noticed that, once,  after reboot, the directory /sys/fs/cgroup/cpu
> was empty.
The directories for individual cgroups are only created on demand
(the directory path suggests you use the legacy or hybrid hierarchy).
Any unit which requires some of the CPU controller functionality will
trigger building of the hierarchy. By contrast, if there is no such
unit, systemd will not bother creating the directories for the CPU
hierarchy.

> However, after rebooting again, it once again had the original
> contents.
It seems that the latter boot must have started a unit that wasn't
active before and that required the CPU controller.

HTH,
Michal



Re: [systemd-devel] set-property CPUAffinity

2019-09-03 Thread Michal Koutný

Hello Alexey.

On Fri, Aug 30, 2019 at 01:21:50PM +0300, Alexey Perevalov 
 wrote:
> [...] 
> The question is: changing CPUAffinity property (cpuset.cpus) is not yet 
> allowed in systemd API, right? Is it planned?
Note that CPUAffinity= uses the mechanism of sched_setaffinity(2), which
is different from using cpuset controller restrictions (that's also why
you find it in `man systemd.exec` and not in `man
systemd.resource-control`).
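
For illustration, the same per-process mechanism is reachable from Python
on Linux. This is a sketch only; the helper name is made up. Note that it
changes the affinity mask of the calling process and does not touch any
cpuset.cpus file.

```python
import os


def pin_to_one_cpu():
    """Restrict the calling process to the lowest CPU it may currently use.

    Returns the chosen CPU and the original mask so the caller can
    restore it. This goes through sched_setaffinity(2), the same
    mechanism CPUAffinity= relies on; no cgroup is involved.
    """
    allowed = os.sched_getaffinity(0)
    cpu = min(allowed)
    os.sched_setaffinity(0, {cpu})
    return cpu, allowed


if __name__ == "__main__":
    cpu, original = pin_to_one_cpu()
    print("now pinned to CPU", cpu, "mask:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, original)  # restore the previous mask
```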

IMO, systemd may eventually support the cpuset controller with a
different directive.

> [...] on the RHEL7 both libvirt & kubernetes handle its vm & pods in
> kubepods.slice and machine.slice sub cgroup respectively in
> appropriate cpuset mount point. [...]
The components that do CPU pinning via cpuset controller do that on
their own (relying on no collisions in the cpuset tree).

Michal



Re: [systemd-devel] Delegate v1 cgroup controller permissions

2019-07-11 Thread Michal Koutný

On Thu, Jun 20, 2019 at 02:19:34PM +0200, Lennart Poettering 
 wrote:
> Sorry, but there is not, it's not safe, as documented.

The doc [1] says:
> Think twice before delegating cgroup v1 controllers to less privileged
> containers. It’s not safe, you basically allow your containers to
> freeze the system with that and worse.

My search-fu is not strong enough and I'm interested in the details.
What controller settings can have such ramifications on the rest of the
system? 

Thanks,
Michal

[1] https://systemd.io/CGROUP_DELEGATION

Re: [systemd-devel] Allocating resource to achieve predictable run times

2019-06-20 Thread Michal Koutný

Hi.

On Mon, Jun 17, 2019 at 02:15:19PM +0100, John Lane  wrote:
> I am trying to meet a requirement to have predictable execution of jobs.
> [...]
> When I say "container" I mean "an environment with reserved resources".
> I have been looking at using cgroups both directly and via systemd.
> [...]
Do you have control over what jobs you are running? Or do you wish to
have predictable times for any kind of job (i.e. using any
potentially shareable resource)?

> Observations with a simple single-threaded test on one cpu:
> [...] but I cannot get predictable results: that 1 job or n jobs take
> the same amount of time.
Ignoring the last outlier, how do the previous measurements miss your
expectations?

Michal

(P.S. I can't tell from the public archive if I got the thread
completely or if some messages weren't delivered so I may be behind or
missing already presented context.)



[systemd-devel] Password agent for user services

2019-05-13 Thread Michal Koutný

Hello,
I was pondering a user service that would ask for password via the
password agent infrastructure (as there is
systemd-gnome-ask-password-agent it could be quite integrated with the
desktop environment) as an alternative to saving it in (Gnome) keyring.

A naïve experiment with

> [Service]
> ExecStart=/usr/bin/systemd-ask-password "What is your pwd?"

led to

> May 13 19:49:56 host systemd-ask-password[28844]: Failed to query password: 
> Permission denied

Then I read about the password agent API [1] and realized that the poor
agent cannot create the notification file in the watched directory. I
also noticed the auxiliary agent is not spawned for user services [2].

I'm not that familiar with policy-kit, however, IIUC, it is possible to
ask unprivileged systemd-gnome-ask-password-agent to provide a password
for system service. Is that correct?
What would then prohibit making /run/systemd/ask-password world writable
to allow unprivileged users to ask for a password?

(I understand the interface is so crude so that it works at early boot
stages w/out DBus. For the user requests it would perhaps make sense to
have a parallel DBus API.)

Or is there an alternative approach to interactively query passwords for
user services (e.g. an already existing user service that could be queried
via DBus)?

Thanks,
Michal


[1] https://www.freedesktop.org/wiki/Software/systemd/PasswordAgents/
[2] 
https://github.com/systemd/systemd/blob/a45ef5070d5875d70e39fc430e82eb26c221ded5/src/systemctl/systemctl.c#L238



Re: [systemd-devel] systemctl start second.service first.service

2019-01-18 Thread Michal Koutný

Thanks for your exhaustive reply, Jonathon.

On Tue, Jan 08, 2019 at 04:02:47AM +0530, Jonathon Kowalski 
 wrote:
> [...]
> I think systemctl should do something similar to that, internally
> create a transient target unit through manager's bus API, add Wants=
> (which gives it implicit After=) on all unit names passed, and then
> invoke the startup of this target, so that it gets treated as anchor
> job and generates a transaction where all dependencies are ordered
> properly. Hence, I vote for the first option.
I agree, this recycles most of the existing transaction engine logic and
does not bring too many corner cases needing resolution.

> [...] So I vote for both 2 and 1, 2 implemented using the 1st
> approach.
Problem with 2 is that it touches DBus API and hence it should be a well
considered change. I don't think it's good to create such "macro"
functions that would emulate something that could be done by other DBus
methods (create the transient target, add deps for it, start it).

> That said, I do think in practice if units are declared properly dependency
> wise, they should already pull one another in a transaction, which would
> then also take care of ordering deps, and anything that actually needs such
> semantics from systemctl (or through the bus) is already broken.
This is an argument I did not contemplate enough but once you pointed it
out, I cannot agree more :)

Units themselves should specify what they Wants= or Requires= in the
transaction. This may be distributed in another unit, e.g.
- U wants A and B,
- B after A,
- intended usecase is starting U.

The manual invocation of `systemctl start B A` could then be understood
rather as a debugging operation.
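
To make the expected result concrete, here is a sketch of the ordering one
would want when both units end up in the same transaction. It is not how
the systemd transaction engine is implemented; it merely applies a
topological sort (Python 3.9+ graphlib) to the After= relation from the
example above.

```python
from graphlib import TopologicalSorter


def start_order(requested, after):
    """Order the requested units so that After= is respected.

    after maps a unit to the set of units it is ordered After=.
    Only ordering between units that are part of the request (i.e.
    the same transaction) is taken into account.
    """
    sorter = TopologicalSorter()
    for unit in requested:
        deps = [d for d in after.get(unit, ()) if d in requested]
        sorter.add(unit, *deps)
    return list(sorter.static_order())


# B is ordered after A; the order on the command line should not matter.
after = {"B.service": {"A.service"}}
print(start_order(["B.service", "A.service"], after))
```

Regardless of the order in which the units are passed, A.service comes out
first as long as both are in the same request.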

In this light, it makes good sense to me to introduce a new option to
systemctl that would make sure a transient target is created and that it
aggregates the requested operations.

Management of that unit then needs no special treatment compared to others
because, by passing that option, the user should expect such a unit to be
created.

You've convinced me about variant 1 with the behavior being triggered by
explicit user request.

Michal



Re: [systemd-devel] systemctl start second.service first.service

2019-01-07 Thread Michal Koutný

(Bringing up an older one.)

On 1/15/18 2:20 AM, 林自均  wrote:
> I've filed https://github.com/systemd/systemd/issues/7877 for this.
There's also accompanying RFE at [1]. I've looked into that and arrived
at design/implementation crossroads. I'd be happy for any ideas/feedback
on that GH issue.

Thanks,
Michal

[1] https://github.com/systemd/systemd/issues/8102





Re: [systemd-devel] Debugging active timers that do not trigger

2018-11-15 Thread Michal Koutný

On 11/8/18 11:46 AM, Andrei Borzenkov wrote:
> It is possible that system never ends booting. Do you have any pending
> jobs (systemctl list-jobs)? What "systemctl is-system-running" says?
I don't think this is the case. OnBootSec= is taken relative to
the instant when the kernel started counting time (on Linux implemented
via CLOCK_MONOTONIC).
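
As a rough illustration (a sketch, assuming the timer is evaluated purely
against CLOCK_MONOTONIC as described above; the helper name is made up),
the remaining time of an OnBootSec= timer does not depend on whether the
boot transaction has finished:

```python
import time


def seconds_until_elapse(on_boot_sec, now=None):
    """Seconds left until a timer with OnBootSec=on_boot_sec elapses.

    now is the current CLOCK_MONOTONIC reading; on Linux this clock
    starts counting at boot, so the result is independent of the state
    of any pending jobs.
    """
    if now is None:
        now = time.clock_gettime(time.CLOCK_MONOTONIC)
    return max(0.0, on_boot_sec - now)


if __name__ == "__main__":
    # Hypothetical timer with OnBootSec=10min.
    print(seconds_until_elapse(600))
```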

@Daniel, is it possible there are some daemon-reloads running
concurrently with the timer? More precisely, can it happen the timer
expires exactly when systemd reloads?

Michal




Re: [systemd-devel] /var/log/journal full, journald is not removing journal files

2018-07-12 Thread Michal Koutný

Hi Chris.

On 07/11/2018 09:44 PM, Chris Murphy wrote:
> Somehow journald would not
> delete its own files until I had deleted  a few manually.
Indeed, see the man page update [1] added recently for more details. I
assume your space was occupied by active journal files. Do you have any
detailed breakdown of the /var/log/journal contents?

Michal

[1] https://github.com/systemd/systemd/commit/1a0d353b44e




Re: [systemd-devel] Create a target unit to start & stop a group of services

2018-02-26 Thread Michal Koutný



On 02/26/2018 11:08 AM, Michal Sekletar wrote:
> Unfortunately, we don't have a dependency (AFAIK) that only propagates
> stop actions.
FTR (not helpful for the original problem), there exists ConsistsOf= as
an inverse of PartOf= dependency. However, it's read only currently (or
strictly speaking, writable through the PartOf= endpoint).

Michal




Re: [systemd-devel] Controlling user processes with systemd+cgroups

2018-02-08 Thread Michal Koutný

Hello systemd-devel.

Short summary from the old thread:

On 06.09.15 16:14, Lennart Poettering [1]:
> Ultimately our goal is that you build your tree of slices, and then
> freely attach users, services, containers, VMs to these slices at the
> places you want them. You can already do that nicely for services and
> containers (at least for nspawn containers), but for users this is
> really missing.

The missing piece is thus where to store user->slice mapping (not
necessarily injective as it is now). Then it would be possible to apply
limits _shared in groups_ of users.

Lennart also sketched that such information could be in the user database,
although there is no standard way to obtain it. AFAIK this still
applies and IMO it may take longer to change than is comfortable. (Please
enlighten me on whatever I might be missing in this regard.)

I gave some thought to alternatives. They basically rely on GID -- the
information that can already be obtained from a user database in a
standard way and that overlaps with most of the missing use cases.

Variant 1 -- Slice.GroupId= property.

Admins would create unit files for slices specifying this option and
users who are members of any of the listed groups would have their user
sessions placed into the given slice instead of user-$UID.slice.


Variant 2 -- group mode

This would allow admins to switch how user slices are created. By
switching into the group mode (e.g. a pam_systemd or logind option) user
sessions would be put into group-$GID-$UID.slice and the cgroup
configuration would then be applied to the respective group-$GID.slice units.
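
A sketch of how the naming in Variant 2 would play out. The group mode is
only a proposal, so session_slice() is purely hypothetical; parent_slice()
follows the existing rule that a dash in a slice name expresses nesting.

```python
def session_slice(uid, gid, group_mode=False):
    """Slice for a user's sessions (group_mode is the proposed switch)."""
    if group_mode:
        return f"group-{gid}-{uid}.slice"
    return f"user-{uid}.slice"


def parent_slice(slice_name):
    """Parent of a slice: a-b.slice lives in a.slice, a.slice in -.slice."""
    stem = slice_name[: -len(".slice")]
    if "-" not in stem:
        return "-.slice"
    return stem.rsplit("-", 1)[0] + ".slice"


# Users 1000 and 1001 in group 100 would share group-100.slice, so a
# limit configured there applies to both of them together.
for uid in (1000, 1001):
    name = session_slice(uid, 100, group_mode=True)
    print(name, "->", parent_slice(name))
```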

What are your thoughts on that? Do any other alternatives come to your
mind? Would some of the variants be eventually acceptable to be included?

Thanks,
Michal

[1]
https://lists.freedesktop.org/archives/systemd-devel/2015-September/034131.html




Re: [systemd-devel] systemd-logind failing due to dbus error on org.freedesktop.systemd1

2018-02-02 Thread Michal Koutný



On 02/01/2018 01:18 PM, Colin Guthrie wrote:
> If it's ybbind that's causing issues, then chances are it's related to
> the NSS setup, i.e. /etc/nsswitch.conf and other related config specific
> to Yellow Pages stuff (I forget what they are as it's been > 10years
> since I used it!)
+1, I don't think the cause is in systemd/logind itself.

Alternatively, it could be dbus being affected by this NSS setup. You
may monitor the dbus communication or strace dbus-daemon to figure out why
systemd's calls time out.

Michal




Re: [systemd-devel] systemd-logind failing due to dbus error on org.freedesktop.systemd1

2018-01-31 Thread Michal Koutný

On 01/31/2018 03:55 PM, Gustavo Sousa wrote:
> Unfortunately, no. I didn't actually wait for the auto restart, I
> tried myself with 'systemctl restart systemd-logind'.
It seems like systemd-logind was thus properly connected to dbus and
there may be other issue.

Could you please post logs preceding the snippet you sent previously?
(It seems something must have gone wrong at/before 8:38:51.) Also since it
happened after reboot and the systemd-logind restart didn't help, does it
mean it's reproducible or still present?

Can you see 'org.freedesktop.systemd1' name owned in the `busctl` output?

Does `kill -SIGUSR1 1` help you?

Thanks,
Michal


Re: [systemd-devel] systemd-logind failing due to dbus error on org.freedesktop.systemd1

2018-01-31 Thread Michal Koutný

Hello Gustavo.

On 01/31/2018 03:36 PM, Gustavo Sousa wrote:
> After a system upgrade, 
Was dbus-daemon restarted as part of the upgrade?

> I've posted the output of 'journalctl -xe' regarding the error
> here: https://pastebin.com/TNmg2z9s 
Was it fixed when systemd-logind was (auto) restarted?

Michal




Re: [systemd-devel] Newer videos

2018-01-22 Thread Michal Koutný



On 01/21/2018 03:21 PM, Cecil Westerhof wrote:
> I wanted to dive deeper into systemd. So I sought videos on YouTube. [...]
> But that one is 2½ years old. Is there something more recent?
https://media.ccc.de/c/asg2017

Cheers,
Michal




Re: [systemd-devel] Dependencies on DBus activated services during shutdown

2018-01-08 Thread Michal Koutný

On 01/08/2018 05:52 PM, Simon McVittie wrote:
> Does A.service even need B.service during shutdown if it has not
> previously interacted with (=> started) B.service?
That's not a correct implication; due to concurrency, B.service may be
stopped before A.service.

> On Mon, 08 Jan 2018 at 16:53:04 +0100, Jérémy Rosen wrote:
>> That means that the only way to fix that without explicitely telling someone
>> about the dependency is to allow dbus to start units while its shutdown is
>> pending [...] this seems to be explicitely forbidden
> 
> I didn't write that special case, but I agree with it. Starting D-Bus
> services while the sword of Damocles is hanging over dbus-daemon's
> head does not sound like a route to guaranteed success. As soon as
> dbus-daemon gets SIGTERM, they'll find that their D-Bus AF_UNIX socket
> is rather less useful than it was a moment ago...
Nice explanation. I forgot to mention that A.service already has a
dependency on dbus.service (which is unnecessary by the elegance
criterion). So there's a guarantee the sword will not fall until
A.service terminates.

> If A.service can be made to shut down correctly without B.service,
> then that seems good in any case. (What happens if B.service crashes?)
A's best effort would make sense in the crash case; however, I don't
think it's the right place.

> Failing that, if A.service genuinely needs D-Bus during its shutdown,
> it probably also makes sense to serialize it After=dbus.service (not
> just dbus.socket) so that dbus-daemon will be kept alive until A.service
> actually exits.
That's actually a valid idea (see above).

> To be honest that doesn't seem too bad to me.
Yes, the less viable the other options appear to me, the more I like it as
well.

The alternative of changing the condition from
manager_unit_inactive_or_pending to unit_inactive would AFAIU lead to a
conflict with the shutdown target if B.service was reactivated. (And if
not, it could potentially make shutdown transaction infinite.)

Thanks,
Michal




Re: [systemd-devel] Dependencies on DBus activated services during shutdown

2018-01-08 Thread Michal Koutný



On 01/08/2018 08:04 PM, Andrei Borzenkov wrote:
> If systemd could infer that A requested B to be started it could also
> add implicit ordering between A and B.
Yes, this would be a way to the complete solution where all dependencies
are tracked (however, I'm not sure it's achievable). ActivationRequest
signal doesn't pass the requester identification and this would probably
track just the dependencies when the B.service isn't active yet.

Michal




[systemd-devel] Dependencies on DBus activated services during shutdown

2018-01-08 Thread Michal Koutný

Hello,
I'd like to ask your opinion on the following situation.

B.service exposes its API through D-Bus. A.service uses this API and
thus it has a dependency on B.service. This is implicit though -- and
we're happy we can rely on D-Bus activation and needn't list all
dependencies explicitly.

As it comes, A.service needs B.service for proper termination. During
the shutdown transaction there is unspecified ordering of the two (since
the dependency is implicit only) and B.service is stopped before A.service.

A.service would attempt to D-Bus-activate B.service but that is rejected
because dbus-daemon will eventually stop too. Note this doesn't mean
dbus-daemon is already handling SIGTERM; it's because a dbus-daemon stop
job is pending [1]. A.service thus may be unable to terminate properly.

I know this could be circumvented by explicitly specifying
After=B.service for A.service, but that denies the elegance of the lazy
(implicit) activation.

Are there any better ways to deal with this?

Thanks,
Michal Koutný

P.S. FTR, in my case A.service=libvirtd.service and
B.service=systemd-machined.service.

[1] https://github.com/systemd/systemd/blob/master/src/core/dbus.c#L169




Re: [systemd-devel] Sharing processes across services

2016-09-19 Thread Michal Koutný



On 09/14/2016 01:31 AM, Michal Koutny wrote:
> Currently one PID can belong up to two services
> (manager->watch_pids{1,2}). If more than one why just two? And when can
> such a situation happen? 
I've found the origin of this change in [1]. Still, I wonder why it is
"interesting to map a PID to two units at the same time". Is
getty->login the only use case? Wouldn't it make more sense to move the
PID to another unit rather than share it?

Thanks,
Michal

[1]
https://github.com/systemd/systemd/commit/5ba6985b6c8ef85a8bcfeb1b65239c863436e75b




[systemd-devel] Sharing processes across services

2016-09-13 Thread Michal Koutný

Hello,
I wonder why systemd allows the PID->service mapping to be non-unique.
Currently one PID can belong to up to two services
(manager->watch_pids{1,2}). If more than one, why just two? And when can
such a situation happen? I was able to attain this only by subverting
the content of the PID file written by Type=forking services.

Thanks for illumination,
Michal


