Hello,

I found what happened here. It turns out that after a process is forked, and before "setsid" or "setpgid" is called, it shares its parent's PGID. Here the parent is Shepherd, whose PGID is 0.
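For illustration, the window described above can be reproduced outside Shepherd. What follows is only a minimal sketch in Python, not Shepherd code. It assumes a POSIX system where "getpgid" may query a child that has moved to another session (as on Linux). The function name and the pipe handshake are made up for the example. In an ordinary process the shared PGID is the parent's nonzero group rather than 0, but the timing issue is the same.

```python
import os


def pgid_before_and_after_setsid():
    """Fork a child; return (parent_pgid, child_pgid_before, child_pgid_after, child_pid).

    Hypothetical helper for illustration only.
    """
    go_r, go_w = os.pipe()      # parent -> child: "go ahead"
    done_r, done_w = os.pipe()  # child -> parent: "setsid has been called"

    pid = os.fork()
    if pid == 0:
        # Child: wait for the go-ahead, then start a new session (and group).
        os.close(go_w)
        os.close(done_r)
        os.read(go_r, 1)
        os.setsid()
        os.write(done_w, b"x")
        os.read(go_r, 1)        # stay alive until the parent has looked again
        os._exit(0)

    # Parent.
    os.close(go_r)
    os.close(done_w)
    parent_pgid = os.getpgid(0)
    before = os.getpgid(pid)    # the window: child still in the parent's group
    os.write(go_w, b"x")
    os.read(done_r, 1)
    after = os.getpgid(pid)     # child now leads its own process group
    os.write(go_w, b"x")
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(done_r)
    return parent_pgid, before, after, pid


if __name__ == "__main__":
    parent_pgid, before, after, pid = pgid_before_and_after_setsid()
    print("parent PGID:", parent_pgid)
    print("child PGID before setsid:", before)
    print("child PGID after setsid:", after, "(child PID is", pid, ")")
```

A signal sent to the negated "before" value would reach the parent's whole group, which is why falling back to the plain PID during that window seems safer.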
When doing the following actions:

  * stop guix-daemon
  * start guix-daemon
  * stop guix-daemon
  * start guix-daemon

If the second stop occurs after "fork" has been done, but before "setsid",
then "(getpgid)" returns 0.

The naive patch attached could fix the situation. WDYT?

Mathieu
From 0e4167251a56d6baa4f51fe72250a6e3bffae8c3 Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <[email protected]>
Date: Wed, 6 May 2020 11:48:26 +0200
Subject: [PATCH] service: Fix 'make-kill-destructor' when PGID is zero.

When a process is forked, and before its PGID is changed in "exec-command", it
will share the parent PGID, which is 0 for Shepherd.  In that case, use the PID
instead of the PGID.

* modules/shepherd/service.scm (make-kill-destructor): Handle the case when PGID
is zero, between the process fork and exec.
---
 modules/shepherd/service.scm | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
index 8604d2f..258992c 100644
--- a/modules/shepherd/service.scm
+++ b/modules/shepherd/service.scm
@@ -5,6 +5,7 @@
 ;; Copyright (C) 2016 Alex Kost <[email protected]>
 ;; Copyright (C) 2018 Carlo Zancanaro <[email protected]>
 ;; Copyright (C) 2019 Ricardo Wurmus <[email protected]>
+;; Copyright (C) 2020 Mathieu Othacehe <[email protected]>
 ;;
 ;; This file is part of the GNU Shepherd.
 ;;
@@ -957,11 +958,15 @@ start."
   "Return a procedure that sends SIGNAL to the process group of the PID given as
 argument, where SIGNAL defaults to `SIGTERM'."
   (lambda (pid . args)
-    ;; Kill the whole process group PID belongs to.  Don't assume that PID
-    ;; is a process group ID: that's not the case when using #:pid-file,
-    ;; where the process group ID is the PID of the process that
-    ;; "daemonized".
-    (kill (- (getpgid pid)) signal)
+    ;; Kill the whole process group PID belongs to.  Don't assume that PID is
+    ;; a process group ID: that's not the case when using #:pid-file, where
+    ;; the process group ID is the PID of the process that "daemonized".  If
+    ;; this procedure is called between the process fork and exec, the PGID
+    ;; will still be zero (the Shepherd PGID).  In that case, use the PID.
+    (let ((pgid (getpgid pid)))
+      (if (zero? pgid)
+          (kill pid signal)
+          (kill (- pgid) signal)))
     #f))
 
 ;; Produce a constructor that executes a command.
-- 
2.26.0
