Re: Kill sends signal to wrong process group with job control enabled

Ben Ashton Sun, 01 Mar 2026 18:43:18 -0800



On 01/03/2026 08:47, Robert Elz wrote:

     Date:        Sun, 1 Mar 2026 06:33:40 -0800
     From:        Ben Ashton <[email protected]>
     Message-ID:  <[email protected]>

   | It's unrelated, but I'm curious what systems don't make the PGID the
   | same as the PID of the leader? I thought that was a POSIX thing.

No.  It is related to how pipelines are forked in the shell.  To make
a new process group:

   | The FreeBSD syscall manual for "setsid" states: "the setsid() system call
   | returns the value of the process group ID of the new process group,
   | which is the same as the process ID of the calling process."

That is absolutely correct.  So it all depends on how the processes
of a pipeline are forked (for anything other than a pipeline it makes
no difference: the shell forks once, that child becomes the process group
leader, and its pid is $!).
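The setsid() behaviour quoted above is easy to check from a child process.  A minimal Python sketch, assuming a POSIX system (os.setsid, os.getsid and os.getpgrp are thin wrappers over the syscalls of the same names; the helper new_session_ids is made up for illustration): the child calls setsid() and reports its own process group and session IDs back through a pipe.

```python
import os

def new_session_ids():
    """Hypothetical helper: fork a child that calls setsid() and return
    (child_pid, pgid, sid) as observed by the child itself."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                                   # child
        os.close(r)
        os.setsid()                                # new session and new process group
        os.write(w, f"{os.getpgrp()} {os.getsid(0)}".encode())
        os._exit(0)
    os.close(w)                                    # parent
    pgid, sid = (int(x) for x in os.read(r, 64).split())
    os.close(r)
    os.waitpid(pid, 0)
    return pid, pgid, sid

pid, pgid, sid = new_session_ids()
print(pid == pgid == sid)  # True
```

A freshly forked child is never a process group leader, so setsid() cannot fail with EPERM here, and the new group's ID is simply the child's pid.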

It used to be the case that the shell would fork only once for pipelines
as well; that process would then be both $! and the process group leader,
and because of POSIX rules, it would also become the rightmost process of
the pipeline, forking children for the other processes in the rest of the
pipe.  But while not impossible, it is difficult, and defeats some
optimisations, to implement things that way on a system supporting the
pipefail option, which POSIX now requires.

So instead the parent shell (the one which is going to set $!) forks all
of the children, and adds each of them to the new process group.  To do
that it needs to know the process group ID as each child is created (there's
no sane way to send the info to a process later; it can only arrive via the
memory copy made by fork()).  So the shell forks once, and that child creates
a new process group.  The parent knows the group ID, as it is the pid of the
child it just created (which fork() returns).  Then it forks processes for
the rest of the pipeline; those processes know the pgrp id, and join that
pgrp immediately after the fork().
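The sequence just described can be sketched in Python with os.fork and os.setpgid, assuming a POSIX system.  This is an illustration, not any shell's actual code: fork_job is a hypothetical helper, its children stand in for pipeline stages, and the pipes between stages and the exec() calls a real shell performs are omitted.  Each child learns the group ID only from the value of pgid it inherited at fork time, and the parent repeats every setpgid() call so the outcome does not depend on which process runs first.

```python
import os

def fork_job(n):
    """Hypothetical helper: fork n children into one new process group.
    Returns (pids in creation order, [(pid, pgid) as seen by each child])."""
    report_r, report_w = os.pipe()    # children -> parent: "pid pgid\n"
    go_r, go_w = os.pipe()            # parent closes go_w to release the children
    pgid = 0                          # 0 means "no group created yet"
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:                  # child: knows pgid only from the fork() copy
            os.close(report_r)
            os.close(go_w)
            os.setpgid(0, pgid)       # pgid == 0: create a group named after our own pid
            os.write(report_w, f"{os.getpid()} {os.getpgrp()}\n".encode())
            os.read(go_r, 1)          # block until the parent closes go_w
            os._exit(0)
        if pgid == 0:
            pgid = pid                # fork() told the parent the leader's pid
        os.setpgid(pid, pgid)         # parent does it too, closing the race
        pids.append(pid)
    os.close(report_w)
    os.close(go_r)
    data = b""
    while data.count(b"\n") < n:      # wait until every child has joined the group
        chunk = os.read(report_r, 4096)
        if not chunk:
            break
        data += chunk
    os.close(report_r)
    os.close(go_w)                    # EOF on go_r lets the children exit
    for pid in pids:
        os.waitpid(pid, 0)
    reports = [tuple(int(x) for x in line.split()) for line in data.splitlines()]
    return pids, reports

pids, reports = fork_job(3)
print(all(g == pids[0] for _, g in reports))  # True: the first child's pid is the pgid
```

The children stay alive until every one of them has reported, so the group leader cannot exit while later children are still joining its group.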

Now it all depends on the order in which the shell creates the pipeline,
for which there are 2 rational choices, left to right or right to left; which
is simpler depends upon the data structure the shell's parser created when
it parsed the pipeline.  If the pipeline is created right to left, then
the pgrp id is the pid of the rightmost process (the way it used to be in
times past), and that also becomes $!.  If the pipeline is created left to
right, then the pgrp is the pid of the leftmost process, but $! still needs
to be the pid of the rightmost process, hence $! and the pgrp ID are different.
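The effect of the creation order fits in a tiny pure-Python model.  No processes are forked here; the function name and the pid values are made up for illustration.  Whichever stage is forked first founds the process group, while $! always names the rightmost stage.

```python
def pgid_and_last_pid(stage_pids, left_to_right):
    """Hypothetical model: stage_pids[i] is the pid of pipeline stage i,
    leftmost first.  Returns (pgid, value of $!) for the creation order."""
    leader = stage_pids[0] if left_to_right else stage_pids[-1]
    return leader, stage_pids[-1]

# Made-up pids for a three-stage pipeline "a | b | c"
print(pgid_and_last_pid([101, 102, 103], left_to_right=True))   # (101, 103)
print(pgid_and_last_pid([101, 102, 103], left_to_right=False))  # (103, 103)
```

Only in the right-to-left case does -$! happen to name the job's process group; for a single command the two coincide either way.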

kre

ps: the NetBSD shell has a builtin command which converts between pids and
pgrp ids of children of the shell, so scripts can easily do whatever it is
they really want to achieve, and the shell doesn't need to attempt to guess
what the script is intending  (-$! is certainly not guaranteed to be the
pgrp ID of anything, and if it is, it isn't necessarily what you want, so
when a shell sees that, it has to attempt to intuit what was really meant).

Sorry, no markup survived my cut&paste from an xterm into my MUA in the
following:

      jobid [-g|-j|-p] [job]
             With no flags, print the process identifiers of the processes in
             the job.  If the job argument is omitted, the current job is used.
             Any of the ways to select a job may be used for job, including the
             '%' forms, or the process id of the job leader ('$!' if the job
             was created in the background.)

             If one of the flags is given, then instead of the list of process
             identifiers, the jobid command prints:

             -g     the process group, if one was created for this job, or
                    nothing otherwise (the job is in the same process group as
                    the shell.)

             -j     the job identifier (using '%n' notation, where n is a
                    number) is printed.

             -p     only the process id of the process group leader is printed.

             These flags are mutually exclusive.

             jobid exits with status 2 if there is an argument error, status 1,
             if with -g the job had no separate process group, or with -p there
             is no process group leader (should not happen), and otherwise
             exits with status 0.

So, "jobid -g $!" produces the process group ID from the $! value.
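jobid is specific to the NetBSD sh and only knows about the shell's own children.  Outside the shell, the same question (which process group does this pid belong to?) is answered by getpgid(2), which POSIX permits to fail with EPERM for a process in a different session.  A Python sketch using os.getpgid and os.killpg, assuming a POSIX system; spawn_in_own_group is a hypothetical helper standing in for a job that was given its own process group.

```python
import os
import signal

def spawn_in_own_group():
    """Hypothetical helper: fork a child that moves into a new process
    group and then waits to be signalled."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                       # child
        try:
            os.close(r)
            os.setpgid(0, 0)           # new group whose ID is this child's pid
            os.close(w)                # parent sees EOF: the group now exists
            while True:
                signal.pause()         # SIGTERM's default action ends the process
        finally:
            os._exit(1)                # only reached if something above failed
    os.close(w)
    os.read(r, 1)                      # returns b"" once the child has closed w
    os.close(r)
    return pid

pid = spawn_in_own_group()
pgid = os.getpgid(pid)                 # roughly what "jobid -g" reports for a job
if pgid != os.getpgrp():               # never signal our own group by mistake
    os.killpg(pgid, signal.SIGTERM)
else:
    os.kill(pid, signal.SIGTERM)
_, status = os.waitpid(pid, 0)
print(pgid == pid, os.WTERMSIG(status) == signal.SIGTERM)  # True True
```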

Just for completeness, jobs without process group ids of their own
are ones created when job control is disabled - which is the default
for scripts.

Thank you for the thorough explanation. Given the issue you describe with pipelines, I can see why there are scenarios where the shell might want to translate $! so that you can kill the process group of a backgrounded job irrespective of the order in which the pipeline is composed.

The issue is that within the subshell, backgrounded commands don't automatically get their own process groups (despite job control being enabled outside of the subshell), yet that translation still occurs. Inside the subshell, running "sleep 5 &" just creates a child process that belongs to the process group of the subshell, so the translation results in me killing the process group of the subshell. Surely we only want that translation to occur when the shell created a new process group for the command?
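The proposed rule can be approximated by a small check: treat a pid as naming a process group only if that pid's group differs from the caller's own.  A Python sketch, assuming a POSIX system; translated_kill_target and child_target are hypothetical helpers, and this is not how bash or any other shell actually implements kill.  A child that never called setpgid(), like "sleep 5 &" in a subshell, shares the caller's group, so the check declines to translate it.

```python
import os

def translated_kill_target(pid):
    """Hypothetical rule: return the group to signal for 'kill -$pid', or
    None if pid is still in our own process group (translating would hit us)."""
    pgid = os.getpgid(pid)
    return None if pgid == os.getpgrp() else pgid

def child_target(new_group):
    """Fork a child (optionally in its own group) and apply the rule to it."""
    ready_r, ready_w = os.pipe()
    go_r, go_w = os.pipe()
    pid = os.fork()
    if pid == 0:                       # child
        try:
            os.close(ready_r)
            os.close(go_w)
            if new_group:
                os.setpgid(0, 0)       # only a "job control" child gets its own group
            os.close(ready_w)          # EOF on ready_r: setup finished
            os.read(go_r, 1)           # wait until the parent closes go_w
        finally:
            os._exit(0)
    os.close(ready_w)
    os.close(go_r)
    os.read(ready_r, 1)
    os.close(ready_r)
    target = translated_kill_target(pid)
    os.close(go_w)                     # release and reap the child
    os.waitpid(pid, 0)
    return pid, target

print(child_target(new_group=False)[1])   # None: same group as the caller
pid, target = child_target(new_group=True)
print(target == pid)                      # True: the child leads its own group
```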

Thanks to your explanation I was able to simplify the reproduction quite a lot. Here is what happens with bash and various other shells:

ben@work-laptop:~$ dash -c 'set -m; (setsid sleep 5 & pid=$!; sleep 2; kill -TERM -$pid; echo here)'

here

ben@work-laptop:~$ ash -c 'set -m; (setsid sleep 5 & pid=$!; sleep 2; kill -TERM -$pid; echo here)'

here

ben@work-laptop:~$ zsh -c 'set -m; (setsid sleep 5 & pid=$!; sleep 2; kill -TERM -$pid; echo here)'

here

ben@work-laptop:~$ bash -c 'set -m; (setsid sleep 5 & pid=$!; sleep 2; kill -TERM -$pid; echo here)'

Terminated

As you can see, bash is the only one that translates $! to the PGID of the subshell. I confirmed with strace that the other shells ARE killing the correct process group.

The reason I previously stated that it doesn't matter how I obtain the PGID is that the shell isn't performing any complex logic to figure out that the number was derived from $!. I could have used some external tool, or read /proc/PID/stat to determine the process group that I wanted to kill. The problem is that the PID effectively becomes a magic number that will be translated without my knowledge.
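For reference, reading the process group from /proc/PID/stat on Linux looks roughly like this (a sketch; pgrp_from_proc is a hypothetical helper, and the field numbering follows proc(5)).  The result is a plain integer with nothing attached to say whether it originally came from $!.

```python
import os

def pgrp_from_proc(pid):
    """Hypothetical helper, Linux only: return the process group ID
    (field 5 in proc(5) numbering) of pid."""
    with open(f"/proc/{pid}/stat", "rb") as f:
        stat = f.read()
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split after the last ')'.  What remains starts at field 3.
    fields = stat.rsplit(b")", 1)[1].split()
    # fields[0] = state (3), fields[1] = ppid (4), fields[2] = pgrp (5)
    return int(fields[2])

print(pgrp_from_proc(os.getpid()) == os.getpgrp())  # True on Linux
```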
