Hi John,

 ---- On Wed, 10 Jun 2026 01:27:47 +0800  John Ericson <[email protected]> wrote ---
 > 
 > 
 > On Tue, Jun 9, 2026, at 10:43 AM, Li Chen wrote:
 > > Hi Andy,
 > >
 > > ---- On Tue, 09 Jun 2026 08:01:57 +0800  Andy Lutomirski <[email protected]> wrote ---
 > > > [...]
 > > >
 > > > After contemplating this for a bit... why pidfd?  Doesn't a pidfd
 > > > refer to an actual process that is, or at least was, running?  This
 > > > new thing is a process that we are contemplating spawning.  I can
 > > > imagine that basically all pidfd APIs would be a bit confused by the
 > > > nonexistence of the process in question.
 > > >
 > >
 > > Yes, I think that is a real concern.
 > >
 > > In my current local WIP I tried to keep that distinction explicit.
 > > pidfd_spawn_open() returns a pidfs-backed builder fd, not a normal pidfd
 > > referring to a process. The builder fd is allocated as an anonymous pidfs
 > > file with builder-specific file operations:
 > >
 > >     file = pidfs_alloc_anon_file("[pidfd_spawn]",
 > >                                  &pidfd_spawn_builder_fops, builder,
 > >                                  O_RDWR);
 > >
 > 
 > What does your builder fd point to, explicitly? For example in my other
 > reply I talked about how it was "real" process state. In my FreeBSD patch,
 > for example, I found there was already a status for a process "in exec",
 > and I figured that was clean to reuse for one of these "embryonic"
 > processes that also hadn't started running. I would reckon that Linux
 > probably has some similar notions.
 > 
 > > and the normal pidfd helpers still reject it because it does not use the
 > > ordinary pidfd file operations:
 > >
 > >     struct pid *pidfd_pid(const struct file *file)
 > >     {
 > >         if (file->f_op != &pidfs_file_operations)
 > >             return ERR_PTR(-EBADF);
 > >         return file_inode(file)->i_private;
 > >     }
 > >
 > > So the current split is:
 > >
 > >     builder_fd = pidfd_spawn_open(...);       /* builder object */
 > >     pidfd_config(builder_fd, ...);
 > >     child_pidfd = pidfd_spawn_run(builder_fd, ...); /* real pidfd */
 > >
 > > Only the last fd is a normal pidfd for an actual child process. The builder
 > > fd is only accepted by the builder operations.
 > >
 > > This avoids having to define what waitid(P_PIDFD), pidfd_send_signal(),
 > > pidfd_getfd(), poll(), etc. mean before the process exists.
 > 
 > I wouldn't be so sure this is necessary/good. For example, I think it could
 > make sense to wait on a process that has yet to be started; one just waits
 > for both the process to start and the process to exit. Obviously a blocking
 > syscall in the thread that is spawning the process is not useful, but the
 > asynchronous poll variation seems fine.
 > 
 > As long as there is real process state here, it shouldn't be too hard to
 > implement.
 > 
 > > The downside is that it adds a separate open-style entry point and is less
 > > uniform than the pidfd_open(0, PIDFD_EMPTY) spelling Christian sketched.
 > 
 > I do think there is no point having two file descriptors. The file descriptor
 > that previously referred to the builder/embryonic process then can refer to
 > the real process, right?
 > 
 > > If people think there is a better way to represent the pre-spawn builder
 > > state, or if the preference is to integrate it directly into pidfd_open()
 > > with an explicit empty/future-pidfd state, I would be happy to discuss 
 > > that.
 > 
 > Hope the above answers your question? I suppose my ideas lean more on the
 > "future" than "empty" side --- there is indeed a thread in the thread group,
 > with real VM/namespace/file descriptor etc. state. Moreover, state gets
 > initialized before the process is started, so the actual start is a pretty
 > lightweight step of just letting the scheduler know the now-ready process can
 > be scheduled. The only thing that distinguishes the embryonic process from a
 > real one is simply that it isn't running --- i.e. isn't (yet) available to be
 > scheduled --- so pidfd holders are free to poke at its state.
 > 
 > Cheers,
 > 
 > John
 > 

Thanks, this helped a lot. I looked at FreeBSD/OpenBSD/XNU after your
note. FreeBSD has P_INEXEC, OpenBSD has PS_INEXEC, and XNU seems even
closer with P_LINTRANSIT, described as "process in exec or in creation".
Linux does not seem to have a single equivalent today: current->in_execve
is only an LSM hint, while the real synchronization is spread across
exec_update_lock, cred_guard_mutex, and the exec path.
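For comparison, the BSD/XNU approach boils down to one lifecycle bit on
the process. A rough userspace model of that idea, purely illustrative:
MODEL_P_INTRANSIT, struct model_proc and the model_* helpers are made-up
names, not FreeBSD, XNU or Linux code, and a real kernel would set and
clear such a bit under locking that is left out here.

```c
/*
 * Userspace model only: a per-process "in transition" bit in the spirit
 * of XNU's P_LINTRANSIT ("process in exec or in creation").
 * All names below are hypothetical; Linux has no such single flag.
 */
#include <stdbool.h>

#define MODEL_P_INTRANSIT 0x1u /* hypothetical: set while in exec or creation */

struct model_proc {
	unsigned int flags;
};

/* Mark the process as being created (or exec'ing). */
static void model_begin_transit(struct model_proc *p)
{
	p->flags |= MODEL_P_INTRANSIT;
}

/* Clear the bit once creation/exec has finished. */
static void model_end_transit(struct model_proc *p)
{
	p->flags &= ~MODEL_P_INTRANSIT;
}

/* Query used by code that must treat a half-built process specially. */
static bool model_in_transit(const struct model_proc *p)
{
	return (p->flags & MODEL_P_INTRANSIT) != 0;
}
```

On Linux the equivalent knowledge is instead implied by which locks are
held and where the task is in the exec path, which is why there is no
single bit to test.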

I am switching my local WIP from the two-fd builder model to one fd,
closer to Christian's sketch:

    fd = pidfd_open(0, PIDFD_EMPTY);
    pidfd_config(fd, ...);
    pidfd_spawn_run(fd, ...);
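To make the one-fd flow concrete, here is a userspace model of the fd's
lifecycle. It is only a sketch under assumptions: the state names, the
model_* functions, allowing run straight from the empty state, and the
-EBUSY/-EALREADY answers for misuse are guesses for illustration, not
what the WIP or Christian's sketch specifies. What it shows is that the
same fd moves from empty to configured to running, rather than a second
fd being handed out at run time.

```c
/*
 * Userspace model of:
 *   fd = pidfd_open(0, PIDFD_EMPTY); pidfd_config(fd, ...); pidfd_spawn_run(fd, ...);
 * Hypothetical: state names, model_* functions, and the error codes
 * chosen for misuse.
 */
#include <errno.h>

enum model_pidfd_state {
	MODEL_PIDFD_EMPTY,      /* just opened with PIDFD_EMPTY */
	MODEL_PIDFD_CONFIGURED, /* at least one config step applied */
	MODEL_PIDFD_RUNNING,    /* same fd now refers to the started child */
};

struct model_pidfd {
	enum model_pidfd_state state;
	int nconfig; /* number of config steps applied */
};

static void model_pidfd_open_empty(struct model_pidfd *fd)
{
	fd->state = MODEL_PIDFD_EMPTY;
	fd->nconfig = 0;
}

/* Configuration is only accepted before the child starts. */
static int model_pidfd_config(struct model_pidfd *fd)
{
	if (fd->state == MODEL_PIDFD_RUNNING)
		return -EBUSY;
	fd->state = MODEL_PIDFD_CONFIGURED;
	fd->nconfig++;
	return 0;
}

/* Start the child: the fd is kept, only its state changes. */
static int model_pidfd_spawn_run(struct model_pidfd *fd)
{
	if (fd->state == MODEL_PIDFD_RUNNING)
		return -EALREADY;
	fd->state = MODEL_PIDFD_RUNNING;
	return 0;
}
```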

In my current local version, I still use copy_process(), so the fd points
at a real task_struct/pid that is not woken until pidfd_spawn_run().
Following Christian's point that existing APIs can already handle this
not-yet-running case with ESRCH, I currently make ordinary pidfd
operations that need a started process return -ESRCH before start.
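The -ESRCH rule can be modelled as one gate shared by every operation
that needs a started process. Again a userspace sketch: struct
model_task, its started field, model_require_started() and
model_pidfd_send_signal() are hypothetical stand-ins for the task_struct
created by copy_process() and for the real pidfd_send_signal() path.

```c
/*
 * Userspace model of "ordinary pidfd operations return -ESRCH before
 * start". All names are hypothetical stand-ins, not kernel code.
 */
#include <errno.h>
#include <stdbool.h>

struct model_task {
	bool started;    /* false between copy_process() and the wake-up */
	int pending_sig; /* last signal delivered, 0 if none */
};

/* Shared gate for every operation that needs a started process. */
static int model_require_started(const struct model_task *t)
{
	return t->started ? 0 : -ESRCH;
}

static int model_pidfd_send_signal(struct model_task *t, int sig)
{
	int err = model_require_started(t);

	if (err)
		return err; /* not started yet: report -ESRCH, deliver nothing */
	t->pending_sig = sig;
	return 0;
}
```

ESRCH is also what pidfd_send_signal() reports once the target has
terminated and been reaped, so callers that already handle that case
need no new error path for the not-yet-started one.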

I am not sure yet whether Linux should grow a general exec/creation
transition state like that, or whether a narrower future-process
lifecycle is enough for this API. I will think more about that when
working on the pristine process version.

Regards,
Li