On Tue, Jun 9, 2026, at 10:43 AM, Li Chen wrote:
> Hi Andy,
>
> ---- On Tue, 09 Jun 2026 08:01:57 +0800 Andy Lutomirski <[email protected]>
> wrote ---
> > [...]
> >
> > After contemplating this for a bit... why pidfd? Doesn't a pidfd
> > refer to an actual process that is, or at least was, running? This
> > new thing is a process that we are contemplating spawning. I can
> > imagine that basically all pidfd APIs would be a bit confused by the
> > nonexistence of the process in question.
> >
>
> Yes, I think that is a real concern.
>
> In my current local WIP I tried to keep that distinction explicit.
> pidfd_spawn_open() returns a pidfs-backed builder fd, not a normal pidfd
> referring to a process. The builder fd is allocated as an anonymous pidfs
> file with builder-specific file operations:
>
> file = pidfs_alloc_anon_file("[pidfd_spawn]",
> &pidfd_spawn_builder_fops, builder,
> O_RDWR);
>
What does your builder fd point to, explicitly? For example in my other reply I
talked about how it was "real" process state. In my FreeBSD patch, for example,
I found there was already a status for a process "in exec", and I figured that
was clean to reuse for one of these "embryonic" processes that also hadn't
started running. I would reckon that Linux probably has some similar notions.
> and the normal pidfd helpers still reject it because it does not use the
> ordinary pidfd file operations:
>
> struct pid *pidfd_pid(const struct file *file)
> {
> if (file->f_op != &pidfs_file_operations)
> return ERR_PTR(-EBADF);
> return file_inode(file)->i_private;
> }
>
> So the current split is:
>
> builder_fd = pidfd_spawn_open(...); /* builder object */
> pidfd_config(builder_fd, ...);
> child_pidfd = pidfd_spawn_run(builder_fd, ...); /* real pidfd */
>
> Only the last fd is a normal pidfd for an actual child process. The builder
> fd is only accepted by the builder operations.
>
> This avoids having to define what waitid(P_PIDFD), pidfd_send_signal(),
> pidfd_getfd(), poll(), etc. mean before the process exists.
I wouldn't be so sure this is necessary/good. For example, I think it could
make sense to wait on a process that has yet to be started; one just waits for
both the process to start and the process to exit. Obviously a blocking syscall
in the thread that is spawning the process is not useful, but the asynchronous
poll variation seems fine.
As long as there is real process state here, it shouldn't be too hard to
implement.
> The downside is that it adds a separate open-style entry point and is less
> uniform than the pidfd_open(0, PIDFD_EMPTY) spelling Christian sketched.
I do think there is no point having two file descriptors. The file descriptor
that previously referred to the builder/embryonic process then can refer to the
real process, right?
> If people think there is a better way to represent the pre-spawn builder
> state, or if the preference is to integrate it directly into pidfd_open()
> with an explicit empty/future-pidfd state, I would be happy to discuss that.
Hope the above answers your question? I suppose my ideas lean more on the
"future" than "empty" side --- there is indeed a thread in the thread group,
with real VM/namespace/file descriptor etc. state. Moreover, state gets
initialized before the process is started, so the actual start is a pretty
lightweight step of just letting the scheduler know the now-ready process can
be scheduled. The only thing that distinguishes the embryonic process from a
real one is simply that it isn't running --- i.e. isn't (yet) available to be
scheduled --- so the pidfds holders are free to poke at its state.
Cheers,
John