Hi John,

---- On Wed, 10 Jun 2026 01:27:47 +0800 John Ericson <[email protected]> wrote ---
>
> On Tue, Jun 9, 2026, at 10:43 AM, Li Chen wrote:
> > Hi Andy,
> >
> > ---- On Tue, 09 Jun 2026 08:01:57 +0800 Andy Lutomirski <[email protected]> wrote ---
> > > [...]
> > >
> > > After contemplating this for a bit... why pidfd? Doesn't a pidfd
> > > refer to an actual process that is, or at least was, running? This
> > > new thing is a process that we are contemplating spawning. I can
> > > imagine that basically all pidfd APIs would be a bit confused by the
> > > nonexistence of the process in question.
> > >
> >
> > Yes, I think that is a real concern.
> >
> > In my current local WIP I tried to keep that distinction explicit.
> > pidfd_spawn_open() returns a pidfs-backed builder fd, not a normal pidfd
> > referring to a process. The builder fd is allocated as an anonymous pidfs
> > file with builder-specific file operations:
> >
> >     file = pidfs_alloc_anon_file("[pidfd_spawn]",
> >                                  &pidfd_spawn_builder_fops, builder,
> >                                  O_RDWR);
> >
>
> What does your builder fd point to, explicitly? For example in my other
> reply I talked about how it was "real" process state. In my FreeBSD patch,
> for example, I found there was already a status for a process "in exec",
> and I figured that was clean to reuse for one of these "embryonic"
> processes that also hadn't started running. I would reckon that Linux
> probably has some similar notions.
>
> > and the normal pidfd helpers still reject it because it does not use the
> > ordinary pidfd file operations:
> >
> >     struct pid *pidfd_pid(const struct file *file)
> >     {
> >         if (file->f_op != &pidfs_file_operations)
> >             return ERR_PTR(-EBADF);
> >         return file_inode(file)->i_private;
> >     }
> >
> > So the current split is:
> >
> >     builder_fd = pidfd_spawn_open(...); /* builder object */
> >     pidfd_config(builder_fd, ...);
> >     child_pidfd = pidfd_spawn_run(builder_fd, ...); /* real pidfd */
> >
> > Only the last fd is a normal pidfd for an actual child process. The builder
> > fd is only accepted by the builder operations.
> >
> > This avoids having to define what waitid(P_PIDFD), pidfd_send_signal(),
> > pidfd_getfd(), poll(), etc. mean before the process exists.
>
> I wouldn't be so sure this is necessary/good. For example, I think it could
> make sense to wait on a process that has yet to be started; one just waits
> for both the process to start and the process to exit. Obviously a blocking
> syscall in the thread that is spawning the process is not useful, but the
> asynchronous poll variation seems fine.
>
> As long as there is real process state here, it shouldn't be too hard to
> implement.
>
> > The downside is that it adds a separate open-style entry point and is less
> > uniform than the pidfd_open(0, PIDFD_EMPTY) spelling Christian sketched.
>
> I do think there is no point having two file descriptors. The file
> descriptor that previously referred to the builder/embryonic process then
> can refer to the real process, right?
>
> > If people think there is a better way to represent the pre-spawn builder
> > state, or if the preference is to integrate it directly into pidfd_open()
> > with an explicit empty/future-pidfd state, I would be happy to discuss
> > that.
>
> Hope the above answers your question? I suppose my ideas lean more on the
> "future" than "empty" side --- there is indeed a thread in the thread group,
> with real VM/namespace/file descriptor etc. state. Moreover, state gets
> initialized before the process is started, so the actual start is a pretty
> lightweight step of just letting the scheduler know the now-ready process
> can be scheduled. The only thing that distinguishes the embryonic process
> from a real one is simply that it isn't running --- i.e. isn't (yet)
> available to be scheduled --- so the pidfds holders are free to poke at its
> state.
>
> Cheers,
>
> John
>

Thanks, this helped a lot.

I looked at FreeBSD/OpenBSD/XNU after your note. FreeBSD has P_INEXEC,
OpenBSD has PS_INEXEC, and XNU seems even closer with P_LINTRANSIT,
described as "process in exec or in creation".

Linux does not seem to have a single equivalent today: current->in_execve
is only an LSM hint, while the real synchronization is spread across
exec_update_lock, cred_guard_mutex, and the exec path.

I am switching my local WIP from the two-fd builder model to one fd,
closer to Christian's sketch:

    fd = pidfd_open(0, PIDFD_EMPTY);
    pidfd_config(fd, ...);
    pidfd_spawn_run(fd, ...);

In my current local version, I still use copy_process(), so the fd points
at a real task_struct/pid that is not woken until run. Following
Christian's point that existing APIs can handle this not-yet-running case
with ESRCH, I currently make ordinary pidfd operations that need a real
started process return -ESRCH before start.

I am not sure yet whether Linux should grow a general exec/creation
transition state like that, or whether a narrower future-process lifecycle
is enough for this API. I will think more about that when working on the
pristine process version.

Regards,
Li
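
P.S. To make the pre-start behaviour concrete, here is a small userspace
model of the one-fd lifecycle. It is only a sketch and not the WIP kernel
code: struct future_proc and the future_*() helpers are hypothetical names
made up for illustration, and returning -EBUSY for config or run after start
is my assumption, not something settled in this thread. The only part taken
from the discussion is that an ordinary operation needing a started process
returns -ESRCH before run.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the one-fd lifecycle.
 * None of these names are kernel APIs.
 */
enum future_state {
	FUTURE_EMPTY,	/* task exists but has not been woken yet */
	FUTURE_STARTED,	/* run was called; acts like a normal pidfd */
};

struct future_proc {
	enum future_state state;
	int nr_config;	/* configuration steps applied before start */
};

/* Models fd = pidfd_open(0, PIDFD_EMPTY): begin in the not-started state. */
static void future_open(struct future_proc *p)
{
	assert(p != NULL);
	p->state = FUTURE_EMPTY;
	p->nr_config = 0;
}

/* Models pidfd_config(): only valid before start (-EBUSY is an assumption). */
static int future_config(struct future_proc *p)
{
	assert(p != NULL);
	if (p->state != FUTURE_EMPTY)
		return -EBUSY;
	p->nr_config++;
	return 0;
}

/* Models pidfd_spawn_run(): start once; a second run is rejected (assumed). */
static int future_run(struct future_proc *p)
{
	assert(p != NULL);
	if (p->state != FUTURE_EMPTY)
		return -EBUSY;
	p->state = FUTURE_STARTED;
	return 0;
}

/* Stands in for any ordinary pidfd operation that needs a started process. */
static int future_send_signal(const struct future_proc *p, int sig)
{
	assert(p != NULL);
	(void)sig;	/* the signal number is irrelevant to the model */
	if (p->state != FUTURE_STARTED)
		return -ESRCH;	/* not started yet: report "no such process" */
	return 0;
}
```

Calling future_open(), then future_send_signal() gives -ESRCH; after
future_config() and future_run() the same call succeeds, which is the
ordering I have in mind for the real fd.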

