On Fri, Feb 27, 2026 at 4:04 AM Oleg Nesterov <[email protected]> wrote:
>
> Currently we allow only one attempt to create init in a new namespace.
> If the first fork() fails after alloc_pid() succeeds, free_pid() clears
> PIDNS_ADDING and thus disables further PID allocations.
>
> Nowadays this looks like an unnecessary limitation. The original reason
> to handle "case PIDNS_ADDING" in free_pid() is gone, most probably after
> commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of
> proc").
>
> Change free_pid() to keep ns->pid_allocated == PIDNS_ADDING, and change
> alloc_pid() to reset the cursor early, right after taking pidmap_lock.
>
> Test-case:
>
>	#define _GNU_SOURCE
>	#include <linux/sched.h>
>	#include <sys/syscall.h>
>	#include <sys/wait.h>
>	#include <assert.h>
>	#include <sched.h>
>	#include <errno.h>
>
>	int main(void)
>	{
>		struct clone_args args = {
>			.exit_signal = SIGCHLD,
>			.flags = CLONE_PIDFD,
>			.pidfd = 0,
>		};
>		unsigned long pidfd;
>		int pid;
>
>		assert(unshare(CLONE_NEWPID) == 0);
>
>		pid = syscall(__NR_clone3, &args, sizeof(args));
>		assert(pid == -1 && errno == EFAULT);
>
>		args.pidfd = (unsigned long)&pidfd;
>		pid = syscall(__NR_clone3, &args, sizeof(args));
>		if (pid)
>			assert(pid > 0 && wait(NULL) == pid);
>		else
>			assert(getpid() == 1);
>
>		return 0;
>	}
>
> Signed-off-by: Oleg Nesterov <[email protected]>

Acked-by: Andrei Vagin <[email protected]>

Thanks,
Andrei

