On Fri, Jan 06, 2023 at 11:55:16AM +0800, Paul Wise wrote:
> The new io_uring_spawn mechanism for spawning processes without forking
> should be more efficient than fork+exec, especially when starting small
> processes from large processes. Also posix_spawn and vfork+exec exist.
> 
> https://lwn.net/Articles/908268/
> 
> I think the order of preference for spawning processes should be:
> 
>  * io_uring_spawn: this is Linux-only and only in new versions. Prefer
>    this over posix_spawn in case of an old glibc and new Linux kernel.

What I've heard of this sounds good, but as far as I can tell this is
not in upstream Linux, there are no patches anywhere to be found, no API
documentation, and the only references to the feature I can find
anywhere in a web search are all references to this one presentation
with no further detail.  It's of course possible that I've missed
something, but from what I can see it's far too early to even be able to
decide whether this would be usable, never mind being able to make use
of it on real systems.

>  * posix_spawn: this uses the appropriate mechanisms on each platform,
>    glibc might be changing this to use io_uring_spawn where possible.

I can see a few limitations here:

 * The standard API offers no way to set the working directory of the
   child process, which would be needed for pipecmd_chdir and
   pipecmd_fchdir.  However, glibc 2.29 added
   posix_spawn_file_actions_addchdir_np and
   posix_spawn_file_actions_addfchdir_np as GNU extensions.

 * I'm not totally sure how to translate pipecmd_nice into
   posix_spawn-speak; the documentation is, uh, opaque.  It's probably
   possible.

 * This wouldn't be usable for pipeline commands created using
   pipecmd_new_sequence: unlike fork, posix_spawn isn't guaranteed to be
   async-signal-safe, so it can't be called between fork and exec.

However, we could always just restrict the conditions under which
posix_spawn is used, much as GLib's g_spawn_* functions do.  None of the
above features are used on mandb's hot path, for instance.
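
To make the working-directory point concrete, here's a minimal sketch of
what the chdir case might look like, assuming glibc 2.29 or later for
posix_spawn_file_actions_addchdir_np.  spawn_in_dir is a hypothetical
helper made up for illustration, not anything in libpipeline:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper, not part of libpipeline: start argv[0] (looked up
   on PATH) with its working directory set to dir.  Returns 0 and stores
   the child's pid in *pid on success, or an errno value on failure.  */
static int spawn_in_dir(const char *dir, char *const argv[], pid_t *pid)
{
    posix_spawn_file_actions_t actions;
    int err;

    assert(dir != NULL && argv != NULL && argv[0] != NULL && pid != NULL);

    err = posix_spawn_file_actions_init(&actions);
    if (err != 0)
        return err;

    /* GNU extension (glibc >= 2.29): chdir in the child before exec.  */
    err = posix_spawn_file_actions_addchdir_np(&actions, dir);
    if (err == 0)
        err = posix_spawnp(pid, argv[0], &actions, NULL, argv, environ);

    posix_spawn_file_actions_destroy(&actions);
    return err;
}
```

The caller would still waitpid on the returned pid as it does after
fork.  A real implementation would need a configure-time check for the
_np function and a fork-based fallback where it's missing.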

>  * vfork+exec: this is similar to what glibc does for posix_spawn.

If somebody were to present me with a patch for this then I suppose I
might at least consider it (though with a healthy amount of
scepticism!); but it's difficult, and I'm not sure I have the necessary
skills to review it properly.  glibc's posix_spawn implementation has
this moderately fearsome comment at the top:

/* The Linux implementation of posix_spawn{p} uses the clone syscall directly
   with CLONE_VM and CLONE_VFORK flags and an allocated stack.  The new stack
   and start function solves most the vfork limitation (possible parent
   clobber due stack spilling). The remaining issue are:

   1. That no signal handlers must run in child context, to avoid corrupting
      parent's state.
   2. The parent must ensure child's stack freeing.
   3. Child must synchronize with parent to enforce 2. and to possible
      return execv issues.

   The first issue is solved by blocking all signals in child, even
   the NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and
   third issue is done by a stack allocation in parent, and by using a
   field in struct spawn_args where the child can write an error
   code. CLONE_VFORK ensures that the parent does not run until the
   child has either exec'ed successfully or exited.  */

Do I really want that complexity in libpipeline?  I'm not sure that I
do.  It's certainly not close to being a drop-in replacement for fork.
posix_spawn, maybe with GNU extensions, looks like a more appealing
option.
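
One visible difference from fork that callers would have to cope with is
how exec failures surface.  As far as I understand it, glibc on Linux
uses the error-code field described in the comment above to return the
failure from posix_spawn itself, while POSIX also allows an
implementation to report it as a child exit status of 127.  A rough
sketch that tolerates both (spawn_and_wait is a hypothetical name):

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run path with argv and wait for it.  Returns the
   child's exit status (or 128 + signal number if it was killed by a
   signal), or -1 with errno set if the spawn or the wait failed.  */
static int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;
    int err;

    assert(path != NULL && argv != NULL);

    /* posix_spawn returns an error number directly; it does not set
       errno, so copy it there for the caller.  */
    err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0) {
        errno = err;
        return -1;
    }

    /* Retry if the wait is interrupted by a signal.  */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return 128 + WTERMSIG(status);
}
```

For a nonexistent program, I'd expect this to return -1 with errno set
to ENOENT on glibc, and possibly 127 elsewhere.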

-- 
Colin Watson (he/him)                              [cjwat...@debian.org]
