On 24/10/2025 07:24, Collin Funk wrote:
Here is a patch to make 'sort' use posix_spawnp.

Very tiny performance improvement:

     $ export LC_ALL=C
     $ seq 100000 | shuf > input
     $ perf stat -i --repeat 10 ./src/sort-prev -S 1K \
         --compress-program=gzip input 2>&1 > /dev/null \
         | grep -F 'seconds time elapsed'
                 4.5896 +- 0.0109 seconds time elapsed  ( +-  0.24% )
     $ perf stat -i --repeat 10 ./src/sort -S 1K \
         --compress-program=gzip input 2>&1 > /dev/null \
         | grep -F 'seconds time elapsed'
                4.53118 +- 0.00142 seconds time elapsed  ( +-  0.03% )

But hopefully it avoids a fork failing with ENOMEM.

Also, I think the error reporting is a bit nicer with this change. Here
is how it behaved previously:

     $ /bin/sort -S 1K --compress-program=missing Makefile 2>&1
     couldn't execute compress program: errno 2
     [...]
     couldn't execute compress program: errno 2
     sort: 'missing' [-d] terminated abnormally
     couldn't execute compress program: errno 2
     couldn't execute compress program: errno 2
     $ /bin/sort -S 1K --compress-program=missing Makefile 2>&1
     couldn't execute compress program: errno 2
     [...]
     couldn't execute compress program: errno 2
     sort: 'missing' [-d] terminated abnormally
     couldn't execute compress program: errno 2

Here it is now:

     $ ./src/sort -S 1K --compress-program=missing Makefile 2>&1
     sort: couldn't create process for 'missing': No such file or directory

That's quite a bit better.

The logic in the patch looks sound.
I tried to make it more succinct but it looks optimal in that regard.

A few remarks on the commit message.

Generally the summary should describe the "why" not the "what".
So: "sort: prefer posix_spawnp to fork and execlp" would be better
as: "sort: use the more efficient posix_spawn to invoke helpers"

Also "* NEWS: Mention the change." would be better
as:  "* NEWS: Mention the improvement."

I've attached a few naming tweaks your patch
and adjusted an error message slightly,
which you can squash in if you agree.

thanks!
Padraig
From 27a87f4112c5eec9781768faad06ac92f18c38b7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Fri, 24 Oct 2025 13:11:08 +0100
Subject: [PATCH] maint: tweak sort posix_spawn patch

* NEWS: Mention the memory element which is significant for sort.
* src/sort.c: Use a more accurate error message when failing
to run the --compress-program (along the lines of Bernhard's recent
improvement to install(1).  Remove all mentions of "fork".
---
 NEWS       |  4 ++--
 src/sort.c | 26 +++++++++++++-------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/NEWS b/NEWS
index aac54ae92..8524ac4f7 100644
--- a/NEWS
+++ b/NEWS
@@ -54,8 +54,8 @@ GNU coreutils NEWS                                    -*- outline -*-
    - supports a multi-byte --delimiter character
    - no longer processes input indefinitely in the presence of write errors
 
-  'sort' now uses posix_spawn() to invoke the command specified by
-  --compress-program more efficiently.
+  'sort' now uses posix_spawn() to invoke --compress-program more efficiently
+  and more independently from sort's memory usage.
 
   'split' now uses posix_spawn() to invoke the shell command specified by
   --filter more efficiently.
diff --git a/src/sort.c b/src/sort.c
index ce3a05167..c8df900f6 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -130,16 +130,16 @@ enum
 
 enum
   {
-    /* The number of times we should try to fork a compression process
+    /* The number of times we should try to spawn a compression process
        (we retry if the posix_spawnp call fails with EAGAIN).  We don't _need_
        to compress temp files, this is just to reduce file system access, so
        this number can be small.  Each retry doubles in duration.  */
-    MAX_FORK_TRIES_COMPRESS = 4,
+    MAX_TRIES_COMPRESS = 4,
 
-    /* The number of times we should try to fork a decompression process.
-       If we can't fork a decompression process, we can't sort, so this
+    /* The number of times we should try to spawn a decompression process.
+       If we can't spawn a decompression process, we can't sort, so this
        number should be big.  Each retry doubles in duration.  */
-    MAX_FORK_TRIES_DECOMPRESS = 9
+    MAX_TRIES_DECOMPRESS = 9
   };
 
 enum
@@ -1031,8 +1031,8 @@ posix_spawn_file_actions_move_fd (posix_spawn_file_actions_t *actions,
    otherwise.  */
 
 static int
-pipe_fork (pid_t *pid, int pipefds[2], int tempfd, bool decompress,
-           size_t tries)
+pipe_child (pid_t *pid, int pipefds[2], int tempfd, bool decompress,
+            size_t tries)
 {
   struct tempnode *saved_temphead;
   double wait_retry = 0.25;
@@ -1154,11 +1154,11 @@ maybe_create_temp (FILE **pfp, bool survive_fd_exhaustion)
   if (compress_program)
     {
       int pipefds[2];
-      int result = pipe_fork (&node->pid, pipefds, tempfd, false,
-                              MAX_FORK_TRIES_COMPRESS);
+      int result = pipe_child (&node->pid, pipefds, tempfd, false,
+                               MAX_TRIES_COMPRESS);
 
       if (result)
-        error (SORT_FAILURE, result, _("couldn't create process for %s"),
+        error (SORT_FAILURE, result, _("could not run compress program %s"),
                quoteaf (compress_program));
       else
         {
@@ -1206,13 +1206,13 @@ open_temp (struct tempnode *temp)
     return nullptr;
 
   pid_t child;
-  int result = pipe_fork (&child, pipefds, tempfd, true,
-                          MAX_FORK_TRIES_DECOMPRESS);
+  int result = pipe_child (&child, pipefds, tempfd, true,
+                           MAX_TRIES_DECOMPRESS);
 
   if (result)
     {
       if (result != EMFILE)
-        error (SORT_FAILURE, result, _("couldn't create process for %s -d"),
+        error (SORT_FAILURE, result, _("could not run compress program %s -d"),
                quoteaf (compress_program));
       close (tempfd);
       errno = EMFILE;
-- 
2.51.0

Reply via email to