On 24/10/2025 07:24, Collin Funk wrote:
Here is a patch to make 'sort' use posix_spawnp.
Very tiny performance improvement:
$ export LC_ALL=C
$ seq 100000 | shuf > input
$ perf stat -i --repeat 10 ./src/sort-prev -S 1K \
--compress-program=gzip input 2>&1 > /dev/null \
| grep -F 'seconds time elapsed'
4.5896 +- 0.0109 seconds time elapsed ( +- 0.24% )
$ perf stat -i --repeat 10 ./src/sort -S 1K \
--compress-program=gzip input 2>&1 > /dev/null \
| grep -F 'seconds time elapsed'
4.53118 +- 0.00142 seconds time elapsed ( +- 0.03% )
But hopefully it avoids a fork failing with ENOMEM.
Also, I think the error reporting is a bit nicer with this change. Here
is how it behaved previously:
$ /bin/sort -S 1K --compress-program=missing Makefile 2>&1
couldn't execute compress program: errno 2
[...]
couldn't execute compress program: errno 2
sort: 'missing' [-d] terminated abnormally
couldn't execute compress program: errno 2
couldn't execute compress program: errno 2
$ /bin/sort -S 1K --compress-program=missing Makefile 2>&1
couldn't execute compress program: errno 2
[...]
couldn't execute compress program: errno 2
sort: 'missing' [-d] terminated abnormally
couldn't execute compress program: errno 2
Here it is now:
$ ./src/sort -S 1K --compress-program=missing Makefile 2>&1
sort: couldn't create process for 'missing': No such file or directory
That's quite a bit better.
The logic in the patch looks sound.
I tried to make it more succinct but it looks optimal in that regard.
A few remarks on the commit message.
Generally the summary should describe the "why" not the "what".
So: "sort: prefer posix_spawnp to fork and execlp" would be better
as: "sort: use the more efficient posix_spawn to invoke helpers"
Also "* NEWS: Mention the change." would be better
as: "* NEWS: Mention the improvement."
I've attached a few naming tweaks your patch
and adjusted an error message slightly,
which you can squash in if you agree.
thanks!
Padraig
From 27a87f4112c5eec9781768faad06ac92f18c38b7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Fri, 24 Oct 2025 13:11:08 +0100
Subject: [PATCH] maint: tweak sort posix_spawn patch
* NEWS: Mention the memory element which is significant for sort.
* src/sort.c: Use a more accurate error message when failing
to run the --compress-program (along the lines of Bernhard's recent
improvement to install(1). Remove all mentions of "fork".
---
NEWS | 4 ++--
src/sort.c | 26 +++++++++++++-------------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/NEWS b/NEWS
index aac54ae92..8524ac4f7 100644
--- a/NEWS
+++ b/NEWS
@@ -54,8 +54,8 @@ GNU coreutils NEWS -*- outline -*-
- supports a multi-byte --delimiter character
- no longer processes input indefinitely in the presence of write errors
- 'sort' now uses posix_spawn() to invoke the command specified by
- --compress-program more efficiently.
+ 'sort' now uses posix_spawn() to invoke --compress-program more efficiently
+ and more independently from sort's memory usage.
'split' now uses posix_spawn() to invoke the shell command specified by
--filter more efficiently.
diff --git a/src/sort.c b/src/sort.c
index ce3a05167..c8df900f6 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -130,16 +130,16 @@ enum
enum
{
- /* The number of times we should try to fork a compression process
+ /* The number of times we should try to spawn a compression process
(we retry if the posix_spawnp call fails with EAGAIN). We don't _need_
to compress temp files, this is just to reduce file system access, so
this number can be small. Each retry doubles in duration. */
- MAX_FORK_TRIES_COMPRESS = 4,
+ MAX_TRIES_COMPRESS = 4,
- /* The number of times we should try to fork a decompression process.
- If we can't fork a decompression process, we can't sort, so this
+ /* The number of times we should try to spawn a decompression process.
+ If we can't spawn a decompression process, we can't sort, so this
number should be big. Each retry doubles in duration. */
- MAX_FORK_TRIES_DECOMPRESS = 9
+ MAX_TRIES_DECOMPRESS = 9
};
enum
@@ -1031,8 +1031,8 @@ posix_spawn_file_actions_move_fd (posix_spawn_file_actions_t *actions,
otherwise. */
static int
-pipe_fork (pid_t *pid, int pipefds[2], int tempfd, bool decompress,
- size_t tries)
+pipe_child (pid_t *pid, int pipefds[2], int tempfd, bool decompress,
+ size_t tries)
{
struct tempnode *saved_temphead;
double wait_retry = 0.25;
@@ -1154,11 +1154,11 @@ maybe_create_temp (FILE **pfp, bool survive_fd_exhaustion)
if (compress_program)
{
int pipefds[2];
- int result = pipe_fork (&node->pid, pipefds, tempfd, false,
- MAX_FORK_TRIES_COMPRESS);
+ int result = pipe_child (&node->pid, pipefds, tempfd, false,
+ MAX_TRIES_COMPRESS);
if (result)
- error (SORT_FAILURE, result, _("couldn't create process for %s"),
+ error (SORT_FAILURE, result, _("could not run compress program %s"),
quoteaf (compress_program));
else
{
@@ -1206,13 +1206,13 @@ open_temp (struct tempnode *temp)
return nullptr;
pid_t child;
- int result = pipe_fork (&child, pipefds, tempfd, true,
- MAX_FORK_TRIES_DECOMPRESS);
+ int result = pipe_child (&child, pipefds, tempfd, true,
+ MAX_TRIES_DECOMPRESS);
if (result)
{
if (result != EMFILE)
- error (SORT_FAILURE, result, _("couldn't create process for %s -d"),
+ error (SORT_FAILURE, result, _("could not run compress program %s -d"),
quoteaf (compress_program));
close (tempfd);
errno = EMFILE;
--
2.51.0