On 24/10/2025 13:33, Pádraig Brady wrote:
On 24/10/2025 07:24, Collin Funk wrote:
Here is a patch to make 'sort' use posix_spawnp.
Very tiny performance improvement:
$ export LC_ALL=C
$ seq 100000 | shuf > input
$ perf stat -i --repeat 10 ./src/sort-prev -S 1K \
--compress-program=gzip input 2>&1 > /dev/null \
| grep -F 'seconds time elapsed'
4.5896 +- 0.0109 seconds time elapsed ( +- 0.24% )
$ perf stat -i --repeat 10 ./src/sort -S 1K \
--compress-program=gzip input 2>&1 > /dev/null \
| grep -F 'seconds time elapsed'
4.53118 +- 0.00142 seconds time elapsed ( +- 0.03% )
But hopefully it avoids a fork failing with ENOMEM.
Also, I think the error reporting is a bit nicer with this change. Here
is how it behaved previously:
$ /bin/sort -S 1K --compress-program=missing Makefile 2>&1
couldn't execute compress program: errno 2
[...]
couldn't execute compress program: errno 2
sort: 'missing' [-d] terminated abnormally
couldn't execute compress program: errno 2
couldn't execute compress program: errno 2
$ /bin/sort -S 1K --compress-program=missing Makefile 2>&1
couldn't execute compress program: errno 2
[...]
couldn't execute compress program: errno 2
sort: 'missing' [-d] terminated abnormally
couldn't execute compress program: errno 2
Here it is now:
$ ./src/sort -S 1K --compress-program=missing Makefile 2>&1
sort: couldn't create process for 'missing': No such file or directory
That's quite a bit better.
The logic in the patch looks sound.
Actually the fall-back logic in maybe_create_temp() looks wrong.Previously any
fork failure (usually due to resource contraints)
would result in a fallback to writing to temp file directly.
Now any posix_spawn() failure results in immediate exit.
Should we treat the --compress-program as fully optional
(which it was anyway in the case of resource contraints)
and always fall-back to writing the temp file directly?
(We'd write an error for each spawn failure of course).
I'm inclined to think we should do that.
cheers,
Padraig