On 10/25/25 15:33, Collin Funk wrote:
Pádraig Brady <[email protected]> writes:

Therefore, I think a behavior change is unavoidable if we want to
use
posix_spawn. Falling back to no compression on temporary files seems
reasonable to me.

It seems better to me TBH.
One could be running a very long sort and re-installing
compressors underneath or something, in which case
you'd want the sort to be immune and keep going.

The attached patch is the same as before, plus your changes. Ontop
of
that I added another NEWS entry, documentation, and a test case for if
--compress-program cannot be executed.

It looks great.
I've one more suggestion attached to track the last_result
in all cases, so that periodic issues were diagnosed.
For e.g. if the system went into resource constraints occasionally,
it would be better to diagnose each time that happened.

Thanks, I pushed it with that change [1].

I notice issues with valgrind which are not blocking but interesting...

where it's waiting for longer when interacting with compress program:

   $ time valgrind /bin/sort -S 1K --compress-program=gzip Makefile | wc -l
   real  0m28.762s
   user  0m55.544s
   sys   0m44.364s

   $ time valgrind src/sort -S 1K --compress-program=gzip Makefile | wc -l
   real  1m6.639s
   user  0m45.288s
   sys   0m35.937s

and also it fails completely in the --compress-program=missing case:

   $ valgrind src/sort -S 1K --compress-program=missing Makefile | wc -l
    Process terminating with default action of signal 13 (SIGPIPE)
       at 0x48E377E: __internal_syscall_cancel (cancellation.c:64)
       by 0x48E37A3: __syscall_cancel (cancellation.c:75)
       by 0x495E2BD: write (write.c:26)
       by 0x48DF1C4: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1241)
       by 0x48DD015: new_do_write (fileops.c:483)
       by 0x48DE140: _IO_do_write@@GLIBC_2.2.5 (fileops.c:460)
       by 0x48DD84F: _IO_file_close_it@@GLIBC_2.2.5 (fileops.c:136)
       by 0x48D09C4: fclose@@GLIBC_2.2.5 (iofclose.c:53)
       by 0x4046E4: xfclose (sort.c:992)
       by 0x4029D0: sort (sort.c:4151)
       by 0x4029D0: main (sort.c:4919)

Maybe valgrind's remapping of clone3 -> clone is triggering that?
https://bugs.kde.org/show_bug.cgi?id=420906
I might get time to look into it.

The SIGPIPE makes me think that posix_spawn is returning 0 as if it
suceeded, which is incorrect because 'missing' cannot be executed.

It's easy to get vfork()/exec() wrong in that case, but musl-libc does it right: Have the parent create a CLOEXEC pipe() pair before the vfork(), then have the child send a failure indicator through the pipe after the exec fails. Otherwise the exec closes the pipe without sending any data, which the parent treats as success.

This is a libc issue, not a caller issue.

Rob

Reply via email to