Pádraig Brady <[email protected]> writes: >> Therefore, I think a behavior change is unavoidable if we want to >> use >> posix_spawn. Falling back to no compression on temporary files seems >> reasonable to me. > > It seems better to me TBH. > One could be running a very long sort and re-installing > compressors underneath or something, in which case > you'd want the sort to be immune and keep going. > >> The attached patch is the same as before, plus your changes. Ontop >> of >> that I added another NEWS entry, documentation, and a test case for if >> --compress-program cannot be executed. > > It looks great. > I've one more suggestion attached to track the last_result > in all cases, so that periodic issues were diagnosed. > For e.g. if the system went into resource constraints occasionally, > it would be better to diagnose each time that happened.
Thanks, I pushed it with that change [1]. > I notice issues with valgrind which are not blocking but interesting... > > where it's waiting for longer when interacting with compress program: > > $ time valgrind /bin/sort -S 1K --compress-program=gzip Makefile | wc -l > real 0m28.762s > user 0m55.544s > sys 0m44.364s > > $ time valgrind src/sort -S 1K --compress-program=gzip Makefile | wc -l > real 1m6.639s > user 0m45.288s > sys 0m35.937s > > and also it fails completely in the --compress-program=missing case: > > $ valgrind src/sort -S 1K --compress-program=missing Makefile | wc -l > Process terminating with default action of signal 13 (SIGPIPE) > at 0x48E377E: __internal_syscall_cancel (cancellation.c:64) > by 0x48E37A3: __syscall_cancel (cancellation.c:75) > by 0x495E2BD: write (write.c:26) > by 0x48DF1C4: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1241) > by 0x48DD015: new_do_write (fileops.c:483) > by 0x48DE140: _IO_do_write@@GLIBC_2.2.5 (fileops.c:460) > by 0x48DD84F: _IO_file_close_it@@GLIBC_2.2.5 (fileops.c:136) > by 0x48D09C4: fclose@@GLIBC_2.2.5 (iofclose.c:53) > by 0x4046E4: xfclose (sort.c:992) > by 0x4029D0: sort (sort.c:4151) > by 0x4029D0: main (sort.c:4919) > > Maybe valgrind's remapping of clone3 -> clone is triggering that? > https://bugs.kde.org/show_bug.cgi?id=420906 > I might get time to look into it. The SIGPIPE makes me think that posix_spawn is returning 0 as if it suceeded, which is incorrect because 'missing' cannot be executed. Here are the clone calls when using --compress-program=gzip: $ strace valgrind src/sort -S 1K --compress-program=gzip \ Makefile 2>&1 | grep -c ^clone 7338 Note that those are all clone and not clone3, as you mentioned. When using --compress-program=missing, valgrind exits right after the clone: $ strace valgrind src/sort -S 1K --compress-program=missing \ Makefile 2>&1 [...] clone(child_stack=NULL, flags=CLONE_VFORK|SIGCHLD==376076== ==376076== HEAP SUMMARY: ==376076== in use at exit: 2,032 bytes in 18 blocks ==376076== total heap usage: 70 allocs, 52 frees, 40,144 bytes allocated [...] I agree it isn't blocking since we can still use ASAN and UBSAN. But we should certainly try to get it fixed. Collin [1] https://github.com/coreutils/coreutils/commit/2b3eb795c85633a4b808df44d19a4a9be5976c10
