I debugged this further: the issue boils down to several things that happen rarely:

- source and destination must be on different mountpoints, so FICLONE fails
- the fallback copy_file_range usually copies at most 2GB segments on ZFS, however it seems to be able to copy more at once when copying from a snapshot.

The problem now is that the return value is interpreted as a negative number. It's not clear to me how that happens, as ssize_t should be a signed 64-bit number and contain the value fine, however, gdb also agrees:

    Breakpoint 1, copy_file_range (infd=infd@entry=3, pinoff=pinoff@entry=0x0, outfd=outfd@entry=4, poutoff=poutoff@entry=0x0, length=137304735744, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/copy_file_range.c:27
    27 {
    (gdb) fin
    Run till exit from #0 copy_file_range (infd=infd@entry=3, pinoff=pinoff@entry=0x0, outfd=outfd@entry=4, poutoff=poutoff@entry=0x0, length=137304735744, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/copy_file_range.c:27
    sparse_copy (src_fd=src_fd@entry=3, dest_fd=dest_fd@entry=4, abuf=abuf@entry=0x7fffffffd9d8, buf_size=buf_size@entry=262144, hole_size=0, punch_holes=punch_holes@entry=true, allow_reflink=true, src_name=0x7fffffffe3d7 "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img", dst_name=0x7fffffffe414 "celestis.img", max_n_read=137304735744, total_n_read=0x7fffffffd9e0, last_write_made_hole=0x7fffffffd9d0) at src/copy.c:344
    344 if (n_copied == 0)
    Value returned is $2 = -134217728

Then the error branch is triggered and the code falsely reads errno (which is 18 from the failed FICLONE) so is_CLONENOTSUP is true, we leave the loop without error reporting, total_n_read is still 0, etc... and it ends up truncating the file thinking the file has shrunk. Unfortunate.

I think the return value gets corrupted in glibc, see:
https://github.com/bminor/glibc/blob/d9a348d0927c7a1aec5caf3df3fcd36956b3eb23/nptl/cancellation.c#L66

    long int
    __syscall_cancel (__syscall_arg_t a1, __syscall_arg_t a2,
                      __syscall_arg_t a3, __syscall_arg_t a4,
                      __syscall_arg_t a5, __syscall_arg_t a6,
                      __SYSCALL_CANCEL7_ARG_DEF __syscall_arg_t nr)
    {
      int r = __internal_syscall_cancel (a1, a2, a3, a4, a5, a6,
                                         __SYSCALL_CANCEL7_ARG nr);
      return __glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (r))
             ? SYSCALL_ERROR_LABEL (INTERNAL_SYSCALL_ERRNO (r))
             : r;
    }

Here, r should be a long int.

As a workaround, copy_max could be clamped to 2GB.

P.S.: why does coreutils cat not fail as well? It checks the return value against -1, which it is not...

-- 
Leah Neukirchen <l...@vuxu.org> https://leahneukirchen.org/
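
For illustration, here is a minimal sketch of the suspected truncation and of the clamp workaround. It is not the glibc or coreutils code: through_int and clamp_copy_len are hypothetical helpers, and the clamp limit (INT_MAX rounded down to a multiple of 4096) is my own assumption for "about 2GB". The sketch only shows what happens when a 64-bit result passes through a 32-bit int, and why a clamped length would survive that.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pass a 64-bit syscall result through a 32-bit
   int, the way "int r = ...; return r;" would in a function declared
   to return long int.  Converting an out-of-range value to a signed
   type is implementation-defined in C; GCC and Clang keep the low
   32 bits, which is the behaviour assumed here.  */
static int64_t
through_int (int64_t result)
{
  int32_t r = (int32_t) result;   /* upper 32 bits are dropped here */
  return r;                       /* sign-extended back to 64 bits */
}

/* Hypothetical clamp for the copy length.  The limit is an assumption:
   INT_MAX rounded down to a multiple of 4096, so that even a full-size
   copy returns a value that still fits in an int.  */
static size_t
clamp_copy_len (size_t want)
{
  const size_t limit = (size_t) INT_MAX & ~(size_t) 4095;
  return want < limit ? want : limit;
}
```

With the length from the gdb session, 137304735744 is 0x1FF8000000, whose low 32 bits are 0xF8000000, so through_int (137304735744) gives -134217728, the same value gdb printed. Note that a clamp of exactly 2^31 bytes would also come out negative after such a truncation, which is why the sketch stays just below that.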