Pádraig Brady <[email protected]> writes:

> On 06/04/2026 05:52, Collin Funk wrote:
>> Pádraig Brady <[email protected]> writes:
>> 
>>> On 04/04/2026 03:27, Collin Funk wrote:
>>>> Attatched a v4 patch.
>>>
>>> Since cat is such a fundamental tool, I think it's worth being extra 
>>> defensive
>>> by trying the read()/write() loop after the splice.
>>> I've heard of cases where some fuse file systems lie about the size in stat,
>>> while read() provides all the data.
>>>
>>> Also it seems a bit wasteful to allocate large pipes in kernel mem,
>>> for catting small files.
>>>
>>> I think we could handle both these cases with the attached adjustment,
>>> which avoids splice for small regular files, and tries read()/write()
>>> after splice() always.  Note the size threshold was determined with:
>>>
>>>    $ truncate -s $((32*1024)) mid  $ hyperfine -N 'src/cat-splice mid' 
>>> 'src/cat mid'
>>>
>>> Since we're not doing any more stats() to determine this info,
>>> this should be more memory efficient and safer.
>> This seems reasonable.
>> However, there is an issue with your patch. Falling back to read()
>> means
>> that the user will have to send two EOFs to exit the program. One for
>> the splice() call and one for the read() call. I forgot about this but
>> noticed it when tests/tty/tty-eof.pl failed after running 'make check'.
>> How about this patch which disables the use of splice() when the
>> input
>> file descriptor is associated with a terminal? It feels a bit safer than
>> trusting the EOF from splice().
>
> Oh right, EOF from tty is not persistent.
> I suppose any device could behave like that theoretically.
> I.e., the unconditional fallback read() is not appropriate.
> Sorry for the misdirection there.
>
> I'm thinking the attached adjustment is best now.
> I.e., keep the small size exclusion for splice(),
> but don't do the fallback read() after splice finishes.
>
> I also tweaked NEWS the attached as it was a bit inaccurate wrt pipes.
>
> If you agree with the above feel free to push.

All looks good to me. I pushed it with those changes [1].

> +              int splice_cat_status =
> +                usable_st_size (&istat_buf) && istat_buf.st_size <= 32768
> +                ? 0 : splice_cat ();

I adjusted the format of this slightly. I prefer adding parentheses to
show the nesting as mentioned in the GNU Coding Standard [2]. It formats
well automatically in Emacs, but maybe it is awkward in other editors.

> BTW I'm still thinking about yes -z to efficiently generate zeros,
> as the splicing from /dev/zero isn't that fast in comparison:
>
>   $ timeout 2 src/cat /dev/zero | pv -r > /dev/null
>   [5.50GiB/s]
>
>   $ timeout 2 src/yes | pv -r > /dev/null
>   [31.1GiB/s]

What do you think of the more general --separator (-s) option?

I was thinking you could get the behavior of 'yes -z' by using:

    $ yes --separator '' ''

But it would also allow you to use strings instead of a newline
character.

Collin

[1] 
https://github.com/coreutils/coreutils/commit/457f88513a128ce91160c4a60f821cc1612204be
[2] https://www.gnu.org/prep/standards/html_node/Formatting.html

Reply via email to