On 02/03/2026 12:32, Leonid Evdokimov wrote:
Hello,
Here is the patch to implement CDC in `split --bytes`. I'm submitting
it for review before proceeding with adding CDC to --line-bytes.
I've tested the patch on x86_64, ppc64be and Apple M1, with gcc and clang.
Texinfo documentation is currently missing.
I'm mostly unsure about the following:
1) autoconf and l10n logic being right, as I'm not familiar with AC/AM
and gettext.
2) embedding PCG RNG into make-buz-table. Is there a better way to
accomplish the goal, and is a better way needed?
3) licensing/authorship headers. There might be guidelines I'm missing.
4) right place for getcachelinesize(). Should it be a separate file
and/or part of gnulib?
5) busy-loop in randperm_new() when the random-source is a stream of
0xFF. On one hand, that's a "bug" in randint_choose() and
randperm_bound(); on the other hand, one may say that it's just a
foot-shooting case.
6) moving the `+1` byte allocation to be specific to lines_split(). I've
not run an ASan build to test correctness. The +1 has been there for
35 years and, it seems, lines_split() is the only user of that extra
byte, but I might have missed something.
7) 40 MiB limit for 32-bit CDC hashes, it's tempting to say "42 MB".
Should we? :-)
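To illustrate the busy-loop concern in item 5, here is a minimal sketch of
rejection sampling over a byte stream. It is a simplified model and not the
gnulib randint_choose() code; the function name choose_from_bytes and the
max_tries cap are hypothetical, added only so the example terminates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT the gnulib implementation: pick a value in
   [0, choices) from a stream of random bytes by rejection sampling.
   CHOICES must be in 1..256.  Return -1 if every byte examined (at
   most MAX_TRIES of them) was rejected.  */
static int
choose_from_bytes (const uint8_t *src, size_t len, unsigned int choices,
                   size_t max_tries)
{
  /* Largest multiple of CHOICES not exceeding 256.  Bytes at or above
     this limit would bias the result, so they are discarded.  */
  unsigned int limit = 256 - 256 % choices;

  for (size_t i = 0; i < len && i < max_tries; i++)
    if (src[i] < limit)
      return src[i] % choices;

  /* A real sampler without a cap would keep reading here, which is
     the busy-loop when the source only ever yields 0xFF.  */
  return -1;
}
```

In this model a 0xFF byte is rejected whenever choices does not divide 256,
so an all-0xFF source never produces an accepted value for such bounds.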
I've tried to add enough comments to make the code easy to understand,
but I can add more if that's helpful as the memory is still fresh.
The patch is also available on GitHub:
https://github.com/coreutils/coreutils/compare/master...darkk:coreutils:cdc
Following on from ...
https://lists.gnu.org/archive/html/coreutils/2025-01/msg00028.html
https://lists.gnu.org/archive/html/coreutils/2026-02/msg00106.html
Some comments from a cursory glance:
The INTEL_JCC_ERRATUM stuff may be more generally applicable,
and more appropriate for a separate patch.
same_bytes_() should use `openssl version || skip_`
at least for documentation reasons.
It's better to use `cksum -a blake2b` than the deprecated `b2sum` in tests.
Was the issue with errnos in getlimits that the logs were too noisy?
We could disable tracing in getlimits_ if that was the case.
It would be good to augment the "invalid rolling hash window"
error with a valid range for the selected hash.
Oh right I see there are other validations later on.
Anyway, it would be good to augment this initial error if possible.
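
A rough sketch of what such an augmented diagnostic could look like follows.
Everything here is an assumption for illustration: the function name
format_window_error, its parameters, and the exact wording are hypothetical
and not taken from the patch, and snprintf is used only to keep the example
self-contained where the real code would presumably go through error() and
gettext.

```c
#include <stdio.h>

/* Hypothetical helper, not from the patch: write a diagnostic that
   names the rejected window size together with the range accepted by
   the selected hash.  Return value is that of snprintf.  */
static int
format_window_error (char *buf, size_t buflen, const char *hash_name,
                     unsigned long int given,
                     unsigned long int min_window,
                     unsigned long int max_window)
{
  return snprintf (buf, buflen,
                   "invalid rolling hash window %lu for %s:"
                   " valid range is %lu..%lu",
                   given, hash_name, min_window, max_window);
}
```

With made-up values such as a window of 3 for a hash called "buz" that
accepts 16..64, the message would tell the user the acceptable range without
them having to hit the later validations.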
You will also need to assign copyright for a change of this size.
The process is described under "Copyright Assignment" in the HACKING file.
thanks for working on this!
Padraig