On Mon, Apr 27, 2026 at 11:33:32AM +0100, Pádraig Brady wrote:
> > Question #1: Does it make sense to add the CDmv patch to split(1),
> > or is this becoming too specific to the Git use case?
>
> Yes it's probably best to not include CDmv in split, but it would be worth
> documenting in info as a possibility.

I've added a few paragraphs on that to the patch stack.
I've rebased & updated the patch stack at GitHub:
https://github.com/darkk/coreutils/compare/master...darkk:coreutils:cdc

> Copyright assignment has stalled, but will hopefully resume

Thanks! So, the copyright assignment dance is over and IIUC we're back
to the review track. I have four open questions:

Question #PCG: I've embedded the PCG RNG into make-buz-table.c to have a
nothing-up-my-sleeve S-box for GearHash and BUZHash. Is it okay from a
licensing (and any other) point of view? PCG is licensed under
Apache 2.0. I think it is okay, but I might be missing something.

Question #randperm: the tests/split/random-source.sh test with the
ffff-1M input shows an issue in randperm_new() when random-source is a
stream of 0xFF bytes. On one hand, that's a defect in randint_choose()
and randperm_bound(); on the other hand, it's the user shooting
themselves in the foot. Should something be done about that or not?
I think it's okay to leave that as-is: broken input leading to a broken
process is okay, and a non-uniform random-source may break things.
I'm asking because I want to get a second opinion explicitly.

Question #API: does Gnulib randperm_*() offer a cross-version stability
guarantee? Does it promise that, given the same bytes, it'll produce the
same permutations across Gnulib versions?

BUZHash has a requirement for its S-box: for every bit position I, the
Ith bits of the S-box entries should form a permutation of
{128 zeros, 128 ones}, so it can't take entropy from random-source
as-is. This property is not enough for an S-box to be "good", but that's
what the BUZHash definition demands.

What should be done if Gnulib does not offer such a guarantee? I see
three options:
- nothing, no guarantees in Gnulib lead to no guarantees in split
- custom "vendored" permutation code
- compatibility hack

The code I've implemented so far does nothing, as I failed to find
anything about API stability guarantees for Gnulib RNGs, but I might be
looking in the wrong place.
That assumption leads to this question :-)

The compatibility hack might be split checking whether --random-source
is a valid BUZHash S-box or just a stream of random bytes. A valid S-box
might be used as-is; random bytes might be fed to randperm_*(). The
chance of getting a valid BUZHash S-box out of a random stream is
negligible: 2e-42 and 4e-84 for 32-bit and 64-bit S-boxes. So, confusing
these two is unlikely. The compatibility hack would give the user an
option to feed an "old" S-box to BUZHash, at the very least in the
unlikely case of randperm_* changes. What do you think, does it make
sense to buy some future-proofing for a tiny bit of extra complexity?

Question #cdcmv: I mention content-defined chunk (re)naming (CDCmv) in
coreutils.texi. Should a sample script implementing CDCmv be part of the
GNU Coreutils distribution or not?

I'm ambivalent. On one hand, I'm very happy just having a fast C file
slicer. On the other hand, it feels "unfair" to leave potential users
with just an algorithm and without an implementation example. One also
might be absolutely right in saying that I'm just imagining those users
and I'm the only one in this room so far.

GNU Coreutils has no contrib/ section, while Git does. Probably the
best way to go is to submit cdcmv to Git. Does that sound right to you?

-- 
WBRBW, Leonid Evdokimov, https://darkk.net.ru
tel:+79816800702
PGP: 6691 DE6B 4CCD C1C1 76A0 0D4A E1F2 A980 7F50 FAB2
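
P.S. To make the compatibility-hack check and the quoted probabilities
concrete, here is a rough Python sketch. It is an illustration only, not
the C code from the patch stack: the function names are made up, and it
assumes the S-box has already been parsed into 256 integers (how bytes
from --random-source map to words is left open).

```python
from math import comb


def is_buzhash_sbox(table, width):
    """Return True if `table` is 256 `width`-bit words in which every
    bit column holds exactly 128 ones (and therefore 128 zeros)."""
    if len(table) != 256:
        return False
    if any(not 0 <= word < (1 << width) for word in table):
        return False
    return all(
        sum((word >> bit) & 1 for word in table) == 128
        for bit in range(width)
    )


def chance_random_is_valid(width):
    """Probability that 256 uniformly random `width`-bit words pass
    is_buzhash_sbox(): each column is balanced independently."""
    per_column = comb(256, 128) / 2**256  # roughly 0.05
    return per_column ** width


if __name__ == "__main__":
    # Structurally valid (though useless as a hash table): 128 all-ones
    # words followed by 128 zero words.
    balanced = [0xFFFFFFFF] * 128 + [0] * 128
    print(is_buzhash_sbox(balanced, 32))
    # A stream of 0xFF bytes parsed as words fails the column check.
    print(is_buzhash_sbox([0xFFFFFFFF] * 256, 32))
    print(chance_random_is_valid(32), chance_random_is_valid(64))
```

The per-column term is the chance that 256 fair coin flips give exactly
128 heads; raising it to the word width is where figures of the order of
2e-42 and 4e-84 come from.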
