On Mon, Apr 27, 2026 at 11:33:32AM +0100, Pádraig Brady wrote:
> > Question #1: Does it make sense to add the CDmv patch to split(1),
> > or is this becoming too specific to the Git use case?
> 
> Yes it's probably best to not include CDmv in split, but it would be worth
> documenting in info as a possibility.

I've added a few paragraphs on that to the patch stack.

I've rebased & updated the patch stack at GitHub:
https://github.com/darkk/coreutils/compare/master...darkk:coreutils:cdc

> Copyright assignment has stalled, but will hopefully resume

Thanks!  So, the copyright assignment dance is over and IIUC we're back
to the review track.


I have four open questions:


Question #PCG: I've embedded the PCG RNG into make-buz-table.c to get
a nothing-up-my-sleeve S-box for GearHash and BUZHash.  Is that okay from
a licensing (or any other) point of view?  PCG is licensed under Apache 2.0.

I think it is okay, but I might be missing something.


Question #randperm: the tests/split/random-source.sh test with the ffff-1M
input shows an issue in randperm_new() when random-source is a stream of
0xFF bytes.  On one hand, that's a defect in randint_choose() and
randperm_bound(); on the other hand, it's the user shooting themselves
in the foot.  Should something be done about that or not?

I think it's okay to leave that as-is.  Broken input leading to a broken
process is acceptable, and a non-uniform random-source may break things.
I'm asking because I explicitly want a second opinion.


Question #API: do Gnulib's randperm_*() functions offer a cross-version
stability guarantee?  Do they promise that, given the same bytes, they'll
produce the same permutations across Gnulib versions?

BUZHash has a requirement for its S-box: for every I, the Ith bits of the
256 S-box entries should be a permutation of {128 zeros, 128 ones}, so it
can't take entropy from random-source as-is.  This property is not enough
for an S-box to be "good", but that's what the BUZHash definition demands.
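
A minimal sketch of that property, not the code from the patch stack:
it builds each bit column as a shuffle of 128 zeros and 128 ones, then
checks the result.  The names make_balanced_sbox() and is_balanced_sbox()
are hypothetical, and Python's random.Random is only a stand-in for a
randperm_*()-style permutation.

```python
import random

SBOX_SIZE = 256  # one table entry per input byte value


def make_balanced_sbox(word_bits, rng):
    """Build a 256-entry table where every bit column has 128 ones."""
    sbox = [0] * SBOX_SIZE
    for bit in range(word_bits):
        column = [0] * (SBOX_SIZE // 2) + [1] * (SBOX_SIZE // 2)
        rng.shuffle(column)  # stand-in for a randperm_*()-style permutation
        for i, value in enumerate(column):
            sbox[i] |= value << bit
    return sbox


def is_balanced_sbox(sbox, word_bits):
    """Check that each bit is set in exactly half of the entries."""
    if len(sbox) != SBOX_SIZE:
        return False
    return all(
        sum((word >> bit) & 1 for word in sbox) == SBOX_SIZE // 2
        for bit in range(word_bits)
    )


# The seed is arbitrary and only makes the sketch reproducible.
sbox32 = make_balanced_sbox(32, random.Random(2026))
print(is_balanced_sbox(sbox32, 32))
```

Raw random bytes almost never satisfy is_balanced_sbox(), which is why
the entropy has to go through a permutation step first.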

What should be done if Gnulib does not offer such a guarantee?
I see three options:

- nothing, no guarantees in Gnulib lead to no guarantees in split
- custom "vendored" permutation code
- compatibility hack

The code I've implemented so far does nothing, as I failed to find
anything about API stability guarantees for Gnulib RNGs, but I might
be looking in the wrong place.  That assumption leads to this question :-)

The compatibility hack might be split checking whether --random-source is
a valid BUZHash S-box or just a stream of random bytes.  A valid S-box
might be used as-is; random bytes might be fed to randperm_*().  The chance
of getting a valid BUZHash S-box out of a random stream is negligible:
2e-42 and 4e-84 for 32-bit and 64-bit S-boxes.  So, confusing these two
is unlikely.
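
For reference, the arithmetic behind those two numbers can be sketched as
follows, assuming uniform random input so that the bit columns are
independent.  The name p_valid_sbox() is hypothetical.

```python
from math import comb

SBOX_SIZE = 256

# Probability that one column of 256 uniform random bits
# contains exactly 128 ones.
p_column = comb(SBOX_SIZE, SBOX_SIZE // 2) / 2**SBOX_SIZE


def p_valid_sbox(word_bits):
    """Probability that all WORD_BITS independent columns are balanced."""
    return p_column ** word_bits


p32 = p_valid_sbox(32)
p64 = p_valid_sbox(64)
print(f"32-bit: {p32:.1e}, 64-bit: {p64:.1e}")
```

A single column is balanced about 5% of the time; raising that to the 32nd
or 64th power gives the orders of magnitude quoted above.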

The compatibility hack would, at the very least, give the user an option
to feed an "old" S-box to BUZHash in the unlikely case of randperm_*
changes.
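
A rough sketch of that check, under assumptions that are mine and not part
of the patch stack: the input counts as an S-box only if it is exactly
256 words long, words are read as little-endian, and the names parse_sbox()
and classify_random_source() are hypothetical.

```python
SBOX_SIZE = 256


def parse_sbox(data, word_bytes):
    """Return the table if DATA is exactly one balanced S-box, else None."""
    if len(data) != SBOX_SIZE * word_bytes:
        return None
    words = [
        int.from_bytes(data[i:i + word_bytes], "little")  # byte order assumed
        for i in range(0, len(data), word_bytes)
    ]
    for bit in range(8 * word_bytes):
        if sum((w >> bit) & 1 for w in words) != SBOX_SIZE // 2:
            return None
    return words


def classify_random_source(data, word_bytes=4):
    """Decide whether DATA is a ready-made S-box or plain entropy."""
    sbox = parse_sbox(data, word_bytes)
    if sbox is not None:
        return ("sbox", sbox)    # use the table as-is
    return ("entropy", None)     # feed the bytes to the permutation step


# 128 all-ones words followed by 128 zero words: balanced, though a poor S-box.
kind, _ = classify_random_source(b"\xff" * 512 + b"\x00" * 512)
print(kind)
```

In this sketch a stream of 0xFF bytes is classified as entropy, since no
bit column is balanced.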

What do you think, does it make sense to buy some future-proofing
for a tiny bit of extra complexity?


Question #cdcmv: I mention content-defined chunk (re)naming (CDCmv)
in coreutils.texi.  Should a sample script implementing CDCmv
be part of the GNU Coreutils distribution or not?

I'm ambivalent.  On one hand, I'm very happy just having a fast C file
slicer.  On the other hand, it feels "unfair" to leave potential users
with just an algorithm and without an implementation example.
One might also be absolutely right in saying that I'm just imagining
those users and I'm the only one in this room so far.

GNU Coreutils has no contrib/ section, while Git does.  Probably,
the best way to go is to submit cdcmv to Git.  Does that sound right to you?


-- 
WBRBW, Leonid Evdokimov, https://darkk.net.ru tel:+79816800702
PGP: 6691 DE6B 4CCD C1C1 76A0  0D4A E1F2 A980 7F50 FAB2
