Bruno, thanks.

Multiple things

1. I am not sure what "per neighbor" here means. We could take some random
number, advertise that in the hellos and use that as seed. all good except
hellos are unreliable. Also, things like "take neighbors address" are
difficult to do due to unnumbered links and so on. Ultimately such a seed,
unless it varies over time reliably, is just another constant in hashing
and I doubt it will improve any probabilities. But, yes, within a system we
_could_ vary some constant used to build the hashes regularly (kind of like
nonce) and advertise this constant in the HSNP itself. computationally
expensive for the neighbor to compare and even if the numbers change slowly
we need to cache per neighbor the whole thing (since each neighbor uses a
different seed) but yes, it would vastly lower the probability of a "hash
clash" of course.
2. To give the math quickly once again. I started with 64 bits (and we
could do it) but things were big in terms of packet sizes and I realized
with math it's overkill AFAIS. Today fletcher (AFAIR ISIS uses a variant of
that) is 16 bits so assuming uniform distribution and independent variables
we could say that a HSNP checksum over 4E9 (2^32) fragments statistically
has the same "strength" as fletcher over a single fragment. Generally in
1st level HSNP we'll cover about 80 ranges each over max. 60 fragments or
so (if I remember the numbers in the draft correctly) so probability is
roughly 24E2 / 4E9 of that, 2nd level each range can pack a full packet of
L1 ranges so we can say 80 * 80 * 60 fragments which is 38E4 / 4E9. so in
plain numbers we are talking in very coarse terms of 0.0000001 and 0.00001
chance of things clashing. Once we move to layer 3 we are in realm of 30E6
fragments database, not something I consider likely but for the sake of
argument that's 3E7/4E9 and then we're in 0.001 magnitude, i.e. there is a
chance of about 0.1% that two databases containing 30 million fragments end
up producing somewhere the same hash for two fragments. Of course it
matters a _lot_ where this same hash is produced. If it's in the same L1
range then XOR'ing both of them will basically remove both fragments. If
not in the same range it's basically meaningless since range above it will
not see the clash anymore. All gets mathematically hairy to analyze and I
leave it at that. So if with all this people still think that all
necessitates solution 1 with a new random number every couple iterations,
that's doable, we can also go to 24 bit HSNP hashes easily.

For the not so faint of heart, here's the fast double hashing of a fragment
that seems to work very well producing good hamming distances over
realistic scenarios and should make it into the next version of draft. I
skip lots of definitions and just show the core Rust snippet. Anyone with
some good foundation in statistics/hashing theory is more than welcome to
criticize. We are not bound by self inverse hashing BTW for building hash
of a fragment. The self-inversion (i.e. XOR) is only of interest later to
allow for quick adjustment of caches over very similar sets.

Distributions of hamming produce weighted averages of about 14 bits
difference for very low entropy checksum (i.e. checksums vary by 1 or 2
bits only for the tested fragments) and about 17 bits of the 32 bits with
checksum being a uniform random number generated for each fragment used. So
it looks like we're having enough entropy playing here.

impl Into<HSNPHash> for (&SharedFragmentID, &FragmentContent) {
    fn into(self) -> HSNPHash {
        // 16 bit primes to fill in the 64 bits
        let SEED1: u64 = 58417;
        let SEED2: u64 = 47527;
        const BYTEINBITS: usize = size_of::<u8>() * 8;
        let (fragmentid, fragmentcontent) = self;

        let nodeid_iterator = &fragmentid.node.node_id().0;

        let foldednodeid1 = nodeid_iterator
            .iter()
            .fold(SEED1, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64);
        let foldednodeid2 = nodeid_iterator
            .iter()
            .rev()
            .fold(SEED2, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64);
        // elements to rotate in with their byte size
        let rotate_in = [
            (
                fragmentcontent.checksum.0 as _,
                size_of_val(&fragmentcontent.checksum.0),
            ),
            (fragmentid.pnode.0 as u64, size_of_val(&fragmentid.pnode.0)),
            (
                fragmentid.fragmentnr.0 as _,
                size_of_val(&fragmentid.fragmentnr.0),
            ),
            (
                fragmentcontent.seqnr.0 as _,
                size_of_val(&fragmentcontent.seqnr.0),
            ),
        ];

        let extendedhash1 = rotate_in
            .into_iter()
            .fold(foldednodeid1, |p, (v, s)| p.rotate_left((s * BYTEINBITS)
as _) ^ v);
        let extendedhash2 = rotate_in
            .into_iter()
            .rev()
            .fold(foldednodeid2, |p, (v, s)| p.rotate_left((s * BYTEINBITS)
as _) ^ v);

        (extendedhash1 ^ extendedhash2).into()
    }
}



On Wed, Oct 8, 2025 at 4:02 PM <[email protected]> wrote:

> Hi all,
>
> [Thinking out loud, so I'm pretty sure I'll regret sending this email ...]
>
> With regards to hash collision, I'm wondering whether we could benefit
> from using a different hash (method/function) per neighbor, and/or over
> time. (As a priori it's enough for me to detect my LSDB inconsistency with
> a single neighbor)
> This requires more computation but assuming that a significant
> contribution to collision is the final step (from 64 bits to 32 bits),
> rotating a number of bits in one 32-bit int on a per neighbor basis seems
> relatively low cost.
>
> Regards,
> --Bruno
>
> ____________________________________________________________________________________________________________
> Ce message et ses pieces jointes peuvent contenir des informations
> confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez
> recu ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
> electroniques etant susceptibles d'alteration,
> Orange decline toute responsabilite si ce message a ete altere, deforme ou
> falsifie. Merci.
>
> This message and its attachments may contain confidential or privileged
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and
> delete this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been
> modified, changed or falsified.
> Thank you.
>
> _______________________________________________
> Lsr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to