Tony,

Thanks for the reply and for elaborating.


  1.
By “per neighbor” I meant per adjacency.
I had though about just using a different seed, but I could not convince myself 
that this would help with collision. XOR being commutative and associative, it 
looks like one could equally applies the seed at the end (last xor) hence after 
the collision. That’s why I had proposed something a bit heavier with a 
slightly different hash function. But I would not pretend being capable of 
following you.

2.
Thanks for doing the math (for me) as the actual number(probability) does 
matter.
Not being familiar with Rust, I’ll wait for the next revision of the draft.

That being said, beyond xor, I’m not familiar with hash function and their 
properties.
My point was limited to saying that may be using spatial diversity (my 
different adjacencies) could help with collision (if we can manage to have 
different collisions depending on the adjacency.)

--Bruno

From: Tony Przygienda <[email protected]>
Sent: Wednesday, October 8, 2025 9:31 PM
To: DECRAENE Bruno INNOV/NET <[email protected]>
Cc: lsr <[email protected]>
Subject: Re: [Lsr] draft-prz-lsr-hierarchical-snps

Bruno, thanks.

Multiple things

1. I am not sure what "per neighbor" here means. We could take some random 
number, advertise that in the hellos and use that as seed. all good except 
hellos are unreliable. Also, things like "take neighbors address" are difficult 
to do due to unnumbered links and so on. Ultimately such a seed, unless it 
varies over time reliably, is just another constant in hashing and I doubt it 
will improve any probabilities. But, yes, within a system we _could_ vary some 
constant used to build the hashes regularly (kind of like nonce) and advertise 
this constant in the HSNP itself. computationally expensive for the neighbor to 
compare and even if the numbers change slowly we need to cache per neighbor the 
whole thing (since each neighbor uses a different seed) but yes, it would 
vastly lower the probability of a "hash clash" of course.
2. To give the math quickly once again. I started with 64 bits (and we could do 
it) but things were big in terms of packet sizes and I realized with math it's 
overkill AFAIS. Today fletcher (AFAIR ISIS uses a variant of that) is 16 bits 
so assuming uniform distribution and independent variables we could say that a 
HSNP checksum over 4E9 (2^32) fragments statistically has the same "strength" 
as fletcher over a single fragment. Generally in 1st level HSNP we'll cover 
about 80 ranges each over max. 60 fragments or so (if I remember the numbers in 
the draft correctly) so probability is roughly 24E2 / 4E9 of that, 2nd level 
each range can pack a full packet of L1 ranges so we can say 80 * 80 * 60 
fragments which is 38E4 / 4E9. so in plain numbers we are talking in very 
coarse terms of 0.0000001 and 0.00001 chance of things clashing. Once we move 
to layer 3 we are in realm of 30E6 fragments database, not something I consider 
likely but for the sake of argument that's 3E7/4E9 and then we're in 0.001 
magnitude, i.e. there is a chance of about 0.1% that two databases containing 
30 million fragments end up producing somewhere the same hash for two 
fragments. Of course it matters a _lot_ where this same hash is produced. If 
it's in the same L1 range then XOR'ing both of them will basically remove both 
fragments. If not in the same range it's basically meaningless since range 
above it will not see the clash anymore. All gets mathematically hairy to 
analyze and I leave it at that. So if with all this people still think that all 
necessitates solution 1 with a new random number every couple iterations, 
that's doable, we can also go to 24 bit HSNP hashes easily.

For the not so faint of heart, here's the fast double hashing of a fragment 
that seems to work very well producing good hamming distances over realistic 
scenarios and should make it into the next version of draft. I skip lots of 
definitions and just show the core Rust snippet. Anyone with some good 
foundation in statistics/hashing theory is more than welcome to criticize. We 
are not bound by self inverse hashing BTW for building hash of a fragment. The 
self-inversion (i.e. XOR) is only of interest later to allow for quick 
adjustment of caches over very similar sets.

Distributions of hamming produce weighted averages of about 14 bits difference 
for very low entropy checksum (i.e. checksums vary by 1 or 2 bits only for the 
tested fragments) and about 17 bits of the 32 bits with checksum being a 
uniform random number generated for each fragment used. So it looks like we're 
having enough entropy playing here.

impl Into<HSNPHash> for (&SharedFragmentID, &FragmentContent) {
    fn into(self) -> HSNPHash {
        // 16 bit primes to fill in the 64 bits
        let SEED1: u64 = 58417;
        let SEED2: u64 = 47527;
        const BYTEINBITS: usize = size_of::<u8>() * 8;
        let (fragmentid, fragmentcontent) = self;

        let nodeid_iterator = &fragmentid.node.node_id().0;

        let foldednodeid1 = nodeid_iterator
            .iter()
            .fold(SEED1, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64);
        let foldednodeid2 = nodeid_iterator
            .iter()
            .rev()
            .fold(SEED2, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64);
        // elements to rotate in with their byte size
        let rotate_in = [
            (
                fragmentcontent.checksum.0 as _,
                size_of_val(&fragmentcontent.checksum.0),
            ),
            (fragmentid.pnode.0 as u64, size_of_val(&fragmentid.pnode.0)),
            (
                fragmentid.fragmentnr.0 as _,
                size_of_val(&fragmentid.fragmentnr.0),
            ),
            (
                fragmentcontent.seqnr.0 as _,
                size_of_val(&fragmentcontent.seqnr.0),
            ),
        ];

        let extendedhash1 = rotate_in
            .into_iter()
            .fold(foldednodeid1, |p, (v, s)| p.rotate_left((s * BYTEINBITS) as 
_) ^ v);
        let extendedhash2 = rotate_in
            .into_iter()
            .rev()
            .fold(foldednodeid2, |p, (v, s)| p.rotate_left((s * BYTEINBITS) as 
_) ^ v);

        (extendedhash1 ^ extendedhash2).into()
    }
}



On Wed, Oct 8, 2025 at 4:02 PM 
<[email protected]<mailto:[email protected]>> wrote:
Hi all,

[Thinking out loud, so I'm pretty sure I'll regret sending this email ...]

With regards to hash collision, I'm wondering whether we could benefit from 
using a different hash (method/function) per neighbor, and/or over time. (As a 
priori it's enough for me to detect my LSDB inconsistency with a single 
neighbor)
This requires more computation but assuming that a significant contribution to 
collision is the final step (from 64 bits to 32 bits), rotating a number of 
bits in one 32-bit int on a per neighbor basis seems relatively low cost.

Regards,
--Bruno
____________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.

_______________________________________________
Lsr mailing list -- [email protected]<mailto:[email protected]>
To unsubscribe send an email to [email protected]<mailto:[email protected]>
____________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.
_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to