Tony, Thanks for the reply and for elaborating.
1. By “per neighbor” I meant per adjacency. I had though about just using a different seed, but I could not convince myself that this would help with collision. XOR being commutative and associative, it looks like one could equally applies the seed at the end (last xor) hence after the collision. That’s why I had proposed something a bit heavier with a slightly different hash function. But I would not pretend being capable of following you. 2. Thanks for doing the math (for me) as the actual number(probability) does matter. Not being familiar with Rust, I’ll wait for the next revision of the draft. That being said, beyond xor, I’m not familiar with hash function and their properties. My point was limited to saying that may be using spatial diversity (my different adjacencies) could help with collision (if we can manage to have different collisions depending on the adjacency.) --Bruno From: Tony Przygienda <[email protected]> Sent: Wednesday, October 8, 2025 9:31 PM To: DECRAENE Bruno INNOV/NET <[email protected]> Cc: lsr <[email protected]> Subject: Re: [Lsr] draft-prz-lsr-hierarchical-snps Bruno, thanks. Multiple things 1. I am not sure what "per neighbor" here means. We could take some random number, advertise that in the hellos and use that as seed. all good except hellos are unreliable. Also, things like "take neighbors address" are difficult to do due to unnumbered links and so on. Ultimately such a seed, unless it varies over time reliably, is just another constant in hashing and I doubt it will improve any probabilities. But, yes, within a system we _could_ vary some constant used to build the hashes regularly (kind of like nonce) and advertise this constant in the HSNP itself. computationally expensive for the neighbor to compare and even if the numbers change slowly we need to cache per neighbor the whole thing (since each neighbor uses a different seed) but yes, it would vastly lower the probability of a "hash clash" of course. 2. To give the math quickly once again. I started with 64 bits (and we could do it) but things were big in terms of packet sizes and I realized with math it's overkill AFAIS. Today fletcher (AFAIR ISIS uses a variant of that) is 16 bits so assuming uniform distribution and independent variables we could say that a HSNP checksum over 4E9 (2^32) fragments statistically has the same "strength" as fletcher over a single fragment. Generally in 1st level HSNP we'll cover about 80 ranges each over max. 60 fragments or so (if I remember the numbers in the draft correctly) so probability is roughly 24E2 / 4E9 of that, 2nd level each range can pack a full packet of L1 ranges so we can say 80 * 80 * 60 fragments which is 38E4 / 4E9. so in plain numbers we are talking in very coarse terms of 0.0000001 and 0.00001 chance of things clashing. Once we move to layer 3 we are in realm of 30E6 fragments database, not something I consider likely but for the sake of argument that's 3E7/4E9 and then we're in 0.001 magnitude, i.e. there is a chance of about 0.1% that two databases containing 30 million fragments end up producing somewhere the same hash for two fragments. Of course it matters a _lot_ where this same hash is produced. If it's in the same L1 range then XOR'ing both of them will basically remove both fragments. If not in the same range it's basically meaningless since range above it will not see the clash anymore. All gets mathematically hairy to analyze and I leave it at that. So if with all this people still think that all necessitates solution 1 with a new random number every couple iterations, that's doable, we can also go to 24 bit HSNP hashes easily. For the not so faint of heart, here's the fast double hashing of a fragment that seems to work very well producing good hamming distances over realistic scenarios and should make it into the next version of draft. I skip lots of definitions and just show the core Rust snippet. Anyone with some good foundation in statistics/hashing theory is more than welcome to criticize. We are not bound by self inverse hashing BTW for building hash of a fragment. The self-inversion (i.e. XOR) is only of interest later to allow for quick adjustment of caches over very similar sets. Distributions of hamming produce weighted averages of about 14 bits difference for very low entropy checksum (i.e. checksums vary by 1 or 2 bits only for the tested fragments) and about 17 bits of the 32 bits with checksum being a uniform random number generated for each fragment used. So it looks like we're having enough entropy playing here. impl Into<HSNPHash> for (&SharedFragmentID, &FragmentContent) { fn into(self) -> HSNPHash { // 16 bit primes to fill in the 64 bits let SEED1: u64 = 58417; let SEED2: u64 = 47527; const BYTEINBITS: usize = size_of::<u8>() * 8; let (fragmentid, fragmentcontent) = self; let nodeid_iterator = &fragmentid.node.node_id().0; let foldednodeid1 = nodeid_iterator .iter() .fold(SEED1, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64); let foldednodeid2 = nodeid_iterator .iter() .rev() .fold(SEED2, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64); // elements to rotate in with their byte size let rotate_in = [ ( fragmentcontent.checksum.0 as _, size_of_val(&fragmentcontent.checksum.0), ), (fragmentid.pnode.0 as u64, size_of_val(&fragmentid.pnode.0)), ( fragmentid.fragmentnr.0 as _, size_of_val(&fragmentid.fragmentnr.0), ), ( fragmentcontent.seqnr.0 as _, size_of_val(&fragmentcontent.seqnr.0), ), ]; let extendedhash1 = rotate_in .into_iter() .fold(foldednodeid1, |p, (v, s)| p.rotate_left((s * BYTEINBITS) as _) ^ v); let extendedhash2 = rotate_in .into_iter() .rev() .fold(foldednodeid2, |p, (v, s)| p.rotate_left((s * BYTEINBITS) as _) ^ v); (extendedhash1 ^ extendedhash2).into() } } On Wed, Oct 8, 2025 at 4:02 PM <[email protected]<mailto:[email protected]>> wrote: Hi all, [Thinking out loud, so I'm pretty sure I'll regret sending this email ...] With regards to hash collision, I'm wondering whether we could benefit from using a different hash (method/function) per neighbor, and/or over time. (As a priori it's enough for me to detect my LSDB inconsistency with a single neighbor) This requires more computation but assuming that a significant contribution to collision is the final step (from 64 bits to 32 bits), rotating a number of bits in one 32-bit int on a per neighbor basis seems relatively low cost. Regards, --Bruno ____________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. _______________________________________________ Lsr mailing list -- [email protected]<mailto:[email protected]> To unsubscribe send an email to [email protected]<mailto:[email protected]> ____________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.
_______________________________________________ Lsr mailing list -- [email protected] To unsubscribe send an email to [email protected]
