Speaking as WG member: Hey Tony,
I not sure that anyone other than you, and possibly the other authors understand exactly what these graphs depict. I'm thinking that maybe I need the aid of the Rossetta Stone? Thanks, Acee > On Oct 10, 2025, at 2:06 PM, Tony Przygienda <[email protected]> wrote: > > So I ran fletcher instead of the double hashing I was toying around with. > Fletcher shows about same distribution under low entropy scenarios (i.e. > assuming checksums vary by very few bits which is not very likely but then > again) and a more even and steeper distribution under normal entropy > scenarios (i.e. fragment checksums are relatively uniformly distributed) so > AFAIS we should cut over, as Tony Li says, to Fletcher to generate single > fragment hashes that way as a better alternative. Below the graphs. > > > > Double Hashing > > 13.38 ┤ ╭╮ ╭╮ ╭╮ > > 12.35 ┤ ╭╮ ││ ││ ││ ╭╮ > > 11.32 ┤ ╭╮ ││ ││ ││ ││ ││ ╭╮ > > 10.29 ┤ ╭╮ ││ ││ ││ ││ ││ ││ ││ > > 9.26 ┤ ││ ││ ││ ││ │╰╮ │╰╮ │╰╮ ││ ╭╮ > > 8.23 ┤ ╭╮ ││ ││ ││ │╰╮ │ │ │ │ │ │ │╰╮ ││ > > 7.20 ┤ ││ ││ ││ │╰╮ │ │ │ │ │ │ │ │ │ │ │╰╮ ╭╮ > > 6.17 ┤ ││ │╰╮ │╰╮╭╯ │╭╯ │╭╯ │╭╯ │ │ │ │ │ │ │ │╰╮ > > 5.15 ┤ ││ ╭╯ │╭╯ ││ ││ ││ ││ │╭╯ │╭╯ │ │ │ │ │ ╭╮ > > 4.12 ┤ ╭╯╰╮│ ││ ││ ││ ││ ││ ││ ││ │╭╯ │ │ │ │╰╮ > > 3.09 ┤ │ ││ ││ ││ ││ ││ ││ ││ ││ ││ │╭╯ │ │ │ > > 2.06 ┤ │ ││ ││ ││ ││ ││ ││ ╰╯ ╰╯ ╰╯ ╰╯ ╰─╯ │ ╭─╮ > > 1.03 ┤ │ ││ ││ ╰╯ ╰╯ ╰╯ ╰╯ ╰─╯ ╰╮ > > 0.00 ┼──╯ ╰╯ ╰╯ > ╰──────────────────────────────────────────────────────────────────────── > Log2 Hamming Distribution of Hashes built over > low entropy mean 13.115629835964098 > 13.46 ┤ ╭╮ ╭╮ ╭╮ > > 12.42 ┤ ╭╮ ││ ││ ││ ╭╮ > > 11.39 ┤ ││ ││ ││ ││ ││ > > 10.35 ┤ ╭╮ ││ ││ ││ ││ ││ ╭╮ > > 9.32 ┤ ││ ││ ││ │╰╮ │╰╮ │╰╮ ││ > > 8.28 ┤ ╭╮ ││ ││ │╰╮ │ │ │ │ │ │ │╰╮ ╭╮ > > 7.25 ┤ ││ ││ │╰╮ │ │ │ │ │ │ │ │ │ │ │╰╮ > > 6.21 ┤ ││ │╰╮ │ │╭╯ │╭╯ │╭╯ │ │ │ │ │ │ │ ╭╮ > > 5.18 ┤ ╭╮ │╰╮╭╯ │╭╯ ││ ││ ││ │╭╯ │ │ │ │ │ │╰╮ > > 4.14 ┤ ││ ╭╯ ││ ││ ││ ││ ││ ││ │╭╯ │ │ │ │ │ > > 3.11 ┤ ╭╯╰╮│ ││ ││ ││ ││ ││ ││ ││ │╭╯ │ │ │ ╭╮ > > 2.07 ┤ │ ││ ││ ││ ││ ││ ╰╯ ╰╯ ╰╯ ╰╯ ╰─╯ │ │╰╮ > > 1.04 ┤ │ ││ ││ ╰╯ ╰╯ ╰╯ ╰─╯ ╰╮ > > 0.00 ┼──────╯ ╰╯ ╰╯ > ╰──────────────────────────────────────────────────────────────────────── > Log2 Hamming Distribution of Hashes built over > realistic checksums mean 14.770937790157847 > > > versus fletcher hamming distributions > > 13.38 ┤ ╭╮ ╭╮ ╭╮ > > 12.35 ┤ ╭╮ ││ ││ ││ ╭╮ > > 11.32 ┤ ╭╮ ││ ││ ││ ││ ││ ╭╮ > > 10.29 ┤ ╭╮ ││ ││ ││ ││ ││ ││ ││ > > 9.26 ┤ ││ ││ ││ ││ │╰╮ │╰╮ │╰╮ ││ ╭╮ > > 8.23 ┤ ╭╮ ││ ││ ││ │╰╮ │ │ │ │ │ │ │╰╮ ││ > > 7.20 ┤ ││ ││ ││ │╰╮ │ │ │ │ │ │ │ │ │ │ │╰╮ ╭╮ > > 6.17 ┤ ││ │╰╮ │╰╮╭╯ │╭╯ │╭╯ │╭╯ │ │ │ │ │ │ │ │╰╮ > > 5.15 ┤ ││ ╭╯ │╭╯ ││ ││ ││ ││ │╭╯ │╭╯ │ │ │ │ │ ╭╮ > > 4.12 ┤ ╭╯╰╮│ ││ ││ ││ ││ ││ ││ ││ │╭╯ │ │ │ │╰╮ > > 3.09 ┤ │ ││ ││ ││ ││ ││ ││ ││ ││ ││ │╭╯ │ │ │ > > 2.06 ┤ │ ││ ││ ││ ││ ││ ││ ╰╯ ╰╯ ╰╯ ╰╯ ╰─╯ │ ╭─╮ > > 1.03 ┤ │ ││ ││ ╰╯ ╰╯ ╰╯ ╰╯ ╰─╯ ╰╮ > > 0.00 ┼──╯ ╰╯ ╰╯ > ╰──────────────────────────────────────────────────────────────────────── > Log2 Hamming Distribution of Hashes built over > low entropy mean 13.115629835964098 > 14.10 ┤ ╭─────────╮ > > 13.10 ┤ ╭───╯ ╰──╮ > > 12.09 ┤ ╭─╯ ╰──╮ > > 11.08 ┤ ╭─╯ ╰─╮ > > 10.07 ┤ ╭─╯ ╰╮ > > 9.07 ┤ ╭╯ ╰╮ > > 8.06 ┤ ╭─╯ ╰─╮ > > 7.05 ┤ ╭─╯ ╰╮ > > 6.04 ┤ ╭╯ ╰╮ > > 5.04 ┤ ╭╯ ╰╮ > > 4.03 ┤ ╭╯ ╰╮ > > 3.02 ┤ ╭╯ ╰╮ > > 2.01 ┤ ╭╯ ╰╮ > > 1.01 ┤ ╭╯ ╰╮ > > -0.00 ┼──────╯ > ╰────────────────────────────────────────────────────────────────────── > Log2 Hamming Distribution of Hashes built over > realistic checksums mean 15.987403280718043 > > Ultimately I ran performance measurements on good Rust implementations for > both over millions of generated fragments and fletcher is on average as fast > as good double hashing although it shows bits more CPU outliers so it doesn't > matter what we'll use in terms of CPU load AFAIS > > If we still want to toy with the idea of _per neighbor_ or _over time > varying_ hashes then it would boil down to the initial constants on fletcher > then AFAIS. Not a fan of it since CPU load goes up with #neighbors and also > possibly caching or otherwise permanent recomputation of things ... > > For the really curious, played with Adler as well, no advantage over simple > fletcher > > -- tony > > > On Thu, Oct 9, 2025 at 5:23 PM Tony Przygienda <[email protected]> wrote: > yeah, I read you and it's an intriguing and helpful idea. > > Simplest form of that would be "roll every 3 times on HSNPs to a new random > seed" which would be advertised in the HSNP header itself. > > viable, heavy on computation and caching > > like I said, 24 bits hash could be a cheaper cpu solution at cost of smaller > coverage due to packet size > > > On Thu, Oct 9, 2025 at 3:28 PM <[email protected]> wrote: > Tony, > Thanks for the reply and for elaborating. > > • > By “per neighbor” I meant per adjacency. > I had though about just using a different seed, but I could not convince > myself that this would help with collision. XOR being commutative and > associative, it looks like one could equally applies the seed at the end > (last xor) hence after the collision. That’s why I had proposed something a > bit heavier with a slightly different hash function. But I would not pretend > being capable of following you. > 2. > Thanks for doing the math (for me) as the actual number(probability) does > matter. > Not being familiar with Rust, I’ll wait for the next revision of the draft. > That being said, beyond xor, I’m not familiar with hash function and their > properties. > My point was limited to saying that may be using spatial diversity (my > different adjacencies) could help with collision (if we can manage to have > different collisions depending on the adjacency.) > --Bruno > From: Tony Przygienda <[email protected]> > Sent: Wednesday, October 8, 2025 9:31 PM > To: DECRAENE Bruno INNOV/NET <[email protected]> > Cc: lsr <[email protected]> > Subject: Re: [Lsr] draft-prz-lsr-hierarchical-snps > Bruno, thanks. > Multiple things > 1. I am not sure what "per neighbor" here means. We could take some random > number, advertise that in the hellos and use that as seed. all good except > hellos are unreliable. Also, things like "take neighbors address" are > difficult to do due to unnumbered links and so on. Ultimately such a seed, > unless it varies over time reliably, is just another constant in hashing and > I doubt it will improve any probabilities. But, yes, within a system we > _could_ vary some constant used to build the hashes regularly (kind of like > nonce) and advertise this constant in the HSNP itself. computationally > expensive for the neighbor to compare and even if the numbers change slowly > we need to cache per neighbor the whole thing (since each neighbor uses a > different seed) but yes, it would vastly lower the probability of a "hash > clash" of course. > 2. To give the math quickly once again. I started with 64 bits (and we could > do it) but things were big in terms of packet sizes and I realized with math > it's overkill AFAIS. Today fletcher (AFAIR ISIS uses a variant of that) is 16 > bits so assuming uniform distribution and independent variables we could say > that a HSNP checksum over 4E9 (2^32) fragments statistically has the same > "strength" as fletcher over a single fragment. Generally in 1st level HSNP > we'll cover about 80 ranges each over max. 60 fragments or so (if I remember > the numbers in the draft correctly) so probability is roughly 24E2 / 4E9 of > that, 2nd level each range can pack a full packet of L1 ranges so we can say > 80 * 80 * 60 fragments which is 38E4 / 4E9. so in plain numbers we are > talking in very coarse terms of 0.0000001 and 0.00001 chance of things > clashing. Once we move to layer 3 we are in realm of 30E6 fragments database, > not something I consider likely but for the sake of argument that's 3E7/4E9 > and then we're in 0.001 magnitude, i.e. there is a chance of about 0.1% that > two databases containing 30 million fragments end up producing somewhere the > same hash for two fragments. Of course it matters a _lot_ where this same > hash is produced. If it's in the same L1 range then XOR'ing both of them will > basically remove both fragments. If not in the same range it's basically > meaningless since range above it will not see the clash anymore. All gets > mathematically hairy to analyze and I leave it at that. So if with all this > people still think that all necessitates solution 1 with a new random number > every couple iterations, that's doable, we can also go to 24 bit HSNP hashes > easily. > For the not so faint of heart, here's the fast double hashing of a fragment > that seems to work very well producing good hamming distances over realistic > scenarios and should make it into the next version of draft. I skip lots of > definitions and just show the core Rust snippet. Anyone with some good > foundation in statistics/hashing theory is more than welcome to criticize. We > are not bound by self inverse hashing BTW for building hash of a fragment. > The self-inversion (i.e. XOR) is only of interest later to allow for quick > adjustment of caches over very similar sets. > Distributions of hamming produce weighted averages of about 14 bits > difference for very low entropy checksum (i.e. checksums vary by 1 or 2 bits > only for the tested fragments) and about 17 bits of the 32 bits with checksum > being a uniform random number generated for each fragment used. So it looks > like we're having enough entropy playing here. > impl Into<HSNPHash> for (&SharedFragmentID, &FragmentContent) { > fn into(self) -> HSNPHash { > // 16 bit primes to fill in the 64 bits > let SEED1: u64 = 58417; > let SEED2: u64 = 47527; > const BYTEINBITS: usize = size_of::<u8>() * 8; > let (fragmentid, fragmentcontent) = self; > > let nodeid_iterator = &fragmentid.node.node_id().0; > > let foldednodeid1 = nodeid_iterator > .iter() > .fold(SEED1, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64); > let foldednodeid2 = nodeid_iterator > .iter() > .rev() > .fold(SEED2, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64); > // elements to rotate in with their byte size > let rotate_in = [ > ( > fragmentcontent.checksum.0 as _, > size_of_val(&fragmentcontent.checksum.0), > ), > (fragmentid.pnode.0 as u64, size_of_val(&fragmentid.pnode.0)), > ( > fragmentid.fragmentnr.0 as _, > size_of_val(&fragmentid.fragmentnr.0), > ), > ( > fragmentcontent.seqnr.0 as _, > size_of_val(&fragmentcontent.seqnr.0), > ), > ]; > > let extendedhash1 = rotate_in > .into_iter() > .fold(foldednodeid1, |p, (v, s)| p.rotate_left((s * BYTEINBITS) > as _) ^ v); > let extendedhash2 = rotate_in > .into_iter() > .rev() > .fold(foldednodeid2, |p, (v, s)| p.rotate_left((s * BYTEINBITS) > as _) ^ v); > > (extendedhash1 ^ extendedhash2).into() > } > } > On Wed, Oct 8, 2025 at 4:02 PM <[email protected]> wrote: > Hi all, > > [Thinking out loud, so I'm pretty sure I'll regret sending this email ...] > > With regards to hash collision, I'm wondering whether we could benefit from > using a different hash (method/function) per neighbor, and/or over time. (As > a priori it's enough for me to detect my LSDB inconsistency with a single > neighbor) > This requires more computation but assuming that a significant contribution > to collision is the final step (from 64 bits to 32 bits), rotating a number > of bits in one 32-bit int on a per neighbor basis seems relatively low cost. > > Regards, > --Bruno > ____________________________________________________________________________________________________________ > Ce message et ses pieces jointes peuvent contenir des informations > confidentielles ou privilegiees et ne doivent donc > pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu > ce message par erreur, veuillez le signaler > a l'expediteur et le detruire ainsi que les pieces jointes. Les messages > electroniques etant susceptibles d'alteration, > Orange decline toute responsabilite si ce message a ete altere, deforme ou > falsifie. Merci. > > This message and its attachments may contain confidential or privileged > information that may be protected by law; > they should not be distributed, used or copied without authorisation. > If you have received this email in error, please notify the sender and delete > this message and its attachments. > As emails may be altered, Orange is not liable for messages that have been > modified, changed or falsified. > Thank you. > > _______________________________________________ > Lsr mailing list -- [email protected] > To unsubscribe send an email to [email protected] > ____________________________________________________________________________________________________________ > Ce message et ses pieces jointes peuvent contenir des informations > confidentielles ou privilegiees et ne doivent donc > pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu > ce message par erreur, veuillez le signaler > a l'expediteur et le detruire ainsi que les pieces jointes. Les messages > electroniques etant susceptibles d'alteration, > Orange decline toute responsabilite si ce message a ete altere, deforme ou > falsifie. Merci. > > This message and its attachments may contain confidential or privileged > information that may be protected by law; > they should not be distributed, used or copied without authorisation. > If you have received this email in error, please notify the sender and delete > this message and its attachments. > As emails may be altered, Orange is not liable for messages that have been > modified, changed or falsified. > Thank you. > _______________________________________________ > Lsr mailing list -- [email protected] > To unsubscribe send an email to [email protected] _______________________________________________ Lsr mailing list -- [email protected] To unsubscribe send an email to [email protected]
