Speaking as WG member:

Hey Tony, 

I'm not sure that anyone other than you, and possibly the other authors, 
understands exactly what these graphs depict.

I'm thinking that maybe I need the aid of the Rosetta Stone? 

Thanks,
Acee 

> On Oct 10, 2025, at 2:06 PM, Tony Przygienda <[email protected]> wrote:
> 
> So I ran Fletcher instead of the double hashing I was toying around with. 
> Fletcher shows about the same distribution under low-entropy scenarios (i.e. 
> assuming checksums vary by very few bits, which is not very likely, but then 
> again) and a more even and steeper distribution under normal-entropy 
> scenarios (i.e. fragment checksums are relatively uniformly distributed), so 
> AFAIS we should cut over, as Tony Li says, to Fletcher to generate single 
> fragment hashes that way as a better alternative. Below the graphs. 
> 
> 
> 
> Double Hashing 
> 
>  13.38 ┤                   ╭╮  ╭╮  ╭╮
>  12.35 ┤               ╭╮  ││  ││  ││  ╭╮
>  11.32 ┤           ╭╮  ││  ││  ││  ││  ││  ╭╮
>  10.29 ┤       ╭╮  ││  ││  ││  ││  ││  ││  ││
>   9.26 ┤       ││  ││  ││  ││  │╰╮ │╰╮ │╰╮ ││  ╭╮
>   8.23 ┤   ╭╮  ││  ││  ││  │╰╮ │ │ │ │ │ │ │╰╮ ││
>   7.20 ┤   ││  ││  ││  │╰╮ │ │ │ │ │ │ │ │ │ │ │╰╮ ╭╮
>   6.17 ┤   ││  │╰╮ │╰╮╭╯ │╭╯ │╭╯ │╭╯ │ │ │ │ │ │ │ │╰╮
>   5.15 ┤   ││ ╭╯ │╭╯ ││  ││  ││  ││  │╭╯ │╭╯ │ │ │ │ │ ╭╮
>   4.12 ┤  ╭╯╰╮│  ││  ││  ││  ││  ││  ││  ││  │╭╯ │ │ │ │╰╮
>   3.09 ┤  │  ││  ││  ││  ││  ││  ││  ││  ││  ││  │╭╯ │ │ │
>   2.06 ┤  │  ││  ││  ││  ││  ││  ││  ╰╯  ╰╯  ╰╯  ╰╯  ╰─╯ │ ╭─╮
>   1.03 ┤  │  ││  ││  ╰╯  ╰╯  ╰╯  ╰╯                      ╰─╯ ╰╮
>   0.00 ┼──╯  ╰╯  ╰╯                                           ╰────────────────────────────────────────────────────────────────────────
> 
> Log2 Hamming Distribution of Hashes built over low entropy, mean 13.115629835964098
> 
>  13.46 ┤                       ╭╮  ╭╮  ╭╮
>  12.42 ┤                   ╭╮  ││  ││  ││  ╭╮
>  11.39 ┤                   ││  ││  ││  ││  ││
>  10.35 ┤               ╭╮  ││  ││  ││  ││  ││  ╭╮
>   9.32 ┤               ││  ││  ││  │╰╮ │╰╮ │╰╮ ││
>   8.28 ┤           ╭╮  ││  ││  │╰╮ │ │ │ │ │ │ │╰╮ ╭╮
>   7.25 ┤           ││  ││  │╰╮ │ │ │ │ │ │ │ │ │ │ │╰╮
>   6.21 ┤           ││  │╰╮ │ │╭╯ │╭╯ │╭╯ │ │ │ │ │ │ │ ╭╮
>   5.18 ┤       ╭╮  │╰╮╭╯ │╭╯ ││  ││  ││  │╭╯ │ │ │ │ │ │╰╮
>   4.14 ┤       ││ ╭╯ ││  ││  ││  ││  ││  ││  │╭╯ │ │ │ │ │
>   3.11 ┤      ╭╯╰╮│  ││  ││  ││  ││  ││  ││  ││  │╭╯ │ │ │ ╭╮
>   2.07 ┤      │  ││  ││  ││  ││  ││  ╰╯  ╰╯  ╰╯  ╰╯  ╰─╯ │ │╰╮
>   1.04 ┤      │  ││  ││  ╰╯  ╰╯  ╰╯                      ╰─╯ ╰╮
>   0.00 ┼──────╯  ╰╯  ╰╯                                        ╰────────────────────────────────────────────────────────────────────────
> 
> Log2 Hamming Distribution of Hashes built over realistic checksums, mean 14.770937790157847
> 
> 
> versus Fletcher Hamming distributions 
> 
>  13.38 ┤                   ╭╮  ╭╮  ╭╮
>  12.35 ┤               ╭╮  ││  ││  ││  ╭╮
>  11.32 ┤           ╭╮  ││  ││  ││  ││  ││  ╭╮
>  10.29 ┤       ╭╮  ││  ││  ││  ││  ││  ││  ││
>   9.26 ┤       ││  ││  ││  ││  │╰╮ │╰╮ │╰╮ ││  ╭╮
>   8.23 ┤   ╭╮  ││  ││  ││  │╰╮ │ │ │ │ │ │ │╰╮ ││
>   7.20 ┤   ││  ││  ││  │╰╮ │ │ │ │ │ │ │ │ │ │ │╰╮ ╭╮
>   6.17 ┤   ││  │╰╮ │╰╮╭╯ │╭╯ │╭╯ │╭╯ │ │ │ │ │ │ │ │╰╮
>   5.15 ┤   ││ ╭╯ │╭╯ ││  ││  ││  ││  │╭╯ │╭╯ │ │ │ │ │ ╭╮
>   4.12 ┤  ╭╯╰╮│  ││  ││  ││  ││  ││  ││  ││  │╭╯ │ │ │ │╰╮
>   3.09 ┤  │  ││  ││  ││  ││  ││  ││  ││  ││  ││  │╭╯ │ │ │
>   2.06 ┤  │  ││  ││  ││  ││  ││  ││  ╰╯  ╰╯  ╰╯  ╰╯  ╰─╯ │ ╭─╮
>   1.03 ┤  │  ││  ││  ╰╯  ╰╯  ╰╯  ╰╯                      ╰─╯ ╰╮
>   0.00 ┼──╯  ╰╯  ╰╯                                           ╰────────────────────────────────────────────────────────────────────────
> 
> Log2 Hamming Distribution of Hashes built over low entropy, mean 13.115629835964098
> 
>  14.10 ┤                           ╭─────────╮
>  13.10 ┤                       ╭───╯         ╰──╮
>  12.09 ┤                     ╭─╯                ╰──╮
>  11.08 ┤                   ╭─╯                     ╰─╮
>  10.07 ┤                 ╭─╯                         ╰╮
>   9.07 ┤                ╭╯                            ╰╮
>   8.06 ┤              ╭─╯                              ╰─╮
>   7.05 ┤            ╭─╯                                  ╰╮
>   6.04 ┤           ╭╯                                     ╰╮
>   5.04 ┤          ╭╯                                       ╰╮
>   4.03 ┤         ╭╯                                         ╰╮
>   3.02 ┤        ╭╯                                           ╰╮
>   2.01 ┤       ╭╯                                             ╰╮
>   1.01 ┤      ╭╯                                               ╰╮
>  -0.00 ┼──────╯                                                  ╰──────────────────────────────────────────────────────────────────────
> 
> Log2 Hamming Distribution of Hashes built over realistic checksums, mean 15.987403280718043 
> 
> Ultimately I ran performance measurements on good Rust implementations of 
> both, over millions of generated fragments, and Fletcher is on average as fast 
> as good double hashing, although it shows a few more CPU outliers, so it 
> doesn't matter in terms of CPU load which one we use, AFAIS.
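The measurement loop has roughly this shape (illustrative only, not the actual harness used; `hash_fragment` is a hypothetical stand-in for either the Fletcher or the double-hash implementation):

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Rough sketch of a micro-benchmark over many generated fragments.
// `hash_fragment` stands in for either checksum under test.
fn bench(hash_fragment: fn(&[u8]) -> u32, fragments: &[Vec<u8>]) -> Duration {
    let start = Instant::now();
    let mut acc = 0u32;
    for f in fragments {
        // fold the results together so the compiler cannot drop the loop
        acc ^= hash_fragment(f);
    }
    black_box(acc);
    start.elapsed()
}
```

Comparing averages over many runs (rather than single timings) is what smooths out the CPU outliers mentioned above.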
> 
> If we still want to toy with the idea of _per neighbor_ or _over time 
> varying_ hashes, then it would boil down to the initial constants on Fletcher, 
> AFAIS. Not a fan of it, since CPU load goes up with #neighbors, and it also 
> possibly means caching or otherwise permanent recomputation of things ... 
> 
> For the really curious: I played with Adler as well; no advantage over simple 
> Fletcher.
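For readers who want the flavor of what is being compared, here is a minimal Fletcher-16 over a byte slice (two running sums mod 255, similar in spirit to the ISO checksum IS-IS uses; a sketch, not necessarily the exact variant benchmarked above):

```rust
/// Minimal Fletcher-16: two running sums modulo 255 over the bytes.
/// Illustrative only; real implementations defer the modulo for speed.
fn fletcher16(data: &[u8]) -> u16 {
    let (mut sum1, mut sum2) = (0u32, 0u32);
    for &b in data {
        sum1 = (sum1 + b as u32) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    ((sum2 << 8) | sum1) as u16
}
```

The second (position-weighted) sum is what gives Fletcher sensitivity to byte order, unlike a plain sum or XOR.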
> 
> -- tony 
> 
> 
> On Thu, Oct 9, 2025 at 5:23 PM Tony Przygienda <[email protected]> wrote:
> yeah, I read you and it's an intriguing and helpful idea. 
> 
> The simplest form of that would be "roll to a new random seed every 3 HSNPs", 
> with the seed advertised in the HSNP header itself. 
> 
> Viable, but heavy on computation and caching. 
> 
> Like I said, a 24-bit hash could be a cheaper CPU solution, at the cost of 
> smaller coverage due to packet size. 
> 
> 
> On Thu, Oct 9, 2025 at 3:28 PM <[email protected]> wrote:
> Tony,
> Thanks for the reply and for elaborating.
> 
> 1. By “per neighbor” I meant per adjacency.
> I had thought about just using a different seed, but I could not convince 
> myself that this would help with collisions. XOR being commutative and 
> associative, it looks like one could equally apply the seed at the end 
> (last XOR), hence after the collision. That’s why I had proposed something a 
> bit heavier, with a slightly different hash function. But I would not pretend 
> to be capable of following you.
> 
> 2. Thanks for doing the math (for me), as the actual number (probability) 
> does matter.
> Not being familiar with Rust, I’ll wait for the next revision of the draft.
> 
> That being said, beyond XOR, I’m not familiar with hash functions and their 
> properties.
> My point was limited to saying that maybe using spatial diversity (my 
> different adjacencies) could help with collisions (if we can manage to have 
> different collisions depending on the adjacency).
> 
> --Bruno 
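Bruno's commutativity point can be checked mechanically. In a plain XOR fold (a simplification: the draft's hash rotates between XORs, which is exactly what breaks this symmetry), the seed factors out, so any two inputs that collide unseeded collide under every seed:

```rust
// Plain XOR fold of words into a seed. Because XOR is commutative and
// associative, xor_fold(seed, ws) == seed ^ xor_fold(0, ws): the seed
// might as well be applied at the end, after any collision.
fn xor_fold(seed: u32, words: &[u32]) -> u32 {
    words.iter().fold(seed, |acc, w| acc ^ w)
}
```

So a per-neighbor seed only helps if it enters the hash non-commutatively, e.g. interleaved with rotations as in the snippet further down the thread.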
>   From: Tony Przygienda <[email protected]> 
> Sent: Wednesday, October 8, 2025 9:31 PM
> To: DECRAENE Bruno INNOV/NET <[email protected]>
> Cc: lsr <[email protected]>
> Subject: Re: [Lsr] draft-prz-lsr-hierarchical-snps
> Bruno, thanks. 
> 
> Multiple things: 
> 
> 1. I am not sure what "per neighbor" here means. We could take some random 
> number, advertise that in the hellos and use that as a seed. All good, except 
> hellos are unreliable. Also, things like "take the neighbor's address" are 
> difficult to do due to unnumbered links and so on. Ultimately such a seed, 
> unless it varies over time reliably, is just another constant in hashing and 
> I doubt it will improve any probabilities. But, yes, within a system we 
> _could_ vary some constant used to build the hashes regularly (kind of like a 
> nonce) and advertise this constant in the HSNP itself. Computationally 
> expensive for the neighbor to compare, and even if the numbers change slowly 
> we need to cache the whole thing per neighbor (since each neighbor uses a 
> different seed), but yes, it would vastly lower the probability of a "hash 
> clash" of course. 
> 
> 2. To give the math quickly once again. I started with 64 bits (and we could 
> do it) but things were big in terms of packet sizes, and I realized with math 
> it's overkill AFAIS. Today Fletcher (AFAIR ISIS uses a variant of that) is 16 
> bits, so assuming uniform distribution and independent variables we could say 
> that a HSNP checksum over 4E9 (2^32) fragments statistically has the same 
> "strength" as Fletcher over a single fragment. Generally a 1st-level HSNP 
> will cover about 80 ranges, each over max. 60 fragments or so (if I remember 
> the numbers in the draft correctly), so the probability is roughly 24E2 / 
> 4E9. At 2nd level each range can pack a full packet of L1 ranges, so we can 
> say 80 * 80 * 60 fragments, which is 38E4 / 4E9. So in plain numbers we are 
> talking, in very coarse terms, of a 0.0000001 and 0.00001 chance of things 
> clashing. Once we move to layer 3 we are in the realm of a 30E6-fragment 
> database; not something I consider likely, but for the sake of argument 
> that's 3E7 / 4E9, and then we're in 0.001 magnitude, i.e. there is a chance 
> of about 0.1% that two databases containing 30 million fragments end up 
> producing somewhere the same hash for two fragments. Of course it matters a 
> _lot_ where this same hash is produced. If it's in the same L1 range, then 
> XOR'ing both of them will basically remove both fragments. If it's not in the 
> same range, it's basically meaningless, since the range above will not see 
> the clash anymore. It all gets mathematically hairy to analyze and I leave it 
> at that. So if with all this people still think it necessitates solution 1 
> with a new random number every couple of iterations, that's doable; we can 
> also go to 24-bit HSNP hashes easily. 
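The coarse ratios above are just fragment count over hash-space size; mechanically (illustrative, using the thread's own counts):

```rust
// Coarse clash odds as used in the thread: fragments / 2^32.
// A rough single-pair estimate, not a birthday-bound analysis.
fn coarse_clash_odds(fragments: f64) -> f64 {
    fragments / 2f64.powi(32)
}
```

For instance, `coarse_clash_odds(3e7)` lands in the sub-percent ballpark quoted for the 30E6-fragment layer-3 database.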
> For the not so faint of heart, here's the fast double hashing of a fragment 
> that seems to work very well, producing good Hamming distances over realistic 
> scenarios, and should make it into the next version of the draft. I skip lots 
> of definitions and just show the core Rust snippet. Anyone with some good 
> foundation in statistics/hashing theory is more than welcome to criticize. We 
> are not bound to self-inverse hashing BTW for building the hash of a 
> fragment. The self-inversion (i.e. XOR) is only of interest later, to allow 
> for quick adjustment of caches over very similar sets. 
> 
> The distributions of Hamming distances produce weighted averages of about 14 
> bits difference for very low entropy checksums (i.e. checksums vary by 1 or 2 
> bits only across the tested fragments) and about 17 bits of the 32 bits with 
> the checksum being a uniform random number generated for each fragment. So it 
> looks like we have enough entropy in play here. 
> impl Into<HSNPHash> for (&SharedFragmentID, &FragmentContent) {
>     fn into(self) -> HSNPHash {
>         // 16-bit primes to fill in the 64 bits
>         const SEED1: u64 = 58417;
>         const SEED2: u64 = 47527;
>         const BYTEINBITS: usize = size_of::<u8>() * 8;
>         let (fragmentid, fragmentcontent) = self;
> 
>         let nodeid_iterator = &fragmentid.node.node_id().0;
> 
>         // fold the node ID bytes into each seed, forward and backward
>         let foldednodeid1 = nodeid_iterator
>             .iter()
>             .fold(SEED1, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64);
>         let foldednodeid2 = nodeid_iterator
>             .iter()
>             .rev()
>             .fold(SEED2, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64);
>         // elements to rotate in, with their byte size
>         let rotate_in = [
>             (
>                 fragmentcontent.checksum.0 as _,
>                 size_of_val(&fragmentcontent.checksum.0),
>             ),
>             (fragmentid.pnode.0 as u64, size_of_val(&fragmentid.pnode.0)),
>             (
>                 fragmentid.fragmentnr.0 as _,
>                 size_of_val(&fragmentid.fragmentnr.0),
>             ),
>             (
>                 fragmentcontent.seqnr.0 as _,
>                 size_of_val(&fragmentcontent.seqnr.0),
>             ),
>         ];
> 
>         // rotate the fragment fields in, again forward and backward
>         let extendedhash1 = rotate_in
>             .into_iter()
>             .fold(foldednodeid1, |p, (v, s)| p.rotate_left((s * BYTEINBITS) as _) ^ v);
>         let extendedhash2 = rotate_in
>             .into_iter()
>             .rev()
>             .fold(foldednodeid2, |p, (v, s)| p.rotate_left((s * BYTEINBITS) as _) ^ v);
> 
>         (extendedhash1 ^ extendedhash2).into()
>     }
> }
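The Hamming distributions quoted above can be gathered along these lines (a sketch: XOR the 32-bit hashes pairwise and count differing bits):

```rust
// Hamming distance between two 32-bit hashes: XOR, then popcount.
fn hamming32(a: u32, b: u32) -> u32 {
    (a ^ b).count_ones()
}

// Histogram of pairwise Hamming distances over a set of hashes,
// one bucket per possible distance 0..=32.
fn hamming_histogram(hashes: &[u32]) -> [u64; 33] {
    let mut hist = [0u64; 33];
    for (i, &a) in hashes.iter().enumerate() {
        for &b in &hashes[i + 1..] {
            hist[hamming32(a, b) as usize] += 1;
        }
    }
    hist
}
```

Bucket 0 of the histogram is exactly the hash-clash count; for well-mixed 32-bit hashes the distribution should peak near 16 bits.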
> On Wed, Oct 8, 2025 at 4:02 PM <[email protected]> wrote:
> Hi all,
> 
> [Thinking out loud, so I'm pretty sure I'll regret sending this email ...]
> 
> With regard to hash collisions, I'm wondering whether we could benefit from 
> using a different hash (method/function) per neighbor, and/or over time. (A 
> priori it's enough for me to detect my LSDB inconsistency with a single 
> neighbor.)
> This requires more computation, but assuming that a significant contribution 
> to collisions is the final step (from 64 bits to 32 bits), rotating a number 
> of bits in one 32-bit int on a per-neighbor basis seems relatively low cost.
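A minimal sketch of that final-step idea (a hypothetical helper, not from the draft: the 64-to-32 fold rotates one half by a per-adjacency amount before XORing):

```rust
// Sketch of a per-neighbor 64 -> 32 fold: rotate the low half by a
// per-adjacency amount before XORing with the high half, so two 64-bit
// values that collide under one neighbor's fold need not collide under
// another's. Hypothetical illustration of Bruno's suggestion.
fn fold64_per_neighbor(h: u64, neighbor_rot: u32) -> u32 {
    let hi = (h >> 32) as u32;
    let lo = h as u32;
    hi ^ lo.rotate_left(neighbor_rot % 32)
}
```

Note this only diversifies collisions introduced by the folding step itself; two 64-bit hashes that are already equal collide for every neighbor regardless of rotation.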
> 
> Regards,
> --Bruno
> ____________________________________________________________________________________________________________
> Ce message et ses pieces jointes peuvent contenir des informations 
> confidentielles ou privilegiees et ne doivent donc
> pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu 
> ce message par erreur, veuillez le signaler
> a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
> electroniques etant susceptibles d'alteration,
> Orange decline toute responsabilite si ce message a ete altere, deforme ou 
> falsifie. Merci.
> 
> This message and its attachments may contain confidential or privileged 
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and delete 
> this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been 
> modified, changed or falsified.
> Thank you.
> 
> _______________________________________________
> Lsr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
