Acee, sure, let me give a small rundown

a) Hamming distance is probably something people know:

https://en.wikipedia.org/wiki/Hamming_distance

b) I'm taking something like 1.5M fragment descriptions that need to be
hashed (i.e. seq#, csum, nodeid, pnodeid, frag#) and storing the resulting
hashes (Fletcher or double hash; a rough sketch of the Fletcher flavour is
right after this list). Then I go over the cross product and sum up Hamming
distances

c) log2 is the scale of the counters to make it more readable

d) the graph shows the distribution of Hamming distances over the matrix.
It basically says that in most cases we find ~16 bits of difference in the
32 bits (weighted average)

The higher the average Hamming distance, the better the quality of the
hashing function, of course.

The first graph looks at a low-entropy scenario, i.e. the checksums of the
fragments are not uniformly distributed but are basically a running counter,
i.e. very low Hamming distance from fragment to fragment. This is basically
a simulation to ensure we don't get collisions in such "low entropy
scenarios".

I hope that explains how I measure the quality of hashing here

The graphs need to be viewed in a fixed-width font in case Google Mail
makes a hash of things ;-)

Here's how I generate the fragments:

As you can see, to be realistic I keep the node IDs very similar, but seqnr
and checksum are allowed to diverge, which is the more common picture.

fn generate_fragments() -> HashSet<(SharedFragmentID, FragmentContent)> {
    let mut rng = rand::rng();
    // node IDs stay very similar: only the last two bytes vary
    let mut nid = [49u8, 0, 0, 0, 0, 0];
    let fragments: HashSet<_> = (0..200)
        .cartesian_product((0..100).cartesian_product(0..128))
        .map(|(b1, (b2, b3))| {
            nid[4] = b1 as _;
            nid[5] = b2 as _;
            let n = NodeID(nid);

            (
                SharedFragmentID {
                    node: n.into(),
                    pnode: Default::default(),
                    fragmentnr: FragmentNr(b3),
                },
                // seqnr and checksum are allowed to diverge freely
                FragmentContent {
                    seqnr: FragmentSequenceNr(rng.random_range(0..65535)),
                    checksum: FragmentCheckSum(rng.random_range(0..65535)),
                },
            )
        })
        .collect();

    println!("Generated fragments: {}", fragments.len());
    fragments
}
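
The whole run is then roughly: hash every generated fragment and feed the
hashes into the pairwise measurement. Again just a sketch to show how the
pieces plug together; hash_fragment(), hamming_histogram() and report() are
the toy versions from earlier in this mail, not the real implementation, and
the casts/field accesses are illustrative only:

fn measure() {
    let fragments = generate_fragments();
    let hashes: Vec<u32> = fragments
        .iter()
        .map(|(id, content)| {
            hash_fragment(
                content.seqnr.0 as u16,
                content.checksum.0 as u16,
                id.node.node_id().0,
                id.pnode.0 as u16,
                id.fragmentnr.0 as u16,
            )
        })
        .collect();
    let counts = hamming_histogram(&hashes);
    report(&counts);
}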



-- tony




On Sat, Oct 11, 2025 at 9:38 PM Acee Lindem <[email protected]> wrote:

> Speaking as WG member:
>
> Hey Tony,
>
> I'm not sure that anyone other than you, and possibly the other authors,
> understands exactly what these graphs depict.
>
> I'm thinking that maybe I need the aid of the Rosetta Stone?
>
> Thanks,
> Acee
>
> > On Oct 10, 2025, at 2:06 PM, Tony Przygienda <[email protected]>
> wrote:
> >
> > So I ran fletcher instead of the double hashing I was toying around
> with. Fletcher shows about same distribution under low entropy scenarios
> (i.e. assuming checksums vary by very few bits which is not very likely but
> then again) and a more even and steeper distribution under normal entropy
> scenarios (i.e. fragment checksums are relatively uniformly distributed) so
> AFAIS we should cut over, as Tony Li says, to Fletcher to generate single
> fragment hashes that way as a better alternative. Below the graphs.
> >
> >
> >
> > Double Hashing
> >
> > [chart] Log2 Hamming Distribution of Hashes built over low entropy, mean 13.115629835964098
> > [chart] Log2 Hamming Distribution of Hashes built over realistic checksums, mean 14.770937790157847
> >
> >
> > versus fletcher hamming distributions
> >
> > [chart] Log2 Hamming Distribution of Hashes built over low entropy, mean 13.115629835964098
> > [chart] Log2 Hamming Distribution of Hashes built over realistic checksums, mean 15.987403280718043
> >
> > Ultimately I ran performance measurements on good Rust implementations
> for both over millions of generated fragments and fletcher is on average as
> fast as good double hashing although it shows bits more CPU outliers so it
> doesn't matter what we'll use in terms of CPU load AFAIS
> >
> > If we still want to toy with the idea of _per neighbor_ or _over time
> varying_ hashes then it would boil down to the initial constants on
> fletcher then AFAIS. Not a fan of it since CPU load goes up with #neighbors
> and also possibly caching or otherwise permanent recomputation of things
> ...
> >
> > For the really curious, played with Adler as well, no advantage over
> simple fletcher
> >
> > -- tony
> >
> >
> > On Thu, Oct 9, 2025 at 5:23 PM Tony Przygienda <[email protected]>
> wrote:
> > yeah, I read you and it's an intriguing and helpful idea.
> >
> > Simplest form of that would be "roll every 3 times on HSNPs to a new
> random seed" which would be advertised in the HSNP header itself.
> >
> > viable, heavy on computation and caching
> >
> > like I said, 24 bits hash could be a cheaper cpu solution at cost of
> smaller coverage due to packet size
> >
> >
> > On Thu, Oct 9, 2025 at 3:28 PM <[email protected]> wrote:
> > Tony,
> >  Thanks for the reply and for elaborating.
> >
> >     •
> > By “per neighbor” I meant per adjacency.
> > I had thought about just using a different seed, but I could not convince
> myself that this would help with collisions. XOR being commutative and
> associative, it looks like one could equally apply the seed at the end
> (last xor), hence after the collision. That's why I had proposed something a
> bit heavier with a slightly different hash function. But I would not
> pretend to be capable of following you.
> >  2.
> > Thanks for doing the math (for me) as the actual number(probability)
> does matter.
> > Not being familiar with Rust, I’ll wait for the next revision of the
> draft.
> >  That being said, beyond xor, I’m not familiar with hash function and
> their properties.
> > My point was limited to saying that maybe using spatial diversity (my
> different adjacencies) could help with collisions (if we can manage to have
> different collisions depending on the adjacency).
> >  --Bruno
> >   From: Tony Przygienda <[email protected]>
> > Sent: Wednesday, October 8, 2025 9:31 PM
> > To: DECRAENE Bruno INNOV/NET <[email protected]>
> > Cc: lsr <[email protected]>
> > Subject: Re: [Lsr] draft-prz-lsr-hierarchical-snps
> >   Bruno, thanks.
> >   Multiple things
> >   1. I am not sure what "per neighbor" here means. We could take some
> random number, advertise that in the hellos and use that as seed. all good
> except hellos are unreliable. Also, things like "take neighbors address"
> are difficult to do due to unnumbered links and so on. Ultimately such a
> seed, unless it varies over time reliably, is just another constant in
> hashing and I doubt it will improve any probabilities. But, yes, within a
> system we _could_ vary some constant used to build the hashes regularly
> (kind of like nonce) and advertise this constant in the HSNP itself.
> computationally expensive for the neighbor to compare and even if the
> numbers change slowly we need to cache per neighbor the whole thing (since
> each neighbor uses a different seed) but yes, it would vastly lower the
> probability of a "hash clash" of course.
> > 2. To give the math quickly once again. I started with 64 bits (and we
> could do it) but things were big in terms of packet sizes and I realized
> with math it's overkill AFAIS. Today fletcher (AFAIR ISIS uses a variant of
> that) is 16 bits so assuming uniform distribution and independent variables
> we could say that a HSNP checksum over 4E9 (2^32) fragments statistically
> has the same "strength" as fletcher over a single fragment. Generally in
> 1st level HSNP we'll cover about 80 ranges each over max. 60 fragments or
> so (if I remember the numbers in the draft correctly) so probability is
> roughly 24E2 / 4E9 of that, 2nd level each range can pack a full packet of
> L1 ranges so we can say 80 * 80 * 60 fragments which is 38E4 / 4E9. so in
> plain numbers we are talking in very coarse terms of 0.0000001 and 0.00001
> chance of things clashing. Once we move to layer 3 we are in realm of 30E6
> fragments database, not something I consider likely but for the sake of
> argument that's 3E7/4E9 and then we're in 0.001 magnitude, i.e. there is a
> chance of about 0.1% that two databases containing 30 million fragments end
> up producing somewhere the same hash for two fragments. Of course it
> matters a _lot_ where this same hash is produced. If it's in the same L1
> range then XOR'ing both of them will basically remove both fragments. If
> not in the same range it's basically meaningless since range above it will
> not see the clash anymore. All gets mathematically hairy to analyze and I
> leave it at that. So if with all this people still think that all
> necessitates solution 1 with a new random number every couple iterations,
> that's doable, we can also go to 24 bit HSNP hashes easily.
> >   For the not so faint of heart, here's the fast double hashing of a
> fragment that seems to work very well producing good hamming distances over
> realistic scenarios and should make it into the next version of draft. I
> skip lots of definitions and just show the core Rust snippet. Anyone with
> some good foundation in statistics/hashing theory is more than welcome to
> criticize. We are not bound by self inverse hashing BTW for building hash
> of a fragment. The self-inversion (i.e. XOR) is only of interest later to
> allow for quick adjustment of caches over very similar sets.
> >   Distributions of hamming produce weighted averages of about 14 bits
> difference for very low entropy checksum (i.e. checksums vary by 1 or 2
> bits only for the tested fragments) and about 17 bits of the 32 bits with
> checksum being a uniform random number generated for each fragment used. So
> it looks like we're having enough entropy playing here.
> >   impl Into<HSNPHash> for (&SharedFragmentID, &FragmentContent) {
> >     fn into(self) -> HSNPHash {
> >         // 16 bit primes to fill in the 64 bits
> >         let SEED1: u64 = 58417;
> >         let SEED2: u64 = 47527;
> >         const BYTEINBITS: usize = size_of::<u8>() * 8;
> >         let (fragmentid, fragmentcontent) = self;
> >
> >         let nodeid_iterator = &fragmentid.node.node_id().0;
> >
> >         let foldednodeid1 = nodeid_iterator
> >             .iter()
> >             .fold(SEED1, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64);
> >         let foldednodeid2 = nodeid_iterator
> >             .iter()
> >             .rev()
> >             .fold(SEED2, |p, n| p.rotate_left(BYTEINBITS as _) ^ *n as u64);
> >         // elements to rotate in with their byte size
> >         let rotate_in = [
> >             (
> >                 fragmentcontent.checksum.0 as _,
> >                 size_of_val(&fragmentcontent.checksum.0),
> >             ),
> >             (fragmentid.pnode.0 as u64, size_of_val(&fragmentid.pnode.0)),
> >             (
> >                 fragmentid.fragmentnr.0 as _,
> >                 size_of_val(&fragmentid.fragmentnr.0),
> >             ),
> >             (
> >                 fragmentcontent.seqnr.0 as _,
> >                 size_of_val(&fragmentcontent.seqnr.0),
> >             ),
> >         ];
> >
> >         let extendedhash1 = rotate_in
> >             .into_iter()
> >             .fold(foldednodeid1, |p, (v, s)| p.rotate_left((s * BYTEINBITS) as _) ^ v);
> >         let extendedhash2 = rotate_in
> >             .into_iter()
> >             .rev()
> >             .fold(foldednodeid2, |p, (v, s)| p.rotate_left((s * BYTEINBITS) as _) ^ v);
> >
> >         (extendedhash1 ^ extendedhash2).into()
> >     }
> > }
> >       On Wed, Oct 8, 2025 at 4:02 PM <[email protected]> wrote:
> > Hi all,
> >
> > [Thinking out loud, so I'm pretty sure I'll regret sending this email
> ...]
> >
> > With regards to hash collision, I'm wondering whether we could benefit
> from using a different hash (method/function) per neighbor, and/or over
> time. (As a priori it's enough for me to detect my LSDB inconsistency with
> a single neighbor)
> > This requires more computation but assuming that a significant
> contribution to collision is the final step (from 64 bits to 32 bits),
> rotating a number of bits in one 32-bit int on a per neighbor basis seems
> relatively low cost.
> >
> > Regards,
> > --Bruno
> >
_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
