I probably misunderstood your "level" comment. If your observation
pertained to the ISIS Level, then yes, there will be an HSNP for L1 and one
for L2, just like CSNPs.

--- tony

On Wed, Mar 11, 2026 at 8:02 AM Przygienda, Tony <antoni.przygienda=
[email protected]> wrote:

> Hey Les, thanks for detailed read, lots of valid, productive comments,
> inline on prz>
>
> Sent from Outlook for Mac
> *From: *Les Ginsberg (ginsberg) <[email protected]>
> *Date: *Wednesday, 11 March 2026 at 07:28
> *To: *[email protected] <
> [email protected]>
> *Cc: *lsr <[email protected]>
> *Subject: *Comments on draft-prz-lsr-hierarchical-snps-01
>
>
>
> But I preface my remarks by saying that the document needs to be more
> precise as regards the specification of the new PDUs and new TLVs you wish
> to define. In its current form, these elements are only "described" - but
> not "specified". In some cases, which I will comment on below, this leads
> to uncertainty/confusion. I hope future revisions will be much more
> pedantic in this regard.
>
>
> prz> agreed, formats and procedures must be strictly defined for a good
> interoperability spec. As you say, this is work in progress that will be
> polished after discussions settle. At this point in time we focused on
> resolving the contention over the size of the hash and on measuring the
> efficiency of the scheme under different assumptions (which we will
> present later, once things are in a generally consumable form).
>
>
>
> Also, if the goal is to use HSNPs to achieve the same level of reliability
> that is achieved today using CSNPs, more detailed behavioral specification
> is required. Actions regarding sending/acking LSPs related to the
> sending/receiving of CSNPs are fully specified in ISO 10589. HSNPs
> introduce new behaviors - but the end goal is the same - to ensure that
> LSPDB synchronization is maintained. I think a more precise definition of
> how an implementation tracks the state of the portion of the LSPDB
> associated with an HSNP hash mismatch is required to guarantee reliability
> and interoperability. I am not suggesting that the solution you define
> cannot work - just that it needs a more precise behavioral description.
> Hopefully, that is coming in future revisions.
>
>
> prz> yes, and this will be straightforward. Just as a mismatched CSNP
> leads to LSP flooding, a mismatched hash in an HSNP must lead to HSNPs
> with hashes covering the range in more detail, or to CSNPs (a MUST in
> case a node hash mismatch is hit), or alternatively to direct LSP
> flooding (say, for example, a node hash covering 3 LSPs mismatches; it is
> probably more efficient to flood those 3 than to send a CSNP listing
> them). All of that will work, but as you say, precise procedures akin to
> 10589 must be in the final version.
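The fallback choice sketched above (flood a handful of LSPs directly versus describe the range in CSNPs) could look roughly as follows. This is a minimal illustration; the function name, the threshold of 3 and the return convention are mine, not the draft's:

```python
# Illustrative sketch of the mismatch-handling choice described above.
# All names and the threshold are hypothetical, not from the draft.

def on_node_hash_mismatch(lsps_in_range, direct_flood_threshold=3):
    """Decide how to resynchronize after a node hash mismatch.

    If the mismatched node hash covers only a handful of LSPs, flooding
    them directly is cheaper than first describing them in a CSNP and
    waiting for the neighbor to request them.
    """
    if len(lsps_in_range) <= direct_flood_threshold:
        # e.g. a node hash covering 3 LSPs: just flood those 3
        return ("flood", lsps_in_range)
    # otherwise describe the range in CSNPs (a MUST on node hash mismatch)
    return ("csnp", lsps_in_range)
```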
>
>
>
> *Section 2:*
>
>
>
> You say:
>
>
>
> *"At the lowest compression level, it is optimal to generate a single CSNP
> packet on a mismatch in a hash. To achieve this, the first-level hashes
> should initially group about 80 LSP fragments together, with exceptions
> handled later. There is no need to maximize this initial packing."*
>
>
>
> and
>
>
>
> *"The packing process always places all fragments belonging to the same
> system and its pseudonodes within a single node Merkle hash. This hash may
> occasionally exceed the recommended size of 80 fragments..."*
>
>
>
> This is confusing.
>
> I think what you mean to say here is that it is not helpful to pack beyond
> the number of hashes which will fit in a single HSNP PDU. (Approximately 80
> for a 1500 byte MTU). But if a given node is originating 200 LSPs, there is
> no way to split the hash calculation for that node into two HSNP TLVs - and
> so it may indeed require more than one CSNP to determine which of the 200
> LSPs is "out of sync" in the event of a hash mismatch.
>
>
> prz> well, it seems clear enough to me, since your interpretation is
> exactly the intended meaning 😉 If the language can be improved
> significantly for clarity here, please suggest wording.
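The packing rule being debated here can be sketched as follows: all fragments of one system (and its pseudonodes) stay in a single node hash, and node groups are then packed into first-level hashes of roughly 80 fragments; an oversized group is never split, so it may exceed 80 on its own. The function and variable names are illustrative assumptions, not from the draft:

```python
# Hedged sketch of the first-level packing discussed above. A group of
# fragments belonging to one system is never split across hashes, so a
# node originating more than `target` fragments forms a bucket by itself.

def pack_first_level(groups, target=80):
    """groups: list of (system_id, fragment_count), one per originating node."""
    buckets, current, count = [], [], 0
    for system_id, frags in groups:
        if current and count + frags > target:
            buckets.append(current)     # close the current first-level hash
            current, count = [], 0
        current.append(system_id)       # a group is never split
        count += frags
    if current:
        buckets.append(current)
    return buckets
```

Note that a node with 200 fragments still ends up in a single bucket, which is exactly why a mismatch on it may require more than one CSNP to resolve.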
>
>
>
> *Section 3*
>
>
>
> Not sure why you went to a 48 bit Fletcher checksum.
>
> I don't object - but it makes the bar to deployment/interoperability
> slightly higher, since implementations cannot simply use the Fletcher
> calculation they have been using for decades. Could you provide a
> clearer justification?
>
> I appreciate that you have provided sufficient info for implementations to
> validate that they have implemented the modified fletcher checksum
> correctly.
>
>
> prz> well, the measurement section (on which lots of CPU has been burnt)
> gives very precise reasoning why 48 bits is optimal. Funnily enough, 64
> bits actually leads to *more*​ collisions that matter, and 32 bits seemed
> unacceptable. You’ll find the simulation numbers and reasoning in section
> 9.2, and during IETF I’ll show some cool graphs to clarify further ;-)
>
>
> Implementing a 48-bit Fletcher is utterly trivial; in fact, I took an
> existing crate and just added one macro invocation with the corresponding
> buffer/intermediate result sizes 😉
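For illustration, a natural generalization of the classic Fletcher construction (Fletcher-32 keeps two 16-bit sums mod 65535 over 16-bit words) to 48 bits keeps two 24-bit sums mod 2^24 - 1 over 3-byte words. This is an assumption about the construction for sketching purposes; the draft's normative definition governs:

```python
# Sketch of a 48-bit Fletcher checksum, generalizing the classic scheme
# to two 24-bit running sums over 3-byte words. Zero-padding of the
# trailing word is an assumption, not the draft's normative choice.

def fletcher48(data: bytes) -> int:
    MOD = (1 << 24) - 1              # 16777215, per the Fletcher pattern
    sum1 = sum2 = 0
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3].ljust(3, b"\x00"), "big")
        sum1 = (sum1 + word) % MOD   # simple sum of words
        sum2 = (sum2 + sum1) % MOD   # sum of running sums (position-sensitive)
    return (sum2 << 24) | sum1       # pack both halves into 48 bits
```

The position-sensitive second sum is what distinguishes Fletcher from a plain additive checksum; widening both halves to 24 bits is indeed a small mechanical change to an existing implementation.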
>
>
>
> *Section 5.1*
>
>
>
> You have yet to define the new TLV you require in hellos.
>
>
> prz> yes, easily done once things settle.
>
>
>
> *Section 5.2*
>
>
>
> It seems the intent is to interleave CSNPs and HSNPs (though not insisted
> upon). But the actions to take on receiving a hash mismatch are not fully
> specified.
>
> Ultimately, we have to guarantee synchronization of the LSPDB - which
> means setting/clearing of SRM/SSN and related behaviors in response to HSNP
> reception needs to be specified.
>
>
> prz> again, agreed; the procedures will be cast in stone, similar to
> 10589, once discussions around the draft settle to the point where it
> makes sense.
>
>
>
> *Section 6*
>
>
>
> Is the header of an HSNP intended to be identical to the header of a CSNP?
>
> I ask because the following fields in the CSNP PDU header are of length
> "ID Length +2":
>
>
>
> *Start LSP ID*
>
> *End LSP ID*
>
>
>
> but since the new TLV you define uses range identifiers which are simply
> System IDs (NOT LSP IDs), it is not possible to send an HSNP which covers
> only some of the LSPs generated by a given node. This suggests that you
> could modify the Start/End LSP ID fields in the HSNP PDU header to match
> what you have in the new TLV.
>
> If you don't do that, then you will need to state that HSNPs which have
> Start/End LSP IDs which are not of the form "A.00-00" and "B.FF-FF"
> respectively are invalid.
>
>
> prz> HSNP is a new packet format and ranges are node-ID to node-ID. I
> think the examples and the included text clarify it pretty well:
>
>
> "
>
> The Start and End System IDs use the standard ID length and indicate
>    the range of fragments covered by the HSNP, just like CSNPs do.  The
>    key difference is that all pseudonodes of the systems within this
>    range are implicitly included.  Both the Start and End System IDs are
>    inclusive, meaning fragments from both endpoints are part of the
>    range.
>
> *"*
>
>
>
>
> Figures 2 and 3 seem to hint at this - but it isn't explicit.
>
>
>
> Also, I assume you will be defining Level 1 and Level 2 HSNP PDUs?
>
>
>
> prz> that’s a misunderstanding on your part. -00 was like this; after
> implementation it looks like levels serve no purpose, and hence they are
> gone in -01. Any hash included in an HSNP can cover a chosen number of
> nodes. Obviously, on mismatches the rules force the “disaggregation”,
> which, as I said, may be more HSNP hashes covering fewer nodes each,
> CSNPs, or even direct flooding. An implementation is free to choose any
> strategy it desires. Think about it as a gradient descent with the LSPs
> being the “global optimum” or “lowest energy level”; as long as the
> gradient descends we’ll get there, but the strategy is free for an
> implementation to choose depending on lots of things (statistics,
> efficiency of CSNP construction, hashes present, etc.). The best
> specifications are necessary and sufficient, not implementation
> prescriptions. It is sometimes helpful to talk about bits like 10589
> does, but AFAIR it specifically says “it’s not how you MUST implement
> it”.
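The "gradient descent" refinement described above can be sketched as a recursive split of the covered node range, bottoming out in CSNPs or direct LSP flooding once a range is small enough. The split-in-half strategy, the threshold and all names are one free implementation choice among many, not anything prescribed by the draft:

```python
# Sketch of the disaggregation loop: on a hash mismatch, re-describe the
# covered node range at finer granularity; bottom out (send CSNPs/LSPs)
# once the range is small. Split-in-half and the threshold are arbitrary
# illustrative choices -- any strategy that keeps "descending" works.

def refine(node_ids, mismatched, bottom_out=4):
    """Return the node ranges that must be re-described in full detail."""
    if not mismatched(node_ids):
        return []                        # hashes agree: nothing to do
    if len(node_ids) <= bottom_out:
        return [node_ids]                # small enough: send CSNPs/LSPs
    mid = len(node_ids) // 2
    return (refine(node_ids[:mid], mismatched, bottom_out)
            + refine(node_ids[mid:], mismatched, bottom_out))
```

Each recursion level corresponds to advertising more HSNP hashes covering fewer nodes each, which is the "finer resolution of database description" the reply refers to.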
>
>
> You say:
>
>
>
> *"The Start and End System IDs exclude pseudonode bytes, as those are
> implicitly included within the ranges."*
>
>
>
> I think what you mean to say is:
>
>
>
> *"The Start and End Range IDs exclude pseudonode and LSP number octets, as
> those are implicitly included within the ranges.**”*
>
>
> *prz> looks to me like you are saying what the draft already says, just
> in a different way.*
>
>
>
>
>
> *Section 8*
>
>
>
> You say:
>
>
>
> *"thus we focus on realistic scenarios in the order of 50,000 nodes and 1
> million fragments."*
>
>
>
> Assuming use of the maximum LSP lifetime (65535 seconds) and a commonly
> used LSP refresh time of 65000 seconds, the expected number of LSPs being
> refreshed at that scale is about 15/second. Any of these LSPs may be
> transiently out of sync not because of a flooding issue but simply because
> LSP flooding for those LSPs is “in progress” at the time the HSNP is
> generated/transmitted/received. There may also be additional LSP updates
> triggered by topology changes which are in the process of being
> synchronized. This leads to a significant probability of
> transient/temporary hash mismatches which actually require no handling –
> but of course it is difficult at best to determine whether a hash mismatch
> is transient or persistent.
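The back-of-the-envelope rate above checks out: with 1 million fragments each refreshed roughly every 65,000 seconds, the steady-state refresh rate is about 15 LSPs per second.

```python
# Verifying the refresh-rate estimate in the paragraph above.
fragments = 1_000_000          # LSP fragments at the discussed scale
refresh_interval_s = 65_000    # commonly used LSP refresh time (seconds)
rate = fragments / refresh_interval_s
print(round(rate, 1))          # prints 15.4 (refreshes/second)
```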
>
>
> prz> this is indeed exactly the same as when sending periodic CSNPs, so
> nothing new is introduced here. Either flooding works and synchronizes
> fine (and then correct hashes/CSNPs are sent), or it does not, and then
> we need a gradient descent to finer and finer resolution of the database
> description until LSPs are sent. Architecturally speaking, HSNPs are just
> a “lower-resolution description of the database” than CSNPs are.
>
>
>
> When a hash mismatch occurs, there are three actions available:
>
>
>
> 1) Generate an additional HSNP covering the original range where the
> mismatch was detected, but this time with greater granularity
>
> 2) Generate CSNP(s) for the LSPs in the range where the mismatch was
> detected
>
> 3) Mark all the LSPs in the original range to be flooded
>
>
>
> It would be good to have an analysis of the impact of such transient
> mismatches on the overall efficiency of the HSNP solution.
>
> Intuitively, the frequency of transient hash mismatches seems likely to
> increase as the size of the LSPDB increases.
>
>
> prz> Pretty much impossible to come to a generally interesting result,
> since the topology, flooding reliability, rate of topological changes
> (node and link flaps), implementation internals like hashing, etc. will
> all heavily influence it, and any answer depends on the assumptions
> chosen. In that vein, even CSNPs can be “proven” to be utterly useless
> under optimistic assumptions that elegantly break in reality, based on
> long-term experience; I’ll show at IETF what happened to the open source
> implementation once we switched off CSNPs 😉
>
>
>
> *Section 9.2*
>
>
>
> You spend several paragraphs discussing the case of:
>
>
>
> *"if a new fragment has the same sequence number and different content but
> an identical 16-bit Fletcher checksum" to an older LSP which exists in
> LSPDB of nodes in the network.*
>
>
>
> We have discussed this at length previously - and we all agree that this
> is an existing vulnerability in the protocol - though the probability of
> its occurrence (as you have calculated) is extremely low and even then,
> confined to time windows shortly after a node has restarted.
>
>
>
> This is a vulnerability associated with LSP generation.
>
> It is not introduced by CSNPs - nor by HSNPs.
>
> It is not detected by CSNPs - nor by HSNPs.
>
> It is not correctable by CSNPs - nor by HSNPs.
>
> And you are not proposing a means of resolving this vulnerability in the
> draft.
>
>
> prz> nope, it was never the intention to attack this, and the only way
> to lower its probability is really having a much better hash than the 16
> bits, which would break everything under the sun in current ISIS formats
> 😉
>
>
>
> So I wonder why this discussion is included in the draft?
>
>
> prz> Because it gives a “base” for understanding what the likelihood of
> an HSNP hash collision is compared to such a scenario; otherwise people
> could argue that introducing a once-in-the-lifetime-of-the-universe hash
> collision probability “breaks the protocol irretrievably”.
>
>
>
> ***
>
>
>
> Finally, I mention a suggestion that I may have made previously.
>
>
>
> Rather than define a new PDU, you could simply introduce a new TLV into
> existing CSNPs. This might have advantages when you detect an HSNP hash
> mismatch and are taking steps to isolate the impacted LSPs. Rather than
> sending HSNPs and CSNPs you could send CSNPs with a mixture of TLVs - which
> might reduce the total number of PDUs sent in order to resolve the hash
> mismatches.
>
>
>
> Thanx very much for your consideration of these comments.
>
>
> prz> rather not; semantically HSNPs are NOT CSNPs, and shoehorning them
> into some weird TLVs within CSNPs that need repacking and sliding, that
> may collide with contained CSNP entries or with each other over ranges,
> or a million other “confusions”, is just generating a non-orthogonal
> encoding without any benefit I can discern.
>
>
> Thanks
>
>
> — Tony
>
>
>
>
>
>
>
>
> _______________________________________________
> Lsr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>