I probably misunderstood your "level" comment. If your observation pertained to the ISIS Level, then yes, there will be an HSNP for L1 and one for L2, just like CSNPs.
--- tony On Wed, Mar 11, 2026 at 8:02 AM Przygienda, Tony <antoni.przygienda= [email protected]> wrote: > Hey Les, thanks for the detailed read, lots of valid, productive comments, > inline on prz> > > Sent from Outlook for Mac > *From: *Les Ginsberg (ginsberg) <[email protected]> > *Date: *Wednesday, 11 March 2026 at 07:28 > *To: *[email protected] < > [email protected]> > *Cc: *lsr <[email protected]> > *Subject: *Comments on draft-prz-lsr-hierarchical-snps-01 > > > > But I preface my remarks by saying that the document needs to be more > precise as regards the specification of the new PDUs and new TLVs you wish > to define. In its current form, these elements are only "described" - but > not "specified". In some cases, which I will comment on below, this leads > to uncertainty/confusion. I hope future revisions will be much more > pedantic in this regard. > > > prz> nope, formats and procedures must be strictly defined for a good > inter-operability > spec. As you say, work in progress that will be polished after discussions > settle. We focused at this point in time on resolving contention on the size of > the hash and measuring the efficiency of the scheme under different assumptions > (which we will present later once things are in a form fit for general public > consumption). > > > > Also, if the goal is to use HSNPs to achieve the same level of reliability > that is achieved today using CSNPs, more detailed behavioral specification > is required. Actions regarding sending/acking LSPs related to the > sending/receiving of CSNPs are fully specified in ISO 10589. HSNPs > introduce new behaviors - but the end goal is the same - to ensure that > LSPDB synchronization is maintained. I think a more precise definition of > how an implementation tracks the state of the portion of the LSPDB > associated with an HSNP hash mismatch is required to guarantee reliability > and interoperability.
I am not suggesting that the solution you define > cannot work - just that it needs a more precise behavioral description. > Hopefully, that is coming in future revisions. > > > prz> yes, also this will be straightforward. Just like a mismatched CSNP > leads to LSP flooding, a missed hash on an HSNP must lead to an HSNP with > hashes covering the range in more detail or CSNPs (a MUST in case a node > hash mismatch is hit) or alternatively direct LSP flooding (let's say for > example a node hash with 3 LSPs is missed, it's probably more efficient > to flood those 3 rather than send a CSNP with those 3). All of that will > work but as you say precise procedures akin to 10589 must be in the final > version. > > > > *Section 2:* > > > > You say: > > > > *"At the lowest compression level, it is optimal to generate a single CSNP > packet on a mismatch in a hash. To achieve this, the first-level hashes > should initially group about 80 LSP fragments together, with exceptions > handled later. There is no need to maximize this initial packing."* > > > > and > > > > *"The packing process always places all fragments belonging to the same > system and its pseudonodes within a single node Merkle hash. This hash may > occasionally exceed the recommended size of 80 fragments..."* > > > > This is confusing. > > I think what you mean to say here is that it is not helpful to pack beyond > the number of hashes which will fit in a single HSNP PDU (approximately 80 > for a 1500-byte MTU). But if a given node is originating 200 LSPs, there is > no way to split the hash calculation for that node into two HSNP TLVs - and > so it may indeed require more than one CSNP to determine which of the 200 > LSPs is "out of sync" in the event of a hash mismatch. > > > prz> well, it seems clear enough to me since your interpretation is > exactly the intended meaning 😉 If the language can be improved > significantly for clarity here, please suggest.
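The packing rule discussed above (all fragments of a system and its pseudonodes go under one node hash, and roughly 80 node hashes fit per HSNP) can be sketched as follows. Only the grouping key and the 80-entry budget come from the quoted text; everything else here is illustrative, including the use of SHA-256 in place of the draft's actual Merkle/node-hash construction:

```python
from hashlib import sha256
from itertools import groupby

PACK_BUDGET = 80  # node hashes per HSNP for a ~1500-byte MTU, per the quoted text

def node_hashes(fragments):
    """Group LSP fragments by originating system (the first 6 octets of the
    LSP ID; pseudonode and fragment-number octets are ignored for grouping)
    and hash each group.  SHA-256 stands in for the draft's real node hash."""
    system_of = lambda frag: frag[0][:6]  # frag = (lsp_id_bytes, payload_bytes)
    out = []
    for system_id, frags in groupby(sorted(fragments, key=lambda f: f[0]),
                                    key=system_of):
        h = sha256()
        for lsp_id, payload in frags:
            h.update(lsp_id)
            h.update(payload)
        out.append((system_id, h.digest()))
    return out

def pack_hsnps(fragments):
    """Pack node hashes into batches of at most PACK_BUDGET entries.
    A system originating many fragments still yields exactly one node hash,
    so it can never be split across two HSNP entries."""
    hashes = node_hashes(fragments)
    return [hashes[i:i + PACK_BUDGET] for i in range(0, len(hashes), PACK_BUDGET)]
```

Note how a 200-system database yields batches of 80, 80, and 40 hashes regardless of how many fragments each system originates, which matches the point that maximizing the initial packing is unnecessary.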
> > > > *Section 3* > > > > Not sure why you went to a 48-bit Fletcher checksum. > > I don't object - but it makes the bar to deployment/interoperability > slightly higher since implementations cannot simply use the Fletcher > calculation they have been using for decades. Could you provide a clearer > justification? > > I appreciate that you have provided sufficient info for implementations to > validate that they have implemented the modified Fletcher checksum > correctly. > > > prz> well, the measurement section (on which lots of CPU has been burnt) > gives a very precise reasoning why 48 is optimal. 64, funnily enough, actually > leads to *more* collisions that matter, and 32 bits seemed > unacceptable. You'll find the simulation numbers and reasoning in section > 9.2 and during IETF I'll show some cool graphs to clarify further ;-) > > > Implementing 48-bit Fletcher is utterly trivial, in fact I took an > existing crate and just added one macro invocation with the corresponding > buffer/intermediate result sizes 😉 > > > > *Section 5.1* > > > > You have yet to define the new TLV you require in hellos. > > > prz> yes, easily done once stuff settles. > > > > *Section 5.2* > > > > It seems the intent is to interleave CSNPs and HSNPs (though not insisted > upon). But the actions to take on receiving a hash mismatch are not fully > specified. > > Ultimately, we have to guarantee synchronization of the LSPDB - which > means setting/clearing of SRM/SSN and related behaviors in response to HSNP > reception needs to be specified. > > > prz> again, agree, procedures will be cast in stone similarly to 10589 once > discussions around the draft settle to the point it makes sense. > > > > *Section 6* > > > > Is the header of an HSNP intended to be identical to the header of a CSNP?
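On the Section 3 point that widening Fletcher is "utterly trivial": a Fletcher checksum generalizes by changing only the word size and modulus. The sketch below is a generic, width-parameterized Fletcher; `word_bytes=1` gives the classic Fletcher-16 over octets, while `word_bytes=3` (24-bit halves mod 2^24-1) yields a 48-bit result. This is an illustrative assumption about the construction, not the draft's authoritative "modified Fletcher", whose exact parameters and test vectors live in its own Section 3:

```python
def fletcher(data: bytes, word_bytes: int = 1) -> int:
    """Generic Fletcher checksum over big-endian words of `word_bytes` octets,
    with both running sums taken mod 2**(8*word_bytes) - 1.
    word_bytes=1 -> classic Fletcher-16; word_bytes=3 -> a 48-bit checksum.
    NOTE: an illustrative sketch, not the draft's exact modified Fletcher."""
    mod = (1 << (8 * word_bytes)) - 1
    # Zero-pad the tail so the data splits evenly into words.
    if len(data) % word_bytes:
        data += b"\x00" * (word_bytes - len(data) % word_bytes)
    s1 = s2 = 0
    for i in range(0, len(data), word_bytes):
        s1 = (s1 + int.from_bytes(data[i:i + word_bytes], "big")) % mod
        s2 = (s2 + s1) % mod
    return (s2 << (8 * word_bytes)) | s1
```

For the classic case, `fletcher(b"abcde", 1)` returns the well-known Fletcher-16 value 0xC8F0; the 48-bit variant is the same loop with wider accumulators, which is exactly the "one macro invocation with wider buffers" point above.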
> > I ask because the following fields in the CSNP PDU header are of length > "ID Length +2": > > > > *Start LSP ID* > > *End LSP ID* > > > > but since the new TLV you define uses range identifiers which are simply > System IDs (NOT LSP IDs), it is not possible to send an HSNP which covers > only some of the LSPs generated by a given node. This suggests that you > could modify the Start/End LSP ID fields in the HSNP PDU header to match > what you have in the new TLV. > > If you don't do that, then you will need to state that HSNPs which have > Start/End LSP IDs which are not of the form "A.00-00" and "B.FF-FF" > respectively are invalid. > > > prz> HSNP is a new packet format and ranges are node-id to node-id. I think > the examples and the included text clarify it pretty well > > > " > > The Start and End System IDs use the standard ID length and indicate > the range of fragments covered by the HSNP, just like CSNPs do. The > key difference is that all pseudonodes of the systems within this > range are implicitly included. Both the Start and End System IDs are > inclusive, meaning fragments from both endpoints are part of the > range. > > *"* > > > > > Figure 2 and Figure 3 seem to hint at this - but it isn't explicit. > > > > Also, I assume you will be defining Level 1 and Level 2 HSNP PDUs? > > > > prz> that's a misunderstanding on your part. -00 was like this; after > implementation it looks like levels serve no purpose and hence they are gone in > -01. Any hash included in an HSNP can cover a chosen number of nodes. Obviously > on mismatches the rules force the "disaggregation" which as I said may be > more HSNP hashes covering fewer nodes each, CSNPs or even direct flooding. > An implementation is free to choose any strategy it desires.
Think about > it as a gradient descent with LSPs being the "global optimum" or "lowest > energy level"; as long as the gradient descends we'll get there, but the > strategy is free for an implementation to choose depending on lots of things > (statistics, efficiency of CSNP construction, hashes present, etc.). Best > specifications must only be sufficient and necessary and not > implementation prescriptions. It is sometimes helpful to talk about bits > like 10589 does but AFAIR it specifically says "it's not how you MUST > implement it". > > > You say: > > > > *"The Start and End System IDs exclude pseudonode bytes, as those are > implicitly included within the ranges."* > > > > I think what you mean to say is: > > > > *"The Start and End Range IDs exclude pseudonode and LSP number octets, as > those are implicitly included within the ranges."* > > > prz> looks to me you say what the draft already says, just in a different > way. > > > > > > *Section 8* > > > > You say: > > > > *"thus we focus on realistic scenarios in the order of 50,000 nodes and 1 > million fragments."* > > > > Assuming use of the maximum LSP lifetime (65535 seconds) and a commonly > used LSP refresh time of 65000 seconds, the expected number of LSPs being > refreshed at that scale is about 15/second. Any of these LSPs may be > transiently out of sync not because of a flooding issue but simply because > LSP flooding for those LSPs is "in progress" at the time the HSNP is > generated/transmitted/received. There may also be additional LSP updates > triggered by topology changes which are in the process of being > synchronized. This leads to a significant probability of > transient/temporary hash mismatches which actually require no handling – > but of course it is difficult at best to determine whether a hash mismatch > is transient or persistent. > > > prz> this is indeed exactly the same as when sending periodic CSNPs so > nothing new is introduced here.
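For concreteness, the ~15/second refresh figure cited in the Section 8 comment above is straightforward arithmetic over the numbers quoted there (1 million fragments, a 65,000-second refresh timer):

```python
# Steady-state LSP refresh rate for the Section 8 scenario quoted above.
fragments = 1_000_000       # LSPDB size from the draft's scenario
refresh_interval = 65_000   # seconds; the commonly used refresh timer assumed above
rate = fragments / refresh_interval
assert 15 < rate < 16       # roughly 15.4 LSP refreshes per second on average
```

Any HSNP snapshot therefore races against a steady trickle of in-flight refreshes, which is the source of the transient-mismatch concern.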
Either flooding works and synchronizes fine > (and then correct hashes/CSNPs are sent) or it does not and then we need a > gradient descent to finer and finer resolution of the database description > until LSPs are sent. HSNPs are just a "lower-resolution description of the > database" than CSNPs are, architecturally speaking. > > > > When a hash mismatch occurs, there are three actions available: > > > > 1) Generate an additional HSNP covering the original range where the > mismatch was detected, but this time with greater granularity > > 2) Generate CSNP(s) for the LSPs in the range where the mismatch was > detected > > 3) Mark all the LSPs in the original range to be flooded > > > > It would be good to have an analysis of the impact of such transient > mismatches on the overall efficiency of the HSNP solution. > > Intuitively, the frequency of transient hash mismatches seems likely to > increase as the size of the LSPDB increases. > > > prz> Pretty much impossible to come to a generally interesting result > since the topology, flooding reliability, rate of topological changes > (node and link flaps, implementation internals like hashing) etc. will all > heavily influence it, and any result hinges on the assumptions chosen.
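The three mismatch actions listed above, combined with the "gradient descent" framing, suggest a simple recursive dispatch on range size. This is one possible strategy sketch, not the draft's procedure; the function names, callback interface, and both thresholds are illustrative:

```python
def resolve_mismatch(range_systems, flood_lsp, send_csnp, send_finer_hsnp,
                     flood_threshold=3, csnp_threshold=80):
    """One step of the descent on a hash mismatch: pick among the three
    actions based on how many systems the mismatched hash covered.
    Callbacks and thresholds are illustrative, not from the draft."""
    if len(range_systems) <= flood_threshold:
        # Action 3: very few LSPs -- flood them directly (set SRM),
        # cheaper than describing them in a CSNP.
        for system in range_systems:
            flood_lsp(system)
    elif len(range_systems) <= csnp_threshold:
        # Action 2: small enough for CSNP(s) covering the range exactly.
        send_csnp(range_systems[0], range_systems[-1])
    else:
        # Action 1: still large -- re-advertise the range as finer-grained
        # HSNP hashes, here simply halving the systems per hash.
        mid = len(range_systems) // 2
        send_finer_hsnp(range_systems[:mid])
        send_finer_hsnp(range_systems[mid:])
```

Each step strictly shrinks the range a mismatched hash covers, so repeated application terminates in direct LSP exchange, which is the "global optimum" the descent analogy points at.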
In that vein, even > CSNPs can be proven to be utterly useless (such optimistic assumptions > elegantly break in reality, based on long-term experience; I'll show at > IETF what happened to the open source implementation once we switched off CSNPs 😉) > > > > *Section 9.2* > > > > You spend several paragraphs discussing the case of: > > > > *"if a new fragment has the same sequence number and different content but > an identical 16-bit Fletcher checksum" to an older LSP which exists in the LSPDB of nodes in the network.* > > > > We have discussed this at length previously - and we all agree that this > is an existing vulnerability in the protocol - though the probability of > its occurrence (as you have calculated) is extremely low and even then, > confined to time windows shortly after a node has restarted. > > > > This is a vulnerability associated with LSP generation. > > It is not introduced by CSNPs - nor by HSNPs. > > It is not detected by CSNPs - nor by HSNPs. > > It is not correctable by CSNPs - nor by HSNPs. > > And you are not proposing a means of resolving this vulnerability in the > draft. > > > prz> nope, it was never the intention to attack this and the only way to lower > its probability is really having a much better hash than the 16 bits, which > would break everything under the sun in current ISIS formats 😉 > > > > So I wonder why this discussion is included in the draft? > > > prz> Because it gives a "base" to understand what the likelihood of an HSNP hash > collision is compared to such a scenario hitting us; otherwise > people can argue that introducing a once-in-the-lifetime-of-the-universe > probability of hash collision "breaks the protocol irretrievably". > > > > *** > > > > Finally, I mention a suggestion that I may have made previously. > > > > Rather than define a new PDU, you could simply introduce a new TLV into > existing CSNPs. This might have advantages when you detect an HSNP hash > mismatch and are taking steps to isolate the impacted LSPs.
Rather than > sending HSNPs and CSNPs you could send CSNPs with a mixture of TLVs - which > might reduce the total number of PDUs sent in order to resolve the hash > mismatches. > > > > Thanx very much for your consideration of these comments. > > > prz> rather not, semantically HSNPs are NOT CSNPs and shoehorning them > into some weird TLVs within CSNPs that need repacking, sliding, may > collide with contained CSNP entries or with themselves over ranges, or a > million other "confusions", is just generating a non-orthogonal encoding > w/o any benefit I can discern. > > > Thanks > > > — Tony > > > > > > > > > _______________________________________________ > Lsr mailing list -- [email protected] > To unsubscribe send an email to [email protected] >
