Hey Les, thanks for the detailed read, lots of valid and productive comments. Replies inline, prefixed prz>.
Sent from Outlook for Mac

From: Les Ginsberg (ginsberg) <[email protected]>
Date: Wednesday, 11 March 2026 at 07:28
To: [email protected] <[email protected]>
Cc: lsr <[email protected]>
Subject: Comments on draft-prz-lsr-hierarchical-snps-01

But I preface my remarks by saying that the document needs to be more precise as regards the specification of the new PDUs and new TLVs you wish to define. In its current form, these elements are only "described" but not "specified". In some cases, which I will comment on below, this leads to uncertainty/confusion. I hope future revisions will be much more pedantic in this regard.

prz> No argument there: formats and procedures must be strictly defined for a good interoperability spec. As you say, work in progress that will be polished after discussions settle. At this point we focused on resolving contention over the size of the hash and on measuring the efficiency of the scheme under different assumptions (which we will present later, once things are in a form fit for general public consumption).

Also, if the goal is to use HSNPs to achieve the same level of reliability that is achieved today using CSNPs, more detailed behavioral specification is required. Actions regarding sending/acking LSPs related to the sending/receiving of CSNPs are fully specified in ISO 10589. HSNPs introduce new behaviors, but the end goal is the same: to ensure that LSPDB synchronization is maintained. I think a more precise definition of how an implementation tracks the state of the portion of the LSPDB associated with an HSNP hash mismatch is required to guarantee reliability and interoperability. I am not suggesting that the solution you define cannot work, just that it needs a more precise behavioral description. Hopefully, that is coming in future revisions.

prz> Yes, and this will be straightforward.
prz> Just like a mismatched CSNP leads to LSP flooding, a hash mismatch on an HSNP must lead to either an HSNP with hashes covering the range in more detail, or CSNPs (a MUST in case a node-hash mismatch is hit), or alternately direct LSP flooding (say, for example, a node hash covering 3 LSPs is mismatched; it is probably more efficient to flood those 3 LSPs than to send a CSNP listing them). All of that will work, but as you say, precise procedures akin to 10589 must be in the final version.

Section 2

You say:

"At the lowest compression level, it is optimal to generate a single CSNP packet on a mismatch in a hash. To achieve this, the first-level hashes should initially group about 80 LSP fragments together, with exceptions handled later. There is no need to maximize this initial packing."

and

"The packing process always places all fragments belonging to the same system and its pseudonodes within a single node Merkle hash. This hash may occasionally exceed the recommended size of 80 fragments..."

This is confusing. I think what you mean to say here is that it is not helpful to pack beyond the number of hashes which will fit in a single HSNP PDU (approximately 80 for a 1500 byte MTU). But if a given node is originating 200 LSPs, there is no way to split the hash calculation for that node into two HSNP TLVs, and so it may indeed require more than one CSNP to determine which of the 200 LSPs is "out of sync" in the event of a hash mismatch.

prz> Well, it seems clear enough to me, since your interpretation is exactly the intended meaning 😉 If the language can be improved significantly for clarity here, please suggest.

Section 3

Not sure why you went to a 48 bit Fletcher checksum. I don't object, but it makes the bar to deployment/interoperability slightly higher since implementations cannot simply use the Fletcher calculation they have been using for decades. Could you provide a clearer justification?
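[ed. note] To make the "widened Fletcher" discussion concrete, here is an illustrative Python sketch of what a 48-bit Fletcher over 24-bit words (modulo 2^24 - 1) could look like. This is only a plausible generalization for the reader's intuition; the draft's modified checksum in Section 3 is normative and may differ in word size, padding, and modulus.

```python
def fletcher48(data: bytes) -> int:
    """Illustrative 48-bit Fletcher: two 24-bit running sums over 24-bit
    words, modulo 2**24 - 1. NOT the draft's normative algorithm."""
    MOD = (1 << 24) - 1
    # Zero-pad so the buffer splits evenly into 3-byte (24-bit) words.
    if len(data) % 3:
        data = data + b"\x00" * (3 - len(data) % 3)
    c0 = c1 = 0
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")
        c0 = (c0 + word) % MOD   # simple sum
        c1 = (c1 + c0) % MOD     # sum of sums (position sensitivity)
    return (c1 << 24) | c0
```

As with any Fletcher variant, the second running sum makes the result sensitive to byte order, so transpositions are caught, which a plain sum would miss.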
I appreciate that you have provided sufficient info for implementations to validate that they have implemented the modified Fletcher checksum correctly.

prz> Well, the measurement section (on which lots of CPU has been burnt) gives very precise reasoning why 48 bits is optimal. Funnily enough, 64 bits actually leads to more collisions that matter, and 32 bits seemed unacceptable. You'll find the simulation numbers and reasoning in Section 9.2, and during IETF I'll show some cool graphs to clarify further ;-) Implementing 48-bit Fletcher is utterly trivial; in fact I took an existing crate and just added one macro invocation with the corresponding buffer/intermediate-result sizes 😉

Section 5.1

You have yet to define the new TLV you require in hellos.

prz> Yes, easily done once stuff settles.

Section 5.2

It seems the intent is to interleave CSNPs and HSNPs (though not insisted upon). But the actions to take on receiving a hash mismatch are not fully specified. Ultimately, we have to guarantee synchronization of the LSPDB, which means setting/clearing of SRM/SSN and related behaviors in response to HSNP reception needs to be specified.

prz> Again, agree; procedures will be cast in stone, similar to 10589, once discussions around the draft settle to the point it makes sense.

Section 6

Is the header of an HSNP intended to be identical to the header of a CSNP? I ask because the following fields in the CSNP PDU header are of length "ID Length + 2":

Start LSP ID
End LSP ID

but since the new TLV you define uses range identifiers which are simply System IDs (NOT LSP IDs), it is not possible to send an HSNP which covers only some of the LSPs generated by a given node. This suggests that you could modify the Start/End LSP ID fields in the HSNP PDU header to match what you have in the new TLV. If you don't do that, then you will need to state that HSNPs which have Start/End LSP IDs which are not of the form "A.00-00" and "B.FF-FF" respectively are invalid.
prz> HSNP is a new packet format and ranges are node-id to node-id. I think the examples and the included text clarify it pretty well:

"The Start and End System IDs use the standard ID length and indicate the range of fragments covered by the HSNP, just like CSNPs do. The key difference is that all pseudonodes of the systems within this range are implicitly included. Both the Start and End System IDs are inclusive, meaning fragments from both endpoints are part of the range."

Figure 2 and Figure 3 seem to hint at this, but it isn't explicit. Also, I assume you will be defining Level 1 and Level 2 HSNP PDUs?

prz> That's a misunderstanding on your part. -00 was like this; after implementation it looks like levels serve no purpose, hence they are gone in -01. Any hash included in an HSNP can cover a chosen number of nodes. Obviously, on mismatches the rules force the "disaggregation", which as I said may be more HSNP hashes covering fewer nodes each, CSNPs, or even direct flooding. An implementation is free to choose any strategy it desires. Think about it as a gradient descent with the synchronized LSPDB being the "global optimum" or "lowest energy level"; as long as the gradient descends we'll get there, but the strategy is free for an implementation to choose depending on lots of things (statistics, efficiency of CSNP construction, hashes present, etc.). The best specifications are only sufficient and necessary, not implementation prescriptions. It is sometimes helpful to talk about bits like 10589 does, but AFAIR it specifically says "it's not how you MUST implement it".

You say:

"The Start and End System IDs exclude pseudonode bytes, as those are implicitly included within the ranges."

I think what you mean to say is:

"The Start and End Range IDs exclude pseudonode and LSP number octets, as those are implicitly included within the ranges."

prz> Looks to me like you say what the draft already says, just in a different way.
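[ed. note] The range semantics quoted above (System-ID endpoints, both inclusive, pseudonodes and fragments implicitly covered) boil down to a one-line comparison. A sketch, assuming 6-octet System IDs and the standard LSP ID layout (system-id + pseudonode octet + LSP-number octet); the helper name is invented, not from the draft:

```python
def lsp_in_hsnp_range(lsp_id: bytes, start_sysid: bytes, end_sysid: bytes) -> bool:
    """True if an LSP falls within an HSNP Start/End System ID range.

    Only the system-id portion of the LSP ID is compared: the pseudonode
    and LSP-number octets are implicitly covered, and both range
    endpoints are inclusive.
    """
    assert len(lsp_id) == 8 and len(start_sysid) == len(end_sysid) == 6
    # bytes compare lexicographically, i.e. as big-endian unsigned values
    return start_sysid <= lsp_id[:6] <= end_sysid
```

Note how a pseudonode fragment such as B.02-03 is in range whenever system B itself is, which is exactly the "implicitly included" behavior the draft text describes.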
Section 8

You say:

"thus we focus on realistic scenarios in the order of 50,000 nodes and 1 million fragments."

Assuming use of the maximum LSP lifetime (65535 seconds) and a commonly used LSP refresh time of 65000 seconds, the expected number of LSPs being refreshed at that scale is about 15/second. Any of these LSPs may be transiently out of sync, not because of a flooding issue, but simply because LSP flooding for those LSPs is "in progress" at the time the HSNP is generated/transmitted/received. There may also be additional LSP updates triggered by topology changes which are in the process of being synchronized. This leads to a significant probability of transient/temporary hash mismatches which actually require no handling, but of course it is difficult at best to determine whether a hash mismatch is transient or persistent.

prz> This is indeed exactly the same as when sending periodic CSNPs, so nothing new is introduced here. Either flooding works and synchronizes fine (and then correct hashes/CSNPs are sent), or it does not, and then we need a gradient descent to finer and finer resolution of the database description until LSPs are sent. HSNPs are just a "lower resolution description of the database" than CSNPs are, architecturally speaking.

When a hash mismatch occurs, there are three actions available:

1) Generate an additional HSNP covering the original range where the mismatch was detected, but this time with greater granularity
2) Generate CSNP(s) for the LSPs in the range where the mismatch was detected
3) Mark all the LSPs in the original range to be flooded

It would be good to have an analysis of the impact of such transient mismatches on the overall efficiency of the HSNP solution. Intuitively, the frequency of transient hash mismatches seems likely to increase as the size of the LSPDB increases.
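[ed. note] For illustration only, the three recovery actions listed above could be folded into a trivial selection policy along these lines. The thresholds and names are invented; both the draft and the replies in this thread stress that the strategy is deliberately left to the implementation:

```python
def pick_mismatch_action(lsp_count: int, can_refine_hash: bool) -> str:
    """Choose a recovery action for a mismatched HSNP hash covering
    lsp_count LSP fragments. Thresholds are illustrative only."""
    if lsp_count <= 3:
        return "flood-lsps"       # action 3: cheaper to flood a handful of LSPs
    if lsp_count <= 80 or not can_refine_hash:
        return "send-csnps"       # action 2: range fits in roughly one CSNP
    return "send-finer-hsnp"      # action 1: disaggregate into finer hashes
```

Any policy works for correctness as long as each step strictly reduces the unresolved range (the "gradient descent" argument made elsewhere in this thread); the choice only affects efficiency.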
prz> Pretty much impossible to come to a generally interesting result, since topology, flooding reliability, rate of topological changes (node and link flaps), implementation internals (like hashing), etc. will all heavily influence it, and the result is only as good as the assumptions. In that vein, even CSNPs can be "proven" utterly useless under optimistic assumptions that elegantly break in reality based on long-term experience; I'll show at IETF what happened to the open-source implementation once we switched off CSNPs 😉

Section 9.2

You spend several paragraphs discussing the case of "if a new fragment has the same sequence number and different content but an identical 16-bit Fletcher checksum" to an older LSP which exists in the LSPDB of nodes in the network. We have discussed this at length previously, and we all agree that this is an existing vulnerability in the protocol, though the probability of its occurrence (as you have calculated) is extremely low, and even then confined to time windows shortly after a node has restarted. This is a vulnerability associated with LSP generation. It is not introduced by CSNPs nor by HSNPs. It is not detected by CSNPs nor by HSNPs. It is not correctable by CSNPs nor by HSNPs.

And you are not proposing a means of resolving this vulnerability in the draft.

prz> Nope, it was never the intention to attack this, and the only way to lower its probability is really having a much better hash than the 16 bits, which would break everything under the sun in current ISIS formats 😉

So I wonder why this discussion is included in the draft?

prz> Because it gives a "base" against which to understand the likelihood of an HSNP hash collision compared to such a scenario hitting us; otherwise people can argue that introducing a once-in-the-lifetime-of-the-universe hash collision probability "breaks the protocol irretrievably".

***

Finally, I mention a suggestion that I may have made previously.
Rather than define a new PDU, you could simply introduce a new TLV into existing CSNPs. This might have advantages when you detect an HSNP hash mismatch and are taking steps to isolate the impacted LSPs. Rather than sending HSNPs and CSNPs, you could send CSNPs with a mixture of TLVs, which might reduce the total number of PDUs sent in order to resolve the hash mismatches.

Thanx very much for your consideration of these comments.

prz> Rather not. Semantically, HSNPs are NOT CSNPs, and shoehorning them into some weird TLVs within CSNPs that need repacking and sliding, that may collide with contained CSNP entries or with themselves over ranges, or a million other "confusions", is just generating a non-orthogonal encoding without any benefit I can discern.

Thanks — Tony
_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
