[Lsr] Re: Further Comments on draft-prz-lsr-hierarchical-snps

Les Ginsberg (ginsberg) Fri, 17 Oct 2025 14:19:29 -0700

Tony –

Regarding the old thread from July 19, here is an excerpt of the exchange 
between Tony Li and myself:


<snip>

[LES:] Let’s use a very simple example.

A and B are neighbors
For LSPs originated by Node C here is the current state of the LSPDB:

A has (C.00-00(Seq 10), C.00-01(Seq 8), C.00-02(Seq 7)) Merkle hash: 0xABCD
B has (C.00-00(Seq 10), C.00-01(Seq 9), C.00-02(Seq 6)) Merkle hash: 0xABCD
(unlikely that the hashes match -  but possible)

When A and B exchange hash TLVs they will think they have the same set of LSPs 
originated by C even though they don’t.
They would clear any SRM bits currently set to send updated LSPs received from 
C on the interface connecting A-B.
We have just broken the reliability of the update process.
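
The example above can be reproduced with a deliberately weak stand-in hash.
The sketch below is illustrative only: toy_range_hash (a sum of sequence
numbers modulo 2^16) is a hypothetical function chosen so that the two
databases collide, and it is not the hash defined in the draft. It shows a
summary-level comparison reporting "in sync" while a per-LSP comparison, which
is possible when each LSP entry is listed individually, finds the two
differing fragments.

```python
# Toy model of the A/B example: LSP ID -> sequence number for LSPs from node C.
lspdb_a = {"C.00-00": 10, "C.00-01": 8, "C.00-02": 7}
lspdb_b = {"C.00-00": 10, "C.00-01": 9, "C.00-02": 6}


def toy_range_hash(lspdb):
    """Deliberately weak summary hash (an assumption, NOT the draft's hash)."""
    return sum(lspdb.values()) % 0x10000


def per_lsp_diff(local, remote):
    """Per-LSP comparison: return the LSP IDs whose sequence numbers differ."""
    return sorted(
        lsp_id
        for lsp_id in local.keys() | remote.keys()
        if local.get(lsp_id) != remote.get(lsp_id)
    )


summaries_match = toy_range_hash(lspdb_a) == toy_range_hash(lspdb_b)
differing = per_lsp_diff(lspdb_a, lspdb_b)
print(summaries_match, differing)
```

A real hash makes such a collision far less likely than this toy one does, but
the structure of the failure is the same: once the summaries match, nothing
else in the exchange reveals the difference.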

[Tony Li]: By that metric, the update process has always been unreliable.  All 
it takes is two LSPs with different contents and the same checksum.  This 
breaks CSNPs.  As Tony P. has said, we are now very much into the realm of 
stochastic processes.  CSNPs work in practice because the odds of a collision 
are quite small. The HSNP approach carries that forward.

The analogy of the use of fletcher checksum on PDU contents is not a good one. 
The checksum allows a receiver to determine whether any bit errors occurred in 
the transmission. If a bit error occurs and is undetected by the checksum, that 
is bad – but it just means that a few bits in the data are wrong – not that we 
are missing the entire LSP.


<end snip>

Given that we all have “grey hair”, I did not think it necessary to further 
clarify – but perhaps I should have.
ISO 10589 Section 7.3.16 – and especially Section 7.3.16.2 – is relevant here.

If, as is suggested, we have two LSPs with the same source ID and the same
sequence number but different checksums, the procedures defined in ISO 10589
7.3.16.2 will result in the LSP in question either being purged (if the system
which detects the inconsistency does not own the LSP) or being regenerated with
a higher sequence number (if it does own the LSP). This results in proper
synchronization of the LSPDB.
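
The outcome described above can be sketched as follows. This is a simplified
illustration of the behavior as stated in this message, not a transcription of
ISO 10589; the Lsp type, the resolve_confusion function, and the returned
action strings are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lsp:
    lsp_id: str    # e.g. "C.00-01"; the part before the first "." is the source
    seq: int
    checksum: int


def resolve_confusion(stored: Lsp, seen: Lsp, own_system_id: str) -> str:
    """Action for same-ID, same-sequence LSPs (hypothetical helper).

    Models only the outcome described in the text above.
    """
    if stored.lsp_id != seen.lsp_id or stored.seq != seen.seq:
        return "not-applicable"  # ordinary newer/older handling applies instead
    if stored.checksum == seen.checksum:
        return "in-sync"
    source_id = stored.lsp_id.split(".")[0]
    if source_id == own_system_id:
        return "regenerate-with-higher-seq"  # the owner reissues the LSP
    return "purge"  # a non-owner purges the LSP


mine = Lsp("C.00-01", 9, 0x1111)
theirs = Lsp("C.00-01", 9, 0x2222)
print(resolve_confusion(mine, theirs, own_system_id="A"))
print(resolve_confusion(mine, theirs, own_system_id="C"))
```

Either branch requires that the individual sequence number and checksum of the
LSP are visible to the system doing the comparison.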

My point is that in an HSNP, since you no longer have the individual LSP 
descriptions but just a summary hash – any collision means you will not detect 
the inconsistency and therefore not take any steps to properly synchronize the 
databases.
I see no reason why there should be any disagreement on this point.

You might find the low probability of this occurring “acceptable”. I do not – 
which is my main point.

   Les



From: Tony Przygienda <[email protected]>
Sent: Monday, October 6, 2025 4:49 AM
To: Les Ginsberg (ginsberg) <[email protected]>
Cc: [email protected]; lsr <[email protected]>
Subject: [Lsr] Re: Further Comments on draft-prz-lsr-hierarchical-snps

I don't see anything substantially new in this email, and you did not respond
to the Jul 19 thread refuting largely the same arguments. Your #1 in particular
seems simply incorrect: a CSNP will _never_ detect a CSUM collision if the seqnr
is the same, as Tony Li pointed out, and hence the whole argument 'it's
unreliable with this but it was reliable before' is a fallacy, especially once
you do some probabilities math. Hence I allow myself to ignore this thread.  As
a mildly acerbic note, an _enormous_ amount of data is being sync'ed using
Merkle trees in things like DynamoDB that is holding up most of Amazon.
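
For readers who want to do the probabilities math mentioned above, a minimal
sketch follows. It assumes an idealized hash whose output of `bits` bits is
uniformly distributed and independent across comparisons. The hash width and
the number of comparisons are free parameters here, not values taken from the
draft, and collision_probability is a hypothetical helper name.

```python
import math


def collision_probability(bits: int, comparisons: int) -> float:
    """P(at least one collision) over `comparisons` comparisons of differing data.

    Assumes each comparison of two different inputs collides independently
    with probability 2**-bits (idealized uniform hash).
    """
    p_single = 2.0 ** -bits
    # 1 - (1 - p)^k, computed in a numerically stable way for very small p.
    return -math.expm1(comparisons * math.log1p(-p_single))


for width in (16, 32, 64):
    print(width, collision_probability(width, comparisons=1_000_000))
```

Whether the resulting number is acceptable is exactly the point on which the
participants in this thread disagree; the calculation only quantifies it.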

As to #2 and #3, it is up to the implementation to use such strategies and they
don't need standardization, though AFAIS a section of 'further considerations'
could include such advice, albeit I doubt its value. E.g. bringing up an
adjacency with a huge database (which is what this spec is aiming at) where 99%
of the database is already the same (a flap) is a scenario that can benefit
massively.

thanks

-- tony

On Mon, Oct 6, 2025 at 6:15 AM Les Ginsberg (ginsberg) 
<[email protected]> wrote:

At a high level, this is the way I am viewing this draft.

IS-IS has 100% reliable flooding defined in the base specification.

As scale requirements increase, there has been an interest in optimizing
flooding. See:

https://datatracker.ietf.org/doc/rfc9667/
https://datatracker.ietf.org/doc/draft-ietf-lsr-isis-flood-reduction-arch/

Flooding optimizations introduce the possibility that the reliability of
flooding may be compromised. Therefore, the use of CSNPs is of interest as this
can restore the 100% reliable flooding guarantee while allowing flooding
optimizations to be deployed. However, at scale, the size of a complete set of
CSNPs can become large, so there is interest in finding a way to reduce the
cost of using CSNPs.

This draft is a proposal which can significantly reduce the number and size of
PDUs required to convey a summary of the state of the LSPDB (which is what
CSNPs do today). However, it does not provide a guarantee that all LSPDB
inconsistencies will be detected.

I do not believe we are (or should be) in the business of defining solutions
which work "most of the time".

I cannot support this proposal.

Below are specific issues I see in the proposed solution. But my objections are
fundamentally about not being able to provide 100% reliability. Addressing the
issues below will not alter my opinion unless it also provides 100% reliability.

Issue #1: Unreliability

The draft proposes to use a simple hash to summarize the state of a range
of LSPs. The possibility of "hash collision" is not insignificant. When it
occurs it will be undetectable - which compromises the reliability of the
Update process.

It has been mentioned that even the existing PDU checksum mechanism used by
IS-IS (fletcher) can produce collisions - which is true. But in such a case,
the raw data is still present in the PDU and can be used to detect LSPDB
inconsistencies even in the presence of a checksum collision. In the HSNP
proposal, because only a summary of the data is present it is not possible to
detect or recover from a hash collision.



Issue #2: Solution becomes less useful in the presence of LSPDB Differences

The choice of system ID ranges to advertise in the HSNP is optimized for
cases where the neighbors' LSPDBs are mostly in sync. In the case of an
established adjacency, this is likely to be true. But in the case of adjacency
bringup this is less likely.

If one neighbor has LSPs from nodes A, B, C and the other neighbor has not yet
received any LSPs from B, then the choice of a system ID range greater than 1
is likely to trigger a hash mismatch and either result in unnecessary flooding
of LSPs from all nodes or require reversion to traditional CSNPs.
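
The A, B, C case above can be illustrated with a short sketch. SHA-256 over a
sorted list of entries is used purely as a stand-in for the draft's range hash,
the sequence numbers are made up, and the range_hash and mismatched_ranges
helpers are hypothetical. With one range covering all three system IDs the
mismatch implicates every node; with ranges of size 1 only B is implicated.

```python
import hashlib

# System ID -> {LSP ID: sequence number}; values are made up for illustration.
neighbor_1 = {"A": {"A.00-00": 5}, "B": {"B.00-00": 3}, "C": {"C.00-00": 10}}
neighbor_2 = {"A": {"A.00-00": 5}, "C": {"C.00-00": 10}}  # nothing from B yet


def range_hash(lspdb, system_ids):
    """Stand-in summary hash over all LSPs of the given system IDs."""
    digest = hashlib.sha256()
    for system_id in sorted(system_ids):
        for lsp_id, seq in sorted(lspdb.get(system_id, {}).items()):
            digest.update(f"{lsp_id}:{seq};".encode())
    return digest.hexdigest()


def mismatched_ranges(local, remote, ranges):
    """Return the ranges whose summary hashes differ between the neighbors."""
    return [r for r in ranges if range_hash(local, r) != range_hash(remote, r)]


wide = mismatched_ranges(neighbor_1, neighbor_2, [("A", "B", "C")])
narrow = mismatched_ranges(neighbor_1, neighbor_2, [("A",), ("B",), ("C",)])
print(wide)    # the single wide range mismatches
print(narrow)  # only the range holding B mismatches
```

The trade-off is that narrower ranges localize a mismatch better but require
more hashes to be sent.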



This makes the solution unusable in the case of adjacency bringup - which is a
case also worthy of optimization. A good solution to this issue should be
usable both for adjacency bringup and periodic CSNPs.

Issue #3: The solution degrades as scale (size of the LSPDB) increases

When the LSPDBs are mismatched the likelihood of hash mismatches increases.
Even in a stable network, there is a base level of LSP refresh flooding that
occurs. Assuming an LSP lifetime of 65535 seconds and an LSP refresh time of
65000 seconds we can expect a base level of LSP updates as shown below:

Size of LSPDB     Average LSP flooding rate
-------------------------------------------
1000              1 LSP/65 seconds
10000             1 LSP/6.5 seconds
20000             1 LSP/3.25 seconds
...
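
The figures in the table follow from dividing the refresh time by the number
of LSPs. A minimal sketch of that arithmetic is below; it assumes refreshes
are spread evenly over the refresh interval, and seconds_per_refresh is a
hypothetical helper name.

```python
REFRESH_SECONDS = 65000  # LSP refresh time assumed in the text


def seconds_per_refresh(lspdb_size: int,
                        refresh_seconds: int = REFRESH_SECONDS) -> float:
    """Average interval between refreshes if they are spread evenly over time."""
    return refresh_seconds / lspdb_size


for size in (1000, 10000, 20000):
    print(f"{size:<17} 1 LSP/{seconds_per_refresh(size):g} seconds")
```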



This means as scale increases, the likelihood that hash mismatches will occur
increases. Even in the absence of any LSP flooding pathology this is likely
to trigger redundant LSP flooding or a reversion to traditional SNPs.

To overcome this, one could imagine a strategy that suppresses HSNPs when
SRM bits are currently set on an interface - but as one of the primary use
cases for HSNPs is in the presence of flooding optimizations where flooding
is intentionally suppressed on some interfaces that strategy will not be
applicable in such cases.

    Les


_______________________________________________
Lsr mailing list -- [email protected]
To unsubscribe send an email to [email protected]
