[Lsr] Re: Further Comments on draft-prz-lsr-hierarchical-snps

Tony Przygienda Fri, 17 Oct 2025 23:52:51 -0700

I don't see anything substantially new in this email, and you did not
respond to the Jul 19 thread refuting largely the same arguments. Your #1
in particular seems simply incorrect: as Tony Li pointed out, a CSNP will
_never_ detect a CSUM collision if the seqnr is the same, and hence the
whole argument 'it's unreliable with this but it was reliable before' is a
fallacy, especially once you do some probability math. Hence I allow myself
to ignore this thread. As a mildly acerbic note, an _enormous_ amount of
data is being sync'ed using Merkle trees in things like DynamoDB, which is
holding up most of Amazon.
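
The CSNP point above can be illustrated with a minimal sketch. It assumes a plain Fletcher-16 (mod 255) as a simplified stand-in for the IS-IS LSP checksum, and a hypothetical `SnpEntry` that carries only LSP ID, sequence number, and checksum (remaining lifetime is omitted). The names and the LSP ID value are illustrative, not taken from any implementation.

```python
from typing import NamedTuple


def fletcher16(data: bytes) -> int:
    """Plain Fletcher-16 (mod 255); a simplified stand-in, not the exact IS-IS procedure."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


class SnpEntry(NamedTuple):
    """Hypothetical per-LSP summary: identity and checksum, never the LSP body."""
    lsp_id: str
    seqnr: int
    checksum: int


def summarize(lsp_id: str, seqnr: int, body: bytes) -> SnpEntry:
    return SnpEntry(lsp_id, seqnr, fletcher16(body))


# Two different bodies that collide: mod-255 arithmetic cannot tell 0x00 from 0xFF.
body_a = b"\x00\x01"
body_b = b"\xff\x01"

entry_a = summarize("0000.0000.0001.00-00", 7, body_a)
entry_b = summarize("0000.0000.0001.00-00", 7, body_b)

bodies_differ = body_a != body_b
snp_sees_difference = entry_a != entry_b
print(bodies_differ, snp_sees_difference)
```

With the same sequence number and a colliding checksum, the two summary entries compare equal even though the bodies differ, so a comparison based on the summary alone has nothing to flag.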


As to #2 and #3, it is up to the implementation to use such strategies, and
they don't need standardization, though AFAIS a 'further considerations'
section could include such advice, albeit I doubt its value. E.g. bringing
up an adjacency with a huge database (which is what this spec is aimed at)
where 99% of the database is already the same (a flap) can benefit
massively.
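
A rough sketch of the flap scenario, under loudly stated assumptions: SHA-256 stands in for whatever hash the draft actually specifies, each node has a single LSP, the 1% of stale entries is scattered at random (after a real flap they may be clustered), and the database size and range sizes are hypothetical. It only counts how many range hashes disagree for two range sizes.

```python
import hashlib
import random


def range_hash(db: dict[int, int], start: int, stop: int) -> bytes:
    """Hash the (node, seqnr) pairs of every node in [start, stop) present in db."""
    h = hashlib.sha256()  # stand-in hash, an assumption of this sketch
    for node in range(start, stop):
        if node in db:
            h.update(f"{node}:{db[node]};".encode())
    return h.digest()


def count_mismatched(a: dict[int, int], b: dict[int, int], nodes: int, range_size: int) -> int:
    """Number of ranges whose hashes differ between databases a and b."""
    return sum(
        range_hash(a, start, start + range_size) != range_hash(b, start, start + range_size)
        for start in range(0, nodes, range_size)
    )


NODES = 10_000  # hypothetical LSPDB size
rng = random.Random(1)

local = {node: 1 for node in range(NODES)}
neighbor = dict(local)
for node in rng.sample(range(NODES), NODES // 100):
    neighbor[node] = 0  # 1% of entries are stale after the flap

results = {size: count_mismatched(local, neighbor, NODES, size) for size in (10, 100)}
for size, mismatched in results.items():
    print(f"range size {size}: {mismatched} of {NODES // size} range hashes differ")
```

How much is saved depends on the range size and on how the differences are distributed, which is the kind of strategy choice left to the implementation.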

thanks

-- tony

On Mon, Oct 6, 2025 at 6:15 AM Les Ginsberg (ginsberg) <ginsberg=
[email protected]> wrote:

> At a high level, this is the way I am viewing this draft.
>
>
>
> IS-IS has 100% reliable flooding defined in the base specification.
>
> As scale requirements increase, there has been an interest in optimizing
> flooding. See:
>
>
>
> https://datatracker.ietf.org/doc/rfc9667/
>
> https://datatracker.ietf.org/doc/draft-ietf-lsr-isis-flood-reduction-arch/
>
>
>
> Flooding optimizations introduce the possibility that the reliability of
> flooding may be compromised. Therefore, the use of CSNPs is of interest as
> this can restore the 100% reliable flooding guarantee while allowing
> flooding optimizations to be deployed. However, at scale, the size of a
> complete set of CSNPs can become large, so there is interest in finding a
> way to reduce the cost of using CSNPs.
>
>
>
> This draft is a proposal which can significantly reduce the number and
> size of PDUs required to convey a summary of the state of the LSPDB (which
> is what CSNPs do today). However, it does not provide a guarantee that all
> LSPDB inconsistencies will be detected.
>
>
>
> I do not believe we are (or should be) in the business of defining
> solutions which work "most of the time".
>
>
>
> I cannot support this proposal.
>
>
>
> Below are specific issues I see in the proposed solution. But my
> objections are fundamentally about not being able to provide 100%
> reliability. Addressing the issues below will not alter my opinion unless
> it also provides 100% reliability.
>
>
>
>
>
> Issue #1: Unreliability
>
>
>
> The draft proposes to use a simple hash to summarize the state of a range
> of LSPs. The possibility of "hash collision" is not insignificant. When it
> occurs it will be undetectable - which compromises the reliability of the
> Update process.
>
>
>
> It has been mentioned that even the existing PDU checksum mechanism used
> by IS-IS (Fletcher) can produce collisions - which is true. But in such a
> case, the raw data is still present in the PDU and can be used to detect
> LSPDB inconsistencies even in the presence of a checksum collision. In the
> HSNP proposal, because only a summary of the data is present, it is not
> possible to detect or recover from a hash collision.
>
>
>
> Issue #2: Solution becomes less useful in the presence of LSPDB Differences
>
>
>
> The choice of system ID ranges to advertise in the HSNP is optimized for
> cases where the neighbors' LSPDBs are mostly in sync. In the case of an
> established adjacency, this is likely to be true. But in the case of
> adjacency bringup this is less likely.
>
>
>
> If one neighbor has LSPs from nodes A, B, C and the other neighbor has not
> yet received any LSPs from B, then the choice of a system ID range greater
> than 1 is likely to trigger a hash mismatch and either result in
> unnecessary flooding of LSPs from all nodes or require reversion to
> traditional CSNPs.
>
>
>
> This makes the solution unusable in the case of adjacency bringup - which
> is a case also worthy of optimization. A good solution to this issue should
> be usable both for adjacency bringup and periodic CSNPs.
>
>
>
> Issue #3: The solution degrades as scale (size of the LSPDB) increases
>
>
>
> When the LSPDBs are mismatched the likelihood of hash mismatches
> increases. Even in a stable network, there is a base level of LSP refresh
> flooding that occurs. Assuming an LSP lifetime of 65535 seconds and an LSP
> refresh time of 65000 seconds we can expect a base level of LSP updates as
> shown below:
>
>
>
> Size of LSPDB     Average LSP flooding rate
> -------------------------------------------
> 1000              1 LSP/65 seconds
> 10000             1 LSP/6.5 seconds
> 20000             1 LSP/3.25 seconds
> ...
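
The figures in the table follow from dividing the refresh time by the number of LSPs. A minimal sketch of that arithmetic, assuming refreshes are spread evenly over the refresh interval (the function name is illustrative):

```python
def seconds_between_refreshes(lspdb_size: int, refresh_time_s: float = 65_000) -> float:
    """Average gap between LSP refreshes if each LSP refreshes once per interval."""
    return refresh_time_s / lspdb_size


for size in (1_000, 10_000, 20_000):
    print(f"{size:>6} LSPs: 1 LSP every {seconds_between_refreshes(size):g} seconds")
```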
>
>
>
> This means as scale increases, the likelihood that hash mismatches will
> occur increases. Even in the absence of any LSP flooding pathology this is
> likely to trigger redundant LSP flooding or a reversion to traditional
> SNPs.
>
>
>
> To overcome this, one could imagine a strategy that suppresses HSNPs when
> SRM bits are currently set on an interface - but as one of the primary use
> cases for HSNPs is in the presence of flooding optimizations where flooding
> is intentionally suppressed on some interfaces that strategy will not be
> applicable in such cases.
>
>
>
>     Les
>
>
> _______________________________________________
> Lsr mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
>
