Re: [Lsr] IS-IS over TCP

Les Ginsberg (ginsberg) Wed, 07 Nov 2018 08:06:55 -0800

Henk -

Thanx for the thoughtful response.
I'll do my best to respond in kind.
Inline.


> -----Original Message-----
> From: Henk Smit <[email protected]>
> Sent: Wednesday, November 07, 2018 5:26 AM
> To: Les Ginsberg (ginsberg) <[email protected]>
> Cc: [email protected]; [email protected]
> Subject: Re: [Lsr] IS-IS over TCP
> 
> 
> Hi Les,
> Comments inline.
> 
> Les Ginsberg (ginsberg) schreef op 2018-11-07 05:09:
> 
> > 1)IS-IS PDUs are sent directly over layer 2 - whereas TCP (or other
> > transport) are sent over Layer3.
> > This means we have a potential fate sharing issue where IIHs (which
> > continue to be sent over Layer 2 in your proposal (as I understand it)
> > may be successfully exchanged but the transport session set up to send
> > LSPs/SNPs may fail. This is the exact issue RFC 6213 has addressed.
> 
> The problem that RFC6213 tries to solve is a case where one of the
> neighbors is thinking that the other does not support BFD. And thus
> the lack of BFD is not used as an indication that something is wrong.
> Right ?
> 
[Les:] This is not correct.
The key paragraph is in https://tools.ietf.org/html/rfc6213#section-2 

" The problem with this solution is that it assumes that the
   transmission and receipt of IS-IS Hellos (IIHs) shares fate with
   forwarded data packets.  This is not a fair assumption to make given
   that the primary use of BFD is to protect IPv4 (and IPv6) forwarding,
   and IS-IS does not utilize IPv4 or IPv6 for sending or receiving its
   hellos."

We have seen cases where IPv4/IPv6 data packet delivery has been compromised - 
but IS-IS PDU delivery was unaffected. This led to the following behavior:

1)IS-IS exchanges hellos and brings the adjacency up. Routes using the link are 
installed.
2)BFD session is started and comes up.
3)After a time some problem occurs which only impacts IP traffic - BFD session 
goes down.
4)IS-IS adjacency is brought down due to BFD session down, but IS-IS continues 
to send hellos. If they are successfully exchanged then the IS-IS adjacency is 
almost immediately restored and we resume installing IP routes using the link 
even though the BFD session never comes back up. Data traffic gets dropped.

The extensions in RFC 6213 allow IS-IS to know when both sides support BFD, 
which means the sequence changes to:

1)IS-IS exchanges hellos - but adjacency remains in INIT state.
2)BFD session is initiated - IS-IS adjacency remains in INIT state until BFD 
session comes up.

Thus IS-IS never installs routes using the link unless we know IP traffic can 
be successfully delivered.
We can then use BFD both as a requirement to bring the adjacency up AND as fast 
failure detection.
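
The difference between the two sequences can be sketched as a small state 
function. This is only an illustration under simplifying assumptions, not the 
RFC 6213 procedure itself: the class and field names are hypothetical, and the 
per-topology/per-NLPID detail of the BFD-Enabled TLV is ignored.

```python
from dataclasses import dataclass
from enum import Enum


class AdjState(Enum):
    DOWN = "down"
    INIT = "init"
    UP = "up"


@dataclass
class Adjacency:
    """Hypothetical model of one point-to-point IS-IS adjacency."""
    local_bfd_enabled: bool = True
    peer_bfd_enabled: bool = False  # learned from the peer's IIH (assumed flag)
    hellos_ok: bool = False         # IIH exchange currently succeeding
    bfd_up: bool = False            # current state of the BFD session

    def bfd_required(self) -> bool:
        # BFD gates the adjacency only when both ends signal support.
        return self.local_bfd_enabled and self.peer_bfd_enabled

    def state(self) -> AdjState:
        if not self.hellos_ok:
            return AdjState.DOWN
        if self.bfd_required() and not self.bfd_up:
            # Hellos alone are not enough: hold the adjacency in INIT.
            return AdjState.INIT
        return AdjState.UP

    def routes_installed(self) -> bool:
        # Routes use the link only while the adjacency is UP.
        return self.state() is AdjState.UP


# Both sides support BFD, hellos work, but IP forwarding (and so BFD) is broken.
adj = Adjacency(peer_bfd_enabled=True, hellos_ok=True, bfd_up=False)
stuck_in_init = adj.state() is AdjState.INIT  # no routes installed, no blackhole

# Without the peer's BFD signal, hellos alone bring the adjacency up.
legacy = Adjacency(peer_bfd_enabled=False, hellos_ok=True, bfd_up=False)
legacy_blackholes = legacy.routes_installed()
```

In the first case the adjacency stays in INIT for as long as BFD is down, 
which is the behavior described in the second sequence above.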


> When using TCP, both neighbors are aware that they want/need to use TCP.
> Because both advertise the capability in a TLV in the IIH.
> So this problem won't occur. The solution of RFC 6213 is to include
> a new BFD-TLV in the IIHs. Well, our proposal also includes a TLV in
> the IIH. Same solution.
> 
> Another thing that might be noteworthy is that we propose to use
> ISIS-Flooding-over-TCP only on p2p interfaces. Because we believe
> that the majority of large IS-IS networks (in data-centers at least)
> will configure all their ethernet-interfaces to be p2p.
> The problem-description in section 2 in RFC6213 seems to indicate
> a situation with 3 routers (A, B and C) connected via one interface.
> Correct ? 

[Les:] RFC 6213 is applicable to Pt-Pt interfaces - in fact it is easier to 
utilize on Pt-Pt as we do not have the N-squared issue. It is unfortunate that 
the example in the RFC uses a LAN.

> So that's a multi-point interface. If we would want to do
> Flooding-over-TCP on multi-point interfaces, we will need a much
> more complex algorithm (involving the DIS, and potentially a
> backup DIS too). Not worth it, imho. The benefits of this proposal
> are that a) it's simple, and b) it's using existing technology (TCP).
> 
> > What should happen if IIHs are being successfully exchanged, the
> > neighbors both indicate support for the extension, but the transport
> > session fails to come up or goes down?
> > Would you follow similar behavior to RFC 6213 i.e., bring the
> > adjacency down? If so, how would you determine when you can bring the
> > adjacency up?
> 
> That is a good question.
> 
> > I appreciate you have indicated this as TBD in Section 7.6 of the
> > draft, but you also say in Section 7.5 that adjacency should NOT go
> > down when transport session fails - which implies we can have an
> > ongoing condition where an adjacency is up but
> > flooding cannot be done - which is a pathological condition.
> 
> My gut feeling says we should not tear down the adjacency.
> 
> What if two routers can exchange IIHs and do proper flooding of
> LSPs. But they can not exchange IP packets ? This could happen.
> IS-IS does not have a way to deal with this.

[Les:] RFC 6213 was written precisely to address this case - and works very 
well.

> IS-IS would keep
> advertising the link to the rest of the network. And blackhole
> traffic. But people have never complained about this risk before.

[Les:] Actually they have. :-) That's why we wrote RFC 6213 - because the 
problem has been seen in the field.

> Why ? My guess is that implementations are good enough that we
> know ip-forwarding almost always works. It is my opinion that we can
> be just as sure that TCP between two directly connected routers
> will work just as well. If not, it's probably a bug. Just as likely
> that ip-forwarding won't work between the same two routers.
> 
> The reason I propose to not tear down the adjacency is that it
> makes other things much more complex. The state of the TCP connection
> itself would introduce new state that makes it harder to do things
> like non-stop-forwarding, graceful restart, process restarts,
> fail-over to a hot-standby 2nd control-plane, etc.
> 

[Les:] I appreciate your concerns - and it is true that "most of the time" if 
we cannot receive IP traffic we likely cannot receive IS-IS PDUs. But actual 
field experience has shown this is not always the case - and when it is not, the 
lack of recognition of the failed state has major impacts.

I don't think you can ignore this case just because it will happen infrequently.

> > 2)Your statements regarding existing flooding limitations of IS-IS are
> > rather dated. Many years ago implementations varied from the base
> > specification by allowing much faster flooding and contiguous flooding
> > bursts on an interface in support of fast convergence. There are
> > existing and successful deployments of an instance with thousands of
> > neighbors and thousands of nodes in a network and sub-second
> > convergence is supported. So the statement that the existing protocol
> > per interface flooding is a blocking factor in highly meshed
> > topologies is not (IMO) accurate. The more accurate characterization
> > of the flooding problem in dense topologies is the amount of redundant
> > flooding (i.e., a node may receive many copies of a new LSP) - which
> > your proposal does nothing to address (I understand it was not
> > intended to address this problem which you discuss in a different
> > draft).
> 
> I worked on IS-IS in the nineties. Yes, my knowledge is outdated.
> However, I've not seen any RFC or draft or documents that indicate
> how to do better flooding. That means implementors have to reinvent
> the wheel. Packet-pacing, high-bandwidth, back-pressure,  etc will
> steer implementations towards proprietary solutions that will resemble
> things that TCP does. If so, why not use TCP ? Or another existing
> transport protocol that does even better ?
> 
> We're trying to solve flooding issues. In very dense networks. And
> in networks with many routers. The requirements document that I read
> earlier this year mentions 10k routers. Tony Li suggested last month
> to me that flooding algorithms should be able to deal with situations
> where single routers have 1k adjacencies. If we want to improve the
> protocol, I think we should also improve the situation where we need
> to flood 10k LSPs over a single adjacency. That's the problem we try
> to solve here.
> 

[Les:] I am not suggesting that this aspect of scaling can be ignored. But I am 
saying implementations have successfully addressed this with "smarter 
implementations" and done so w/o requiring protocol extensions.

> > My point here is that there are existing implementations which would
> > get no benefit from your proposal. It might be argued that someone
> > writing a new implementation may find it simpler to make use of a
> > transport mechanism like TCP - but I do not think there is compelling
> > data that demonstrates that the scalability of an implementation using
> > your proposal is better than that of many existing implementations.
> 
> When I worked at cisco in the nineties, that IS-IS implementation
> could deal with 250 routers in a double full mesh. That's 500
> adjacencies per router. Those routers were running 100MHz and 200MHz
> mips cpus. Some were even cisco 7000s with 68040 cpus. That worked.
> With my outdated knowledge, I can not understand how a datacenter
> fabric where routers have 64 or 128 adjacencies could be a problem.
> Not with a proper implementation. But we're trying to fix that too.
> 
> > This then suggests that for existing implementations the main
> > motivation to support your proposal is to help other implementations
> > which have not optimized their existing implementations. :-)
> > Comments?
> 
> Proper IS-IS implementations should split up in 3 threads at least.
> One thread for maintaining adjacencies, one for doing flooding and
> one for doing SPFs (and route-installs). That way an SPF doesn't
> hold up flooding. And heavy flooding or long SPFs don't break
> adjacencies. How many implementations in the field do that ?

[Les:] You are correct - and the better implementations do exactly what you 
describe. They had to because problems encountered in real deployments required 
it.
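
A minimal sketch of that three-way split, assuming one thread and one input 
queue per function. The names (`worker`, `run_demo`, the queue labels) are 
hypothetical, and a real implementation would add timers, priorities and 
shared LSDB locking. The point is only that a long SPF run sits in its own 
queue and cannot delay hello processing.

```python
import queue
import threading


def worker(name: str, inbox: queue.Queue, log: list, lock: threading.Lock) -> None:
    """Drain one inbox until a None sentinel arrives."""
    while True:
        item = inbox.get()
        try:
            if item is None:
                return
            with lock:
                log.append((name, item))  # stand-in for the real work
        finally:
            inbox.task_done()


def run_demo() -> list:
    log: list = []
    lock = threading.Lock()
    # One queue per function, so a backlog in one cannot block another.
    inboxes = {name: queue.Queue() for name in ("adjacency", "flooding", "spf")}
    threads = [
        threading.Thread(target=worker, args=(name, inbox, log, lock))
        for name, inbox in inboxes.items()
    ]
    for t in threads:
        t.start()

    inboxes["adjacency"].put("IIH received")
    inboxes["flooding"].put("LSP to flood")
    inboxes["spf"].put("run SPF")

    for inbox in inboxes.values():
        inbox.put(None)  # ask each worker to stop
    for t in threads:
        t.join()
    return log
```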

> Heck, I've even seen today's implementations that do not split
> off adjacency-maintenance as a separate process or thread.
> 
> Anyway, the question is: if you want to have 10k LSPs in your
> flooding domain, do we depend on custom improvements, or do
> we want something that's documented ?
> 
[Les:] I think the details of the "smart implementations" have not been 
formally documented because they did not require protocol extensions (i.e., 
they are interoperable w implementations which do not have equal smarts) and 
they are largely seen as the value add resulting from significant 
development/test efforts. Why would I want to give that knowledge away for free 
when I might use it to gain a competitive advantage?

Perhaps it is time to revisit that and consider writing some sort of BCP 
document - but that is a separate discussion.

I am simply stating what I know to be true.

> 
> One of the things that inspired me to do this proposal was LSVR.
> LSVR uses BGP-LS to transport LSPs. Why ? The word on the street
> is that "BGP scales so much better". Why does BGP scale so good ?
> Imho there is one main reason: BGP uses TCP for transport.
> Now I don't like LSVR (and I don't like BGP-LS). Because LSVR
> seems to re-invent the wheel, with no real improvements over
> IS-IS, except for the fact that it uses BGP which uses TCP.
> If that's the main benefit of LSVR, then why not just use
> TCP and be done with it ?
> 
> 
> BTW, there is another reason for our proposal. With the incoming
> drafts about flooding-topology-reduction, there is a new problem.
> All these proposals have situations where non-flooding adjacencies
> suddenly change to flooding adjacencies. When that happens, the
> LSDBs need to be synchronized again. To do that, all of them
> propose "just send a CSNP and be done with it". Well, the more
> LSPs, the more CSNPs that need to be sent. With 10k LSPs that's
> 110 CSNPs. CSNPs are not reliable. This re-synchronization happens
> when there is churn in the IGP. Are we sure CSNPs aren't dropped
> somewhere ? Can we start sending LSPs because we know the neighbor
> has sent all its CSNPs yet ? With reliable transport for LSPs and
> SNPs these worst-case scenarios will improve.
> 
[Les:] I do not advocate "just send a CSNP and be done with it" for the very 
reasons you mention.
You will see that 
https://tools.ietf.org/html/draft-li-lsr-dynamic-flooding-01#section-6.7.2 
specifies that normal LSPDB synchronization occurs on adjacency bringup.
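
For reference, the CSNP count quoted above for 10k LSPs can be roughly 
reproduced with a back-of-the-envelope calculation. The sketch below assumes 
a 1492-byte PDU, 6-byte system IDs (hence a 33-byte CSNP header), 16-byte LSP 
entries, and no authentication or other TLVs in the CSNP. The function names 
are made up for illustration.

```python
import math

CSNP_HEADER_LEN = 33     # 8-byte common header + PDU length, source ID, start/end LSP ID
TLV_HEADER_LEN = 2       # type + length octets
TLV_MAX_VALUE_LEN = 255  # a TLV value cannot exceed 255 octets
LSP_ENTRY_LEN = 16       # remaining lifetime, LSP ID, sequence number, checksum


def entries_per_csnp(pdu_size: int = 1492) -> int:
    """LSP entries that fit in one CSNP of the given size (assumed layout)."""
    per_tlv = TLV_MAX_VALUE_LEN // LSP_ENTRY_LEN
    full_tlv_len = TLV_HEADER_LEN + per_tlv * LSP_ENTRY_LEN
    room = pdu_size - CSNP_HEADER_LEN
    full_tlvs, leftover = divmod(room, full_tlv_len)
    # A final, partially filled TLV may still hold a few entries.
    extra = max(0, (leftover - TLV_HEADER_LEN) // LSP_ENTRY_LEN)
    return full_tlvs * per_tlv + extra


def csnps_needed(lsp_count: int, pdu_size: int = 1492) -> int:
    """CSNPs required to describe lsp_count LSPs."""
    return math.ceil(lsp_count / entries_per_csnp(pdu_size))
```

Under these assumptions one CSNP carries 90 entries, so 10,000 LSPs need 112 
CSNPs, in line with the figure of about 110 mentioned above. Losing any one 
of them leaves a gap in the described range.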

> Apologies for the long text.
> I hope it explains our goals and proposal a bit more.

[Les:] No apology necessary - it is a good discussion.
Hope my replies have been helpful.

   Les

> 
> henk.
> 
> 
> 
> >
> >    Les
> >
> >
> >> -----Original Message-----
> >> From: Lsr <[email protected]> On Behalf Of Henk Smit
> >> Sent: Monday, November 05, 2018 8:22 PM
> >> To: [email protected]
> >> Cc: [email protected]
> >> Subject: Re: [Lsr] IS-IS over TCP
> >>
> >>
> >> Thanks, Tony.
> >>
> >> We picked TCP because every router on the planet already has a TCP
> >> stack
> >> in it.
> >> That made it the obvious choice.
> >>
> >> Our draft described a TLV in the IIHs to indicate a router's
> >> ability to use TCP for flooding.
> >> That TLV has several sub-TLVs.
> >> 1) the TCP port-number
> >> 2) an IPv4 address
> >> 3) and/or an IPv6 address
> >>
> >> We can change the first sub-TLV so that it indicates:
> >> 1) 1 or 2 bytes indicating what protocol to use
> >> 2) the remainder of the sub-TLV is an indicator what port-number
> >>     or other identifier to use to connect over that protocol.
> >>
> >> This way we can start improving IS-IS with TCP today.
> >> And add/replace it with other protocols in the future.
> >>
> >> henk.
> >>
> >>
> >>
> >> [email protected] schreef op 2018-11-06 04:51:
> >> > Per the WG meeting, discussing on the list:
> >> >
> >> > This is good work and I support it.
> >> >
> >> > I would remind folks that TCP is NOT the only transport protocol
> >> > available and that perhaps we should be considering QUIC while we’re
> >> > at it.  In particular, flooding is a (relatively) low bandwidth
> >> > operation in the modern network and we could avoid slow-start issues
> >> > by using QUIC.
> >> >
> >> > Tony
> >> >
> >> > _______________________________________________
> >> > Lsr mailing list
> >> > [email protected]
> >> > https://www.ietf.org/mailman/listinfo/lsr
> >>
