Hi Jeff,

Jeffrey Haas wrote on 2018-11-06 05:20:

> I'm ambivalent of the transport, but agree that TCP shouldn't be the default
> answer.

I picked TCP because every router has a working TCP implementation.
TCP is good enough for BGP, and thus is also considered good enough
for LSVR. If that's the case, I'd assume it is good enough for IS-IS
as well.

It's easy to adjust our draft so that new transports can be introduced
over time. We can do TCP now, add QUIC later, and add other, new, better
transports when they become available.
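
A minimal sketch of what "TCP now, other transports later" could look
like in an implementation. This is not taken from the draft: the
function name, the transport names and the preference-list approach are
all assumptions made for illustration.

```python
from typing import Iterable, Optional, Set


def pick_transport(local_prefs: Iterable[str],
                   neighbor_supports: Set[str]) -> Optional[str]:
    """Return the first locally preferred transport the neighbor also supports.

    Hypothetical helper: the draft does not define this selection logic.
    Returns None when the two routers have no transport in common.
    """
    for name in local_prefs:
        if name in neighbor_supports:
            return name
    return None


# Today: only TCP is known to both sides.
print(pick_transport(["tcp"], {"tcp"}))

# Later: a router that prefers QUIC still interoperates with a TCP-only one.
print(pick_transport(["quic", "tcp"], {"tcp"}))
```

The point is only that adding a new name to the preference list does not
change anything for neighbors that still speak TCP.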

I don't know much about QUIC. But it seems the protocol and details are
not 100% stable yet. Maybe soon. So maybe we can do TCP now, and QUIC later?
Also, QUIC might be easy to implement for router OSes that run on top
of Unix. But for OSes that use QNX, VxWorks, or something proprietary,
QUIC might be more work. (It's up to others to decide if that's important
or not. I have no opinion on this matter myself.)

> My concerns that I tried raising via jabber summarize roughly as follows:
>
> - TCP is prone to interesting backpressure issues, typically as a result of
>   packet loss or slow receivers.

If a receiver is slow, that's the same situation as when IS-IS on the
neighbor is slow. Retransmissions happen. With ISO 10589 flooding,
retransmissions use a fixed interval (5 seconds). (I guess some modern
implementations do something smarter.) So convergence would have been
impacted with native flooding too.

Note that our proposal only does TCP between directly connected routers.
I'm sorry to say I have little experience with the behaviour of BGP in
real networks. Where did you observe these backpressure issues? EBGP
or iBGP? I expect to see more problems with iBGP, because iBGP goes
over multiple hops, which can cause all kinds of issues. EBGP is mostly
over directly connected interfaces. I expect TCP to behave much nicer
there. TCP behaviour of IS-IS flooding would resemble eBGP more than iBGP.
At least, that is my expectation.

Note: if you do flooding over a tunnel, flooding over TCP might
be beneficial too. Because of tunnel overhead (e.g. GRE headers), tunnels
usually have a smaller MTU. Therefore all max-sized LSPs will need to be
fragmented when sent over the tunnel. This is also (especially) true for
CSNPs. For packet-based tunneling protocols, that means 2 packets for
each max-sized LSP or CSNP. When using TCP, the LSPs get spread out over
multiple segments, which should make reassembly a bit easier/cheaper.
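
A rough worked example of the packet counts in the tunnel case. All
numbers are assumptions chosen for illustration: a 1500-byte link MTU,
24 bytes of GRE-over-IPv4 overhead, 1492-byte max-sized LSPs, and 40
bytes of IPv4+TCP headers without options. Fragment alignment and
header details are ignored.

```python
import math

LINK_MTU = 1500        # assumed link MTU in bytes
GRE_OVERHEAD = 24      # assumed: 20-byte outer IPv4 header + 4-byte GRE header
PDU_SIZE = 1492        # assumed size of a max-sized LSP or CSNP
IP_TCP_HEADERS = 40    # assumed: 20-byte IPv4 + 20-byte TCP, no options


def packets_per_pdu(pdu_size: int, tunnel_mtu: int) -> int:
    """Packets needed when each PDU is sent (and fragmented) on its own."""
    return math.ceil(pdu_size / tunnel_mtu)


def tcp_segments(n_pdus: int, pdu_size: int, mss: int) -> int:
    """Segments needed when the PDUs are sent back to back as a byte stream."""
    return math.ceil(n_pdus * pdu_size / mss)


tunnel_mtu = LINK_MTU - GRE_OVERHEAD    # 1476
mss = tunnel_mtu - IP_TCP_HEADERS       # 1436

print(100 * packets_per_pdu(PDU_SIZE, tunnel_mtu))  # 200 packets, 100 reassemblies
print(tcp_segments(100, PDU_SIZE, mss))             # 104 segments, no IP fragments
```

Under these assumptions, 100 max-sized LSPs cost 200 packets with
per-packet fragmentation and about 104 full-sized segments over TCP.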

> - TCP timers can react poorly in some environments where you may want time
>   sensitive things. This includes something as long as 3 second BGP hold
>   timers.

When you do flooding over TCP, you don't need to send PSNPs (acks) or
retransmit LSPs, so you don't need timers for those. Things become
less time-sensitive (at the cost of potentially slower flooding).

> - IGPs have a lot of interesting timer hacks to try to ensure that a given
>   domain has a consistent database prior to running an SPF. In the face of
>   "stuck" flooding due to backpressure or other things, some of these may
>   need to be revisited.

Again, my expectation is that whenever TCP has problems, the same
situation would have been worse with native IS-IS flooding.

Also note that in BGP, every update packet over every peering has
significance. If one gets delayed, it slows down overall convergence.
In IS-IS flooding, a router will receive multiple copies of the same
LSP. So if one TCP connection is slow, the router might still receive
the same LSPs over other paths, and the impact on overall convergence
is likely to be smaller.

Of course, this implies that routers flood over more than 1 or 2
interfaces. If we do one of the flooding-reduction proposals, I hope
we'll end up with a situation where we have 3 or 4 redundant flooding
topologies, so that routers will still receive LSPs quickly over other
topologies when the primary topology has problems.
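
A toy illustration of the difference, using made-up per-path delays.
The model is an assumption, not a measurement: a flooded LSP is usable
as soon as the first copy arrives, while a BGP-like exchange is treated
as complete only when the slowest session has delivered its updates.

```python
from typing import Sequence


def flooding_arrival(path_delays_ms: Sequence[float]) -> float:
    """An LSP is installed when the earliest copy arrives over any path."""
    return min(path_delays_ms)


def all_sessions_done(session_delays_ms: Sequence[float]) -> float:
    """Every session carries unique updates, so the slowest one dominates."""
    return max(session_delays_ms)


# Hypothetical delays: one connection is stuck behind backpressure.
delays_ms = [12.0, 15.0, 3000.0]

print(flooding_arrival(delays_ms))   # 12.0
print(all_sessions_done(delays_ms))  # 3000.0
```

With only one flooding path, both functions return the same value,
which is why the redundancy matters.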

> It's been over a year since I looked at QUIC.  I agree with Tony that a
> number of the properties it had on my last read are desirable. I'd suggest
> that its behavior (especially timers) in the event of packet loss should be
> given a look at based on the comments above.

One other benefit of doing flooding over TCP is that part of the
flooding administration is now outsourced to TCP. And TCP usually runs
in another thread or process, inside or outside the kernel. This means
that we'll automatically get a light form of multi-threading: less work
for the IS-IS process/threads. QUIC runs in user space. I don't know
whether that means it is a library whose functions run in the user's
thread/process, or whether QUIC is a separate process/thread. If QUIC
runs in IS-IS's thread, we lose a cheap form of performance improvement
from multi-threading.

henk.

_______________________________________________
Lsr mailing list
Lsr@ietf.org
https://www.ietf.org/mailman/listinfo/lsr
