Re: [bess] John Scudder's Discuss on draft-ietf-bess-datacenter-gateway-10: (with DISCUSS and COMMENT)

John Scudder Mon, 17 May 2021 13:55:17 -0700

Hi Adrian,

Comments in line below.


> On May 14, 2021, at 1:04 PM, Adrian Farrel <adr...@olddog.co.uk> wrote:
> 
> Hi John,
> 
> Thanks for the careful review.
> 
>> DISCUSS:
>> 
>> I have several points I’d like to discuss, listed below from most
>> general to most specific.
>> 
>> 1. There’s surprisingly little in this document that seems to be SR-specific
>> (and what there is, has some problems, see below). Is there some reason you
>> rule out interconnecting domains using other tunneling technologies? I ask
>> this question first because if the answer were to be “oh huh, we don’t need
>> to make this SR-specific after all” some of the other things I’m asking
>> about might go away.
> 
> I'm sorry this isn't clear, but the use of other tunnelling technologies is 
> very much in scope. As the Introduction says:
> 
>   The
>   various ASes that provide connectivity between the Ingress and Egress
>   Domains could each be constructed differently and use different
>   technologies such as IP, MPLS with global table routing native BGP to
>   the edge, MPLS IP VPN, SR-MPLS IP VPN, or SRv6 IP VPN.
> 
> SR is used to identify the tunnels and provide end-to-end SR paths because 
> the ingress and egress domains are SR domains, and the objective is to 
> provide an end-to-end SR path.
> 
> So we are not "making this SR aware" so much as enabling "SR-over-foo" using 
> SIDs to identify the path segments that are tunnels.
> 
> I don't know how to make this clearer except maybe using some red paint.

That would be exclusionary to the colo(u)r-blind.

> We would write...
> 
>   The
>   various ASes that provide connectivity between the Ingress and Egress
>   Domains could each be constructed differently and use different
>   technologies such as IP, MPLS with global table routing native BGP to
>   the edge, MPLS IP VPN, SR-MPLS IP VPN, or SRv6 IP VPN.  That is, the
>   Ingress and Egress SR Domains can be connected by tunnels across a
>   variety of technologies.  This document describes how SR identifiers
>   (SIDs) are used to identify the paths between the Ingress and Egress
>   and the techniques in this document apply to routes of all AFI/SAFIs.

If you want, you could expand the paragraph as you’ve suggested, but I don’t 
think it’s necessary — now that you’ve pointed out the paragraph, it’s clear 
enough. However, I think the document is still misleading and even inconsistent 
about this. Let me quote some other paragraphs to you.

Section 1:

   Segment Routing (SR) [RFC8402] is a protocol mechanism that can be
   used within a DC, and also for steering traffic that flows between
   two DC sites.  

The “steering traffic that flows between two DC sites” can easily be read as 
meaning, steering it *through* the backbone network. I take it your intent is 
to mean, steering it *over* the backbone network. 

                       In order for a source (ingress) DC that uses SR to
   load balance the flows it sends to a destination (egress) DC, it
   needs to know the complete set of entry nodes (i.e., GWs) for that
   egress DC from the backbone network connecting the two DCs.  Note
   that it is assumed that the connected set of DCs and the backbone
   network connecting them are part of the same SR BGP Link State (LS)
   instance ([RFC7752] and [I-D.ietf-idr-bgpls-segment-routing-epe]) so
   that traffic engineering using SR may be used for these flows.

The requirement that the sites *and the backbone network connecting them* must 
all be part of the same BGP-LS instance caused me to raise my eyebrows up into 
my hairline, but there it is in the text. This surprising assumption (most 
service providers do not, to my knowledge, allow their customers to consume 
their LSDB), plus “traffic engineering using SR may be used for these flows”, 
plus the sentence noted above, led me a long way down the garden path of 
thinking you were proposing end-to-end SR forwarding.

And then we have Section 4:

   When a remote GW receives a route to a prefix X it uses the Tunnel
   Egress Endpoint Sub-TLVs in the containing Tunnel Encapsulation
   attribute to identify the GWs through which X can be reached.  It
   uses this information to compute SR Traffic Engineering (SR TE) paths
   *across the backbone network*

(emphasis added). This serves to confirm my misapprehension that this is an 
exclusively SR solution. 

So now on the one hand, I accept that you were completely serious about the 
paragraph you quoted, and that I mentally elided, having been dazzled by the 
parts I just quoted. On the other hand, I wonder what I’m misunderstanding 
about the parts I’ve just quoted, or if I’m not misunderstanding them, how we 
can square this circle.
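[Editor's sketch] The Section 4 text quoted above has a remote GW use the Tunnel Egress Endpoint sub-TLVs in the Tunnel Encapsulation attribute to identify the GWs through which prefix X can be reached. The Python sketch below models only that step, plus one possible way of spreading flows across those GWs. The `Tunnel` and `Route` classes, the free-form tunnel-type strings, and the flow-hash choice are hypothetical simplifications, not the RFC 9012 wire encoding or the draft's specified procedure. Note that the endpoint extraction does not look at the tunnel type at all, which fits Adrian's "SR-over-foo" reading.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tunnel:
    # Hypothetical stand-in for one Tunnel TLV in a Tunnel Encapsulation attribute.
    tunnel_type: str       # free-form label, e.g. "MPLS" or "IP" (assumption)
    egress_endpoint: str   # address carried in the Tunnel Egress Endpoint sub-TLV


@dataclass
class Route:
    prefix: str
    tunnels: list = field(default_factory=list)


def candidate_gateways(route):
    """Distinct GWs through which route.prefix can be reached, sorted for stability."""
    return sorted({t.egress_endpoint for t in route.tunnels})


def pick_gateway(route, flow_key):
    """Pick one GW per flow with a stable hash (one common choice; an assumption here)."""
    gws = candidate_gateways(route)
    if not gws:
        raise ValueError(f"no gateway advertised for {route.prefix}")
    digest = hashlib.sha256(flow_key.encode()).digest()
    return gws[int.from_bytes(digest[:4], "big") % len(gws)]


route_x = Route("192.0.2.0/24", [
    Tunnel("MPLS", "198.51.100.1"),
    Tunnel("IP", "198.51.100.2"),
    Tunnel("MPLS", "198.51.100.2"),  # same GW reachable via a second tunnel type
])
print(candidate_gateways(route_x))  # ['198.51.100.1', '198.51.100.2']
print(pick_gateway(route_x, "10.0.0.1->192.0.2.7:443") in candidate_gateways(route_x))  # True
```

Computing the actual path to each chosen GW across the backbone is deliberately left out, since that is exactly the part whose technology (SR or otherwise) is under discussion.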

>> 2. There’s no discussion about what trust model you’re assuming. SR
>> brings with it its own assumed trust model, laid out in RFC 8402 as “SR
>> operates within a trusted domain” (whatever *that* means). On the one
>> hand, given you’re tying yourself to SR you presumably are tied to its trust
>> model. On the other hand, there are some tantalizing tidbits that suggest
>> otherwise. I would be happier if there were some explicit description of
>> the trust model you’re presuming. It’s hard to evaluate some aspects of
>> the document without knowing if you’re assuming the RFC 8402 closed
>> domain model, or something else.
> 
> I believe that the term "SR domain" in 8402 is basically defined as "a set of 
> nodes that support SR".
> The description in (the ever-so-skimpy section 8 of 8402) says:
> 
>   By default, SR operates within a trusted domain.  Traffic MUST be
>   filtered at the domain boundaries.
> 
> What does "by default" mean in that context?

I wish I knew.

> I think there are two things to think about:
> 
> 1. Forwarding plane trust model. Can packets get into the SR system? The 
> answer to that remains, "No, because traffic MUST be filtered at the domain 
> boundaries." This requires that the domain boundary is the interface between 
> an SR-capable node, and a non-SR node. In this document all GWs and ASBRs are 
> part of the SR domain connected by tunnels across the transit ASes (although 
> the nodes in the transit ASes are not part of that domain).

I’m relieved to hear it; this relates to my earlier comments.

> I dare say that draft-farrel-spring-sr-domain-interconnect explains this 
> better through examples, but the chairs of SPRING told us that that draft had 
> no chance of progressing.
> 
> 2. Control plane trust model. What is the trust model in the BGP system? I'm 
> pretty sure that Section 8 of our draft is adequate for this discussion.

I suspect if we can get me out of the swamp described in my responses to #1 
above, this one will go away.

>> 3. The use of the term “SR domain” in this document appears inconsistent with
>> its definition in RFC 8402. Here’s that definition, from §2:

[blah, blah, blah]

>> More simply put, 8402 says you can’t send an SR packet from outside an SR
>> domain, into that domain. But your document is written in terms of a
>> multiplicity of SR domains, for example this in Section 1:
>> 
>>  Tunnel Encapsulation attribute.  The gateway in the ingress SR domain
>>  can now see all possible paths to X in the egress SR domain
>> 
>> Maybe a quick fix, assuming you really do subscribe to the RFC 8402 trust
>> model, is to invent, define, and use the term “SR subdomain” and deem all the
>> subdomains to comprise one SR domain, in the sense of RFC 8402 §2 — “They may
>> as well be remotely connected to each other (e.g., an enterprise VPN or an
>> overlay)” seems to describe your situation pretty well.
> 
> We completely agree with the 8402 meta-definition of SR domain (and I used it 
> to answer your previous point).
> 
> The confusion appears to arise purely from the terms "ingress SR domain" and 
> "egress SR domain" which is our bad choice of words. "Site" would be a better 
> word and I will scrub the document to use that. "Subdomain" seems to 
> exacerbate the already-dubious word "domain."

Works for me.

>> COMMENT:

[…]

>>  The auto-discovery route that each GW advertises consists of the
>>  following:
>> 
>> The use of the definite article implies that each GW can advertise one, and
>> only one, auto-discovery route. Is this true?
> 
> Indeed, just one (with potentially multiple tunnel encaps). We can make that 
> explicit unless you can find a reason why advertising more than one would be 
> beneficial.

I don’t see a use for it off the top of my head, although I suppose if you 
squinted hard enough at a dual-stack network you might find one. On the other 
hand, I haven’t tried to think through what the implications would be of two or 
more auto-discovery routes existing at the same time, e.g. what if they 
advertise different tunnel encaps?

>> 4. Section 5
>> 
>>  When a packet destined for prefix X is sent on an SR TE path to a GW
>>  for the SR domain containing X (that is, the packet is sent in the
>>  Ingress Domain on an SR TE path that describes the path including
>>  within the Egress Domain), it needs to carry the receiving GW's label
>> 
>> I can’t understand the parenthetical, in particular “the path including
>> within the Egress Domain”.
> 
> "describes the whole path including those parts that are within the Egress 
> Site"

Great, I assume you’ll make that substitution.

>> Also, do you really mean “label”, or do you mean “SID”? I don’t think you
>> scoped this to only SR-MPLS, did you? Although reading on within §5 you talk
>> about the “label stack”, so it does appear you’re MPLS specific — probably
>> this should be said up front, in that case? The title should really be “…
>> for SR-MPLS Enabled Domain Interconnection”?
> 
> Ouch! Yes, that slipped through. It's SIDs all the way down.

OK. I guess some work is needed on “places each in an MPLS label stack sub-TLV”
as well, then.

[…]

>> 6. Section 8
>> 
>>  All of the issues in the list above could cause disruption to domain
>>  interconnection, but are not new protocol vulnerabilities so much as
>>  new exposures of information that SHOULD be protected against using
>>  existing protocol mechanisms.  Furthermore, it is a general
>> 
>> What are the existing BGP protocol mechanisms that could be used to protect
>> against exposure of information? BGP itself doesn’t have any confidentiality
>> features nor do most of its common transports. Maybe you mean something
>> different, but if so that’s not clear to me.
> 
> I don't think we intended (or said) “BGP protocol mechanisms.”

Considering that all (or almost all) the exposures you detailed relate to BGP, 
I would think it implicit, although when I wrote “BGP protocol mechanisms” I 
was assuming the entire BGP ecosystem, i.e. BGP and its transport.

> But other protocol mechanisms *do* exist to protect BGP exchanges between 
> peers. TLS or TCP-AO spring to mind.

TCP-AO doesn’t do anything towards confidentiality. TLS does but it’s not 
currently used as a BGP transport, to my knowledge. Of course one can also run 
BGP over IPsec if one wants confidentiality, and use of other transports could 
be proposed. But other than running it over IPsec, I’m not aware of “existing 
protocol mechanisms” to avoid “exposures of information” relating to BGP.

You might well say “ok fine, use IPsec then”. But that’s not a complete answer 
either. For example, there are at least two different parties operating 
different parts of the network — the site owner, and the backbone owner. It’s 
likely challenging for one to mandate the other’s security practices. 

> The point here is that it is not the job of this document to solve BGP 
> security issues.

I agree. I just think you’re overpromising by saying “SHOULD be protected 
against using existing protocol mechanisms”. If you removed those words, I 
don’t think I’d have a problem. If you want to offer some hope and not leave 
the reader depressed you could even finesse it by saying something other than 
“existing protocol mechanisms”, e.g. if you mean the operator(s) should use a 
transport that provides confidentiality, say so. If you think it’s a sad mad 
bad world we live in and that BGP’s security model needs some work, say so, or 
remain silent, but don’t imply that if the operator just turned on the right 
knob they’d be fine.

>>  system.  It should be noted that BGP peerings are not discovered, but
>>  always arise from explicit configuration.
>> 
>> This is true at present, but IDR has work in progress on autodiscovery. (Same
>> comment applies with respect to Section 9.)
> 
> Weeeeeeell, I think that that IDR work will need to provide adequate security.

Fair enough, I’ll lob that over the fence to them.

>> 7. Section 9.1
>> 
>>  consideration.  When using the mechanisms defined in this document,
>>  the operator should consider carefully the effects of filtering
>>  routes.  In some cases this may be desirable, and in others it could
>>  limit the effectiveness of the procedures.
>> 
>> I believe the only use of route targets in this document is for the
>> autodiscovery routes.  If RTC were in use, through its normal operation the
>> gateways would exchange autodiscovery routes exactly as this specification
>> needs them to. So your cryptic warning above leaves me wondering, what are
>> the cases in which RTC impedes the function of the specification?
> 
> That is really what we intended you to wonder about. I suppose we are saying, 
> "We are worried it might have an effect, but we can't put our finger on it."

OK I guess.
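[Editor's sketch] John's RTC point can be made concrete with a toy model of Route Target Constraint (RFC 4684): a BGP speaker sends a route to a peer only if the route carries a route target for which that peer has advertised membership. The Python sketch below is a simplification (no route reflection, no split horizon, no default route-target membership), and the route-target strings, route names, and peer names are hypothetical.

```python
def rtc_filter(routes, peer_interests):
    """Return, per peer, the route IDs it would receive under RT Constraint.

    routes:         list of (route_id, set_of_route_targets)
    peer_interests: dict mapping peer -> set of route targets it advertised
                    membership for
    """
    return {
        peer: sorted(rid for rid, rts in routes if rts & interests)
        for peer, interests in peer_interests.items()
    }


AD_RT = "65000:1"  # hypothetical route target carried by the auto-discovery routes
ad_routes = [("AD-GW1", {AD_RT}), ("AD-GW2", {AD_RT}), ("AD-GW3", {AD_RT})]
interests = {
    "GW1": {AD_RT},
    "GW2": {AD_RT},
    "GW3": {AD_RT},
    "PE9": {"65000:99"},  # an unrelated speaker with no interest in AD_RT
}
result = rtc_filter(ad_routes, interests)
print(result["GW1"])  # ['AD-GW1', 'AD-GW2', 'AD-GW3']
print(result["PE9"])  # []
```

In this model every GW that advertises membership for the auto-discovery route target receives every auto-discovery route, which is the behavior John describes; a speaker that has not advertised that membership receives none of them.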

>> 8. General
>> 
>> The autodiscovery mechanism is clear as far as it goes, but I think not all
>> failure modes are addressed. In particular, if there’s partial connectivity
>> within a domain, I think long-term black holing can ensue. Consider this
>> case: GW1 and GW2 are gateways in domain A. GW3 is a gateway in domain B.
>> GW1 and GW2 discover one another and advertise one another’s encapsulation
>> information accordingly, when advertising a route to prefix X. However,
>> there’s a problem within GW1 and GW2’s domain, such that GW1 can reach X,
>> but GW2 can’t. Even though GW2 may know it can’t reach X, and indeed GW2
>> isn’t advertising X, GW1 is still advertising GW2 as a viable gateway to
>> reach X, and GW3 may well route traffic for X via GW2.
>>
>> Admittedly, having partial connectivity within a domain as I’ve described
>> is a broken situation to begin with, but stuff happens, and your spec would
>> make matters worse. It might be worth acknowledging this issue somewhere in
>> the document?
> 
> Agree with you that "stuff happens." I think that what you have described is 
> a window not a permanent situation.
> When GW2 knows it can't reach X any more, it will stop advertising X, and GW1 
> will receive that and will update what it advertises on behalf of GW2.
> Further, if GW1 can no longer receive advertisements from GW2 then it will 
> stop advertising on behalf of GW2.

Replied separately.
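[Editor's sketch] The disagreement in point 8 turns on what GW1 lists in its advertisement for X once GW2 has withdrawn X. The Python toy model below compares two hypothetical policies: "all_discovered" (GW1 lists every GW it learned from auto-discovery routes, John's reading) and "only_advertising_x" (GW1 lists only GWs that are themselves still advertising X, matching Adrian's description of GW1 updating what it advertises on behalf of GW2). The policy names are inventions for illustration; the draft text quoted here does not name either one. The model shows steady state only: it ignores BGP convergence timing and assumes all GWs in the site share the same auto-discovered view.

```python
def advertised_endpoints(advertiser, site_gws, advertising_x, policy):
    """Endpoints `advertiser` attaches to its route for X, or None if it withdraws X.

    site_gws:      GWs in the site learned via auto-discovery (including itself)
    advertising_x: GWs that can currently reach X and so advertise it themselves
    policy:        "all_discovered" or "only_advertising_x" (hypothetical names)
    """
    if advertiser not in advertising_x:
        return None  # a GW that cannot reach X withdraws its own route to X
    if policy == "all_discovered":
        return set(site_gws)
    if policy == "only_advertising_x":
        return set(site_gws) & set(advertising_x)
    raise ValueError(f"unknown policy: {policy}")


def endpoints_seen_by_remote(site_gws, advertising_x, policy):
    """Union of endpoints a remote GW (GW3) learns for X from the whole site."""
    seen = set()
    for gw in site_gws:
        eps = advertised_endpoints(gw, site_gws, advertising_x, policy)
        if eps is not None:
            seen |= eps
    return seen


site_a = {"GW1", "GW2"}
reaching_x = {"GW1"}  # GW2 has lost reachability to X
print(sorted(endpoints_seen_by_remote(site_a, reaching_x, "all_discovered")))      # ['GW1', 'GW2']
print(sorted(endpoints_seen_by_remote(site_a, reaching_x, "only_advertising_x")))  # ['GW1']
```

Under "all_discovered", GW3 still sees GW2 as an egress for X for as long as GW2's auto-discovery route persists; under "only_advertising_x", GW2 drops out as soon as its own withdrawal of X reaches GW1.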

Thanks,

—John
_______________________________________________
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess
