Re: [bess] Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06

Adrian Farrel Fri, 01 Mar 2019 11:19:02 -0800

Hello Stephane,

Thanks for this review. It is very thorough and has helped improve the document.


We have posted an update to the draft, and there are responses to your review, 
below.

Thanks,
Adrian

> The document is globally well written with good examples that help the 
> understanding.
> However it requires some refinements in the normative language used: some 
> statement
> should be normative but are not using upper case words.

Yes. I found 16 'must' to convert to 'MUST'. Some lowercase remains, but I 
believe they are correct.

> Abstract:
> - Please expand “BGP” on first use

'BGP' is well known according to 
https://www.rfc-editor.org/materials/abbrev.expansion.txt

> - Please give a reference associated to “Network Service Header”

OK. Obviously we cannot give a citation in the Abstract, but we can use plain 
text.

> Introduction:
> - When “classifier” is used as an example of SF, do you refer to the 
> classifier used to steer
>    the traffic onto an SFC ? If yes, I don’t think it could be considered as 
> an SF.

Ah, no. Hmm. Tricky.
A SF could be responsible for classifying traffic into different buckets for 
different uses. This is a very similar function to an 'SFC Classifier' 
(responsible for classifying traffic onto different SFCs).
I can't think of a way to remove this confusion except by deleting this 
example. Instead, I have boosted the list of examples by borrowing from RFC 
7498.

> - I think the “conventional” approach described in the intro is really old 
> school and
>  even before SFC, we had some means, like dynamic routing protocols and other
>  mechanisms like separate routing tables, to steer the traffic through a set 
> of services.
>  I agree that there is also some drawback associated with such approaches 
> like the
>  operational complexity of provisioning.

OK. I *think* the change here is s/Conventionally/Historically/  and change the 
tense to past.

> Section 1.2:
> It would be good to give a definition for the additional terms that are
> defined in this document.

OK

> Section 2.1:
> 
> As the section is focused on dataplane, it would be good to rename it 
> as “Reminder on NSH dataplane” or something like that. It does not
> provide a functional overview of the controlplane which is what was
> expected based on the title.

Yeah, the CP is in 2.2.
We go for "Overview of Service Function Chaining"

> “A special Service Function, called a Classifier, is located at each
>   ingress point to a service function overlay network.  It assigns the
>   packets of a given packet flow to a specific Service Function Path.“
>
> Again, based on my understanding, the Classifier cannot really be
> considered as an SF , at least this is a component out of the SFC 
> which steers the traffic onto an SFC.

Well, rather than have a philosophical debate, I have changed 
OLD
A special Service Function, called a Classifier,
NEW
A special functional element, called a Classifier,
END

> “An unknown or invalid SPI SHALL be treated as an error and the SFF
>   MUST drop the packet.  Such errors SHOULD be logged, and such logs
>   MUST be subject to rate limits.”
>
> I found strange to have such statement in this document which is focused
> on the controlplane while this statement is a dataplane statement. Isn’t it
> part of RFC8300 Section 3 : 
> “3.  Update the NSH: SFs MUST decrement the service index by one.  If
>       an SFF receives a packet with an SPI and SI that do not
>       correspond to a valid next hop in a valid SFP, that packet MUST
>       be dropped by the SFF.”
>
> The next paragraph deals also with normative dataplane statement which 
> is IMO out of scope of this document.

You're right. There is no place for normative text in section 2.1.
Dropped to lower case, and added "as required by RFC 8300".

> Section 2.2
>
> “The SFIR describes a particular instance of a
>    particular Service Function”. 
>
> Would it be good to talk about “Service Function Instance” rather than
> instance of a Service Function ? That’s understandable, of course, 
> however I think it’s good to reuse the exact terminology. 

It just reads better 😊

Moving to...

“The SFIR describes a particular instance of a particular Service Function 
(i.e., an SFI)..."

> • I think the SFT definition is appearing a bit late in this section and in 
> the document 
>  as it has been referenced already multiple times before.

This is now in 1.2.
The only previous mention of SFT is in the bullet list for the SFPR definition 
two paragraphs higher.
But I have jiggled 2.2 to bring SFT to the top.

> “Service Function Type (SFT) that
   is the category of SF that is supported by an SFF”. 
>
> Don’t you mean SFI rather than SFF ?

Nope.
SFFs support attached SFIs.
An SFI is an instance of an SF.
An SF has a type that is its SFT.

> “Thus the SFF can be seen as a portal…”. Would “gateway” 
> be more suitable rather than “portal” ?

Oh, have I been writing too many fairy stories?

It's pretty much synonymous, with a nice ring to it, but no pain involved in 
changing to 'gateway'.

> The Figure 1 is not really used in this section as part of the existing text. 
> I 
> would be better to have a companion text that explains the figure.

Wow! Yes. That's embarrassing.

> I don’t like (personal opinion), the “grouping” of SFIs in the Figure 1 as 
> part of an SFT. 
> What strikes me is that it could be confused with the usual representation of 
> an SF 
> composed of multiple SFIs. Again that’s just a personal feeling.

Interesting. That confusion does sort of exist.

I don't think an SF is composed of multiple SFIs. 
An SF has a type: its SFT
The may be multiple instances of an SF, the SFIs, all of the same type.
And, of course, it is the SFIs that are attached to the SFF.
There may be multiple SFs of the same type. These are not necessarily to be 
indicated as SFIs. For example, two SFs of the same type may be accessed 
through different SFFs.

I think it might help if the text discussed this, but I don't think that it is 
right to make a hierarchy in the figure.

However, since we're here, the figure could be made a bit better along with the 
text to describe it.

> Section 3:
> “they must use BGP Capabilities”: is it a normative MUST ?

Ack

> Section 3.1:
>
> s/a two byte Type field and a six byte/a two bytes Type field and a six bytes/

Got to love English 😊
You use singular in this case.

> “Two SFIs of the same SFT must be associated”. Is it a normative 
> MUST ? Same comment for next sentences in the paragraph (multiple
> occurrences)

Ack

> “The Service Function Type identifies a service function”. I don’t
> think we can really say that, it identifies the type of service the SF
> is providing but not the SF itself.

Yes

> “Each node hosting an SFI
>   must originate an SFIR for each type of SF that it hosts, and it may
>   advertise an SFIR for each instance of each type of SF.” 
>
> Is it really “and” ? I mean can we just summarize by using “Each 
> node hosting an SFI MUST originate an SFIR for each instance of
> each type of SF” ?

No. This makes a scaling optimization. If you don't need the creator of the SFP 
to know that there are 57 instances of an SF, you just advertise once for the 
type of SF and allow the local SFF select which SFI to use. On the other hand, 
if you want to allow the load balancing to be done by a controller, you can 
advertise.

Adding a brief explanation.

> “A BGP Update containing one or more SFIRs will also include”. Is it a
> MUST or SHOULD “also include”?

Yeah, a MUST, I guess.

> How is the nexthop encoded in the NLRI ?

A bit confused about this question. We're in the section that talks about the 
SFIR: the concept of nexthop applies to the SFP and so is found in the SFPR.

Perhaps the presence of the Tunnel Encapsulation attribute is the point of 
confusion. That attribute is there to say what protocol(s) the SFF is expecting 
to see used for packets arriving for the SFIs.

Or maybe you were expecting the NLRI to encode a prefix like it used to in the 
good old days?

> Section 3.1.1
>
> • s/“It can be included”/”It MAY be included”/ ?

Sure.

> That would be great to give more details about the usage of pools

OK.

> The encoding in Figure 4 is not matching the text. Figure 4 has Type=0x80,
> Sub-type=TBD6, text says type=TBD6 and subtype=0x00

Eeeek! And Figure 5.

The text is correct. I have fixed the figures.

> How is the pool ID managed ?

OK. That's a good one.
It's a globally unique value, and so it needs to be centrally managed. As also 
the assignment of SFIs to the pools.
Note has been added.

> Section 3.1.2
>
> Same comment about Figure 5 not matching the text.

Ack

> Section 3.2:
>
> s/a two byte Type field and a six byte/a two bytes Type field and a six bytes/

Same reply

> Same comment about normative language “must” vs “MUST”

Ack

> How is the nexthop encoded in the NLRI ?

Maybe a better place for the question, if we are not still talking about the 
use of NLRI to carry a prefix.
The SFPR carries the SFP which is a full "path" including the next hops.

> Section 3.2.1:
>
> From a readability point of view, it would be good to tell in the second 
> sentence that the
> requested attribute is Optional Transitive (we can see it in the description 
> but it is also
> good to have it in the intro of the attribute).

OK

> “The presence rules and meanings are as follows.”. It would be good to use 
> normative language
> in the following sentences.

I have done s/OPTIONAL/optional/
Feels to me that the rest is statement of fact rather than direction to the 
implementation.

> I don’t see the “error handling” behavior associated with this attribute
> (discard, treat-as-withdraw…)

The errors would be...
- malformed attribute (need to cover this)
- wrong number of Association TLVs (but they are optional and multiple are 
allowed, so no error)
- Association TLV in error (there is a paragraph for this in 3.2.1.1)
- missing Hop TLV (need to cover this)
- Hop TLV in error (need to cover this)

I think the errors are covered by section 6 of RFC 4271, but we need to point 
to it.

> Section 3.2.1.1
> “It may be present” or “It MAY be present” ?

Yes

> Section 4.1
>
> Having an SFF being part of a single overlay network is IMO a very limited
> use case, in such a case, I don’t think that you need an RT at all. The SFF 
> could be part of a L3VPN which will be used as an underlay.
> An SFF being “multitenant” is more valid and defacto requires to maintain
> separate forwarding states (VRF like…). This is mandatory to maintain the
> tenant isolation and RTs are very useful here to know the appropriate
> context to put the routes in.

I wonder whether there is a misunderstanding.

We have...

   Every node in a service function overlay network is configured with
   one or more import RTs.

That means that an SFF, for example, can be in multiple overlays, and each SFP 
is placed within an individual overlay.

I re-read 4.1 and don't see it saying that an SFF is part of a single overlay. 
Can you point me at what I should change?

> Section 7.1
>
> While I understand that the node doing the classification can perform a deep
> packet inspection to get an entropy indicator, any intermediate node cannot
> set it again as the NSH header will be there.

Right. Exactly. Hence...
   However, when an NSH is
   included in a packet, those key fields may be inaccessible.  For
   example, the fields may be too far inside the packet for a forwarding
   engine to quickly find them and extract their values, or the node
   performing the examination may be unaware of the format and meaning
   of the NSH and so unable to parse far enough into the packet.

> Here is an example:
>
> PacketsIn -→ Classifier ->SFF1 (SFI1) -> SFF2(SFI2) ->SFF3(SFI3)->PacketsOut 
> to dest
>
> Classifier pushes NSH header onto the packets as well as the underlay tunnel
> (using an entropy indicator in UDP or ELI/EL). Where the packets comes in at
> SFF1, IMO, it strips the underlay tunnels including the entropy indicator.
> Which means that when the packet with NSH header will go to SFF2, it will
> not have anymore entropy indicator and there is no way to build it again as
> the NSH header prevents a new deep packet inspection. Am I missing
> something ? Maintaining a per packet state is IMO not doable , especially if 
> the
> SFI modifies the packet content.

OK. 
So this is the thing. *If* you want entropy for the underlay network, you have 
to get it from somewhere. And an entropy label is going to be a lot more 
practical that hoping that each hope in the underlay can do some form of hash.
There are some other options:
- use more fine-grained SFPs so that the SPI is effectively a key to 
  the entropy
- put the entropy as metadata in the NSH requiring that the SFF
  be NSH-aware
- encode the necessary entropy information in the tunnel attribute
- don't use entropy

I don't propose any changes to the document for this point, but do feel free to 
negotiate.

> Section 8
>
> It would have been good to have two SFIs with the same SFT
> on the same SFF in the example.

It's a long way through, but 8.9.1 contains an example of that.

> I don’t like the representation of RD using “192.0.2.1,1” as the “,” can
> be confusing with a regular separator. Why not using :”192.0.2.1:1”
> notation which is well known ?

Obviously not so well known that I knew it well 😊
But, it's a good change.

> Section 8.9.1:
>
> How does an SFF know that an attached SFI is stateful ? I don’t
> think it can know that.

Well, how does the SFF know which SFIs are attached and what their types are?
The registration of SFIs to their SFFs is out of scope of this document (I 
think it was raised as a separate function in draft-ietf-sfc-control-plane).

> I don’t think that the fact that SFF2 is used in both direction is safe
> from a load balancing perspective.
> If the hashing algorithm used by SFF2 is sensible to the order of the
> keys (like source vs dest address, or source vs dest port), it may 
> provide a different SFI as a result of the hashing between the
> forward and the reverse flow.

It's unclear what hashing might be used, but it seems to me that the primary 
choice will be based on the SPI.
Of course, if the hash goes further (i.e.., payload) and the SFF is aware of 
forward/reverse traffic it is capable of hashing the right fields.

But does this smells of an implementation detail?

> Section 8.9.2:
>
> How can the controller instruct the classifier how to place
> traffic ?

This is describe in Section 7.5 which describes a new Flowspec Action extended 
community that assigns a specified flow to a specified [SPI, SI, SFT].  This 
allows a controller to program all classifiers consistently.

An alternative would be to use PCEP based on draft-ietf-pce-pcep-flowspec with 
some SFC extensions (there was a draft in PCE for SFC extensions, but it was a 
bit premature), but that is beyond the scope of this document.

> In addition, this could involve multiple classifiers that need
> to be coordinated.

Agreed. This is part of the programming of classifiers mentioned above.

> The ingress forward classifier and the ingress reverse classifier should
> use SFPs within the same assoc.

Yes, that's what they do in the example.
The Assoc-SPI points to the associated SFP.

> Section 9:
>
> Do we have to set limits on receiving nodes in term of number of
> states received from the controller to mitigate some attack ?

I'm not sure, but probably not. But anyway, that would be out of scope.

> The text talks about security of BGP, what kind of mechanism
> should be put in place ?

To be honest, I don't know.
This has similar security behaviour to a L3VPN, so I guess the same rules apply.

> Do we have any interdomain considerations ?

8300 says that the intended scope is for use within a single provider's 
operational domain.

> This is a controller based approach, so the security of the controller
> itself is also important.

It's true that one approach to building a network architecture (probably the 
wise approach?) is controller based. But this document is providing a BGP based 
control plane not a full solution, so we think that discussion is out of scope.

> References:
> Shouldn’t RFC7665 be normative?

Probably. It gives us a downref, but maybe the IESG doesn't mind them these 
days.

> I think that the mpls-sfc and mpls-sfc-encaps should also be
> normative as you are defining a controlplane to use them.

I don't mind doing that.

_______________________________________________
BESS mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/bess

Re: [bess] Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06

Reply via email to