Hi Jim,
Apologies for the delay in replying to this message. Further discussion in-line
marked [rjs].
On 25 Jun 2012, at 23:47, UTTARO, JAMES wrote:
> [rjs]: Absolutely, this is the current behaviour. The problem with taking a
> whole session down in this case is that you now take a risk of inconsistency
> for all NLRI across that session for the duration that you hold onto the
> learned NLRI. If one avoids being in the situation where the session is down
> (e.g., by applying treat-as-withdraw behaviour in cases where one can
> determine the NLRI) then all other NLRI on the session continue to be updated
> as they need to be. It is only the NLRI that were included in the erroneous
> UPDATE that may be affected by looping/black-holing.
>
>
>>> [Jim U>] The assumption being that the error was caused by an upstream
>>> speaker and is therefore not truly indicative of an issue over the session
>>> where the error manifests itself. This seems to make sense in the IPv4
>>> case. I am still a bit concerned as I do not understand how the following
>>> is addressed.
>
[rjs]: Actually, I think that applying treat-as-withdraw more generally than to
optional transitive attributes alone does not necessarily imply that the error
was not the direct fault of the upstream speaker. However:
> - There is no way of knowing if the adjacent peer is the speaker that is
> actually responsible for the malformed attr or is coming from an upstream
> speaker. I can think of no way of knowing this.. Can it be inferred from the
> notion that an error is of the syntactic or semantic variety?
[rjs]: This is true; only where we have a mechanism such as the Partial bit on
an optional transitive attribute can we infer that the directly attached
neighbour did not look at the attribute. The semantic and critical errors
called out in the draft relate to the impact of the error on the resulting
UPDATE message, rather than to whether the direct neighbour is responsible for
it.
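For reference, the Partial bit mentioned above is one of the Attribute Flags defined in RFC 4271 (Optional 0x80, Transitive 0x40, Partial 0x20, Extended Length 0x10). A minimal sketch of the check, assuming only those flag values (the function name is hypothetical, not taken from any implementation), might look like:

```python
# Attribute Flags bits from RFC 4271, Section 4.3.
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20


def is_partial_optional_transitive(flags: int) -> bool:
    """True if the attribute is optional transitive and has the Partial bit set.

    Per RFC 4271, an optional transitive attribute that some earlier speaker
    did not recognise is passed on with the Partial bit set to 1.
    """
    required = OPTIONAL | TRANSITIVE | PARTIAL
    return (flags & required) == required


# Flags octet 0xE0 = Optional | Transitive | Partial.
print(is_partial_optional_transitive(0xE0))  # True
print(is_partial_optional_transitive(0xC0))  # False: Partial bit clear
```

Note that the bit only says that some earlier speaker passed the attribute on without recognising it; on its own it does not identify which speaker introduced the error.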
> - There seems to be no threshold at which the session is actually taken out of
> service. It would seem that some number of these types of errors would
> indicate a major issue is taking place and should be addressed by severing
> the speaker that is advertising paths with the malformed attr into the
> topology. A large number of these errors will create a large number of
> withdraw messages being generated from many peers. What are your thoughts on
> how this should be addressed?
[rjs]: This is something that has been discussed on the list previously. There
are two key questions in this space:
- Do we expect errors that are indicative of a whole-box failure, not related to
a large change on the device (e.g., a code upgrade), that affect all prefixes
while leaving the UPDATE message formed well enough for the NLRI to be
extracted?
- Is the state in which all prefixes are withdrawn (with UPDATEs withdrawing all
NLRI being sent to all neighbours) an acceptable one? There is (of course) a
scaling impact of such UPDATEs being transmitted to, and parsed by, all
downstream neighbours, but such an event only arises if we assume that large
proportions of the UPDATEs generated become erroneous.
I don't think that I can categorically state that the answer to the former is
no, but I am not aware of such a case. In my view, larger-scale issues on the
BGP speaker (e.g., things that affect memory integrity) result in failures that
produce output not well enough formed to fall within the "semantic" errors
described in the draft. I would (of course) welcome implementers' and testers'
feedback on this point.
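Purely as an illustration of the threshold Jim asks about (this is not something the draft specifies), a per-session policy could apply treat-as-withdraw to individual errors and only reset the session once errors become frequent. The class name and the `max_errors`/`window` parameters below are hypothetical:

```python
from collections import deque


class ErrorThreshold:
    """Hypothetical per-session policy: treat-as-withdraw each erroneous UPDATE,
    but reset the session if more than `max_errors` errors arrive within
    `window` seconds."""

    def __init__(self, max_errors: int, window: float) -> None:
        self.max_errors = max_errors
        self.window = window
        self.error_times: deque = deque()

    def on_error(self, now: float) -> str:
        """Record an error at time `now` and return the action to take."""
        self.error_times.append(now)
        # Forget errors that have fallen out of the sliding window.
        while now - self.error_times[0] > self.window:
            self.error_times.popleft()
        if len(self.error_times) > self.max_errors:
            return "reset-session"
        return "treat-as-withdraw"


policy = ErrorThreshold(max_errors=2, window=10.0)
print([policy.on_error(t) for t in (0.0, 1.0, 2.0)])
# ['treat-as-withdraw', 'treat-as-withdraw', 'reset-session']
```

Choosing sensible values for such parameters would itself be an operator decision, and the sketch says nothing about the scaling impact of the withdrawals generated before the threshold is reached.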
>> I would expect all solutions implemented in response to these requirements
>> to be optional. If the risk of incorrectness is unacceptable to you/an
>> operator, then you should absolutely not enable any of these mechanisms. In
>> a number of networks that I have operated, designed and architected, I am
>> prepared to accept the risk of incorrectness, as I consider it acceptable
>> when compared to the risk of complete service outages in terms of impact to
>> my customers during such incidents. At the moment, without the work
>> described by the requirements outlined in this draft, I do not have the
>> means to make that call...
>> [Jim U>] I do not understand how it is possible to make this configurable on
>> a per-session or per-AS basis.. I would think all speakers participating in a
>> routing context would have to adhere to the same rules for a consistent view
>> across domains.. In my reading of the IDR draft it seems that it would be a
>> MUST.. Maybe I should not be considering that IDR draft as the actual
>> realization of the reqs..
>
> [rjs]: The IDR draft is the solution for some of the requirements --
> particularly those described in Section 3 of the GROW draft.
>>> [Jim U>] Got it..
>
> [rjs]: I do not see why this behaviour needs to be consistent across domains.
>>> [Jim U>] Can you explain this?
[rjs]: See later point about "good"/"bad" paths.
>
> [rjs]: Essentially, if I receive an invalid UPDATE message and apply
> treat-as-withdraw, and the advertising speaker did not know that it was
> erroneous, then I end up with a different view of what is in the RIB than the
> advertising speaker has. If this was a prefix I had no other route to, then
> I may black-hole; if it was a more-specific of some larger prefix, then we
> end up with the potential for loops.
>>> [Jim U>] Yes.. I am not sure I like the notion of forwarding loops
>>> especially for large flows..
[rjs]: The potential for loops exists in some specific scenarios, I think,
especially those where there is a covering aggregate advertised to a speaker
and a more-specific advertised within that aggregate. In some of these cases,
rather than forwarding to the device advertising the more-specific (i.e., the
one that was withdrawn), the speaker forwards towards the advertiser of the
aggregate, which may forward the packets back again. I think the example below
shows something like this: if 10.0.0.0/24 is advertised from C to B and A
forwards packets destined to 10.0.0.0/24 during the time that this prefix is
withdrawn, then they will loop. Now, I think that this is a feature of this
topology anyway, since when B-C is down there will be loops for 10/8 between
B and D.
                           <-- 10.0.0.0/8 --
[ A ] --0.0.0.0/0--> [ B ] ---0.0.0.0/0---> [ D ]
                       |
                  10.0.0.0/24
                       |
                     [ C ]
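The loop can be reproduced with a toy longest-prefix-match walk over the topology above. This is a sketch under stated assumptions: A's default points at B, and once B stops advertising 10.0.0.0/24 towards D, D's best match for it is its default via B. The FIB contents and helper names are hypothetical:

```python
import ipaddress

# Hypothetical FIBs (prefix -> next hop) for the topology above, with
# 10.0.0.0/24 still learned from C.
FIBS = {
    "A": {"0.0.0.0/0": "B"},
    "B": {"10.0.0.0/24": "C", "10.0.0.0/8": "D"},
    "C": {"10.0.0.0/24": "local"},
    # Assumption: D's only match for 10.0.0.0/24 is its default via B.
    "D": {"0.0.0.0/0": "B"},
}


def lookup(fib: dict, dst: str):
    """Longest-prefix match: next hop of the most specific covering prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [p for p in fib if addr in ipaddress.ip_network(p)]
    if not matches:
        return None  # no route: the packet is black-holed
    best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return fib[best]


def trace(fibs: dict, start: str, dst: str, max_hops: int = 8):
    """Follow next hops from `start` until delivery, a drop, or the hop limit."""
    path, node = [start], start
    for _ in range(max_hops):
        next_hop = lookup(fibs[node], dst)
        if next_hop is None:
            return path, "dropped"
        if next_hop == "local":
            return path, "delivered"
        path.append(next_hop)
        node = next_hop
    # Hop limit exceeded: treated as a forwarding loop.
    return path, "loop"


print(trace(FIBS, "A", "10.0.0.1"))
# (['A', 'B', 'C'], 'delivered')

# Treat-as-withdraw removes B's route for 10.0.0.0/24.
withdrawn = {**FIBS, "B": {"10.0.0.0/8": "D"}}
print(trace(withdrawn, "A", "10.0.0.1")[1])  # loop
```

With the /24 present, a packet from A to 10.0.0.1 goes A, B, C; once B's /24 is withdrawn, the same packet bounces between B and D until the hop limit is hit, which is the B-D loop described above.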
[rjs]: There is then a discussion as to whether one would actually expect such
topologies to occur in practice. Really, I'd rather expect either black-holes
(e.g., I only had one path to A and it got withdrawn, so if anyone forwards me
packets destined for A, I drop them) or (more likely from an Internet DFZ
perspective) that I converge to an alternate path I had to that NLRI.
[rjs]: The reason that this is highlighted in the text is that introducing
behaviour into the protocol such that loops may occur is obviously a compromise
of protocol correctness, and may compromise overall network forwarding
integrity. It's important that this risk is understood, and balanced against
the wider impact of session tear-down.
>
> [rjs]: If I am prepared to accept the black-holing or loops for the NLRI in
> the erroneous UPDATE as a risk, in favour of keeping the remaining NLRI
> working (and being updated/withdrawn if they change), then this is a local
> decision and I do not need to require any particular behaviour of the
> neighbouring domains.
>>> [Jim U>] I guess what I meant was that the other paths that are considered
>>> good would be treated differently.. So an environment where only paths with
>>> the malformed attr are affected by this error condition, as opposed to an
>>> environment where all paths are affected (withdrawn), would create an
>>> inconsistent view of the "good" paths across AS domains.. So not so much
>>> the "bad" paths but the "good" paths and how they may be treated
>>> differently..
[rjs]: I'm not sure I fully understand here:
- Today: an UPDATE is received from element A and found to be erroneous; the
session is reset, and downstreams do not see any paths where A was the best
path in the RIB.
- With this draft: an UPDATE is received from element A and found to be
erroneous; downstreams still see all other paths where A is the best path in
the RIB.
[rjs]: I'm not sure that this is really an inconsistency in what the "good"
paths look like: both the receiving and downstream elements still consider A's
paths valid, other than the ones that were included in the erroneous UPDATE. In
both cases, the NLRI contained in the erroneous UPDATE is also not propagated
downstream (session reset or treat-as-withdraw stops further propagation).
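The two cases above can be summarised as a set computation over the prefixes for which A's path is best. This is a simplified sketch (it ignores alternate paths learned from other neighbours, and the function and mode names are hypothetical):

```python
def downstream_view(paths_via_a: set, erroneous_nlri: set, mode: str) -> set:
    """Prefixes, among those where A's path is best, that remain in use and
    are propagated downstream after an erroneous UPDATE from A.

    Simplification: alternate paths learned from other neighbours are ignored.
    """
    if mode == "session-reset":
        # Current behaviour: every path learned over the session is removed.
        return set()
    if mode == "treat-as-withdraw":
        # Only the NLRI in the erroneous UPDATE are withdrawn.
        return paths_via_a - erroneous_nlri
    raise ValueError(f"unknown mode: {mode!r}")


paths = {"192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"}
bad = {"203.0.113.0/24"}
print(sorted(downstream_view(paths, bad, "treat-as-withdraw")))
# ['192.0.2.0/24', '198.51.100.0/24']
```

If A is best for three prefixes and one of them arrives in an erroneous UPDATE, a session reset leaves none of A's paths in place, whereas treat-as-withdraw leaves the other two; the erroneous prefix is not propagated in either case.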
> [rjs] I'd say that it's not just applicable to IPv[46] in the Internet - but
> to numerous AFIs (there is a definite use-case for these solutions in L3VPN
> environments for instance). I am not saying that this is applicable or
> desirable to be turned on for all AFIs -- but it seems to me that this is a
> per-operator, per-deployment decision, not a per-AFI one. For instance, if we
> get an RTC UPDATE that is malformed, an operator may not want to tear down a
> session if it also carries other AFIs (e.g., VPNv[46] also) - in that case,
> the operator may want to treat this UPDATE as withdrawing the {as,
> route-target} NLRI (consider that we have no *standardised* multi-session
> mechanism yet, and there are potential scaling impacts of multiple sessions).
>
>>> [Jim U>] Quite honestly AFs such as RT-C, Flowspec, etc., where the info
>>> being propagated is more akin to "configuration" than path info, should
>>> persist regardless of the session.. This goes to the heart of the
>>> discussion that BGP is used for many fields of use that require persistence.
>>> It is not only paths that use BGP for dissemination.. I would prefer that
>>> this solution be limited to AFs that disseminate reachability/path info, not
>>> configuration info..
[rjs]: The persistence discussion is, I feel, a further optimisation over this
work; it addresses (as you correctly said before) more failure cases. In the
case that one UPDATE containing modifications to this configuration information
is invalid, is it worth making the rest of it "stale" (and unable to be
updated)? I think that in this case you also want to keep as much of the
RIB/config info up-to-date as possible, so targeting the error handling
mechanism at the contained NLRI still seems advantageous.
[rjs]: Now, the question may be whether treat-as-withdraw is suitable in these
cases: is it better to remove the flow specification or RT from those
installed, or to keep it and know that it might be stale? I'd be interested to
hear your thoughts here.
[rjs]: On the point of addressing this per-AF, perhaps the text to add to the
draft is that behaviour such as treat-as-withdraw must (MUST?) be configurable
on a per-AFI basis? The problem with stating something like this is: what does
one do when there is no multi-session, and it is disabled for one AFI yet
enabled for another?
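The dilemma in the last paragraph can be made concrete with a hypothetical per-AFI knob on a single session carrying several AFI/SAFIs. The `Session` class, the `taw_enabled` map and the AFI labels below are illustrative assumptions, not proposed draft text:

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """A single BGP session carrying several AFI/SAFIs (no multi-session)."""
    afis: set
    # Hypothetical per-AFI knob: AFI/SAFI label -> treat-as-withdraw enabled?
    taw_enabled: dict = field(default_factory=dict)

    def handle_error(self, afi: str):
        """Return (action, affected AFI/SAFIs) for an erroneous UPDATE in `afi`."""
        if self.taw_enabled.get(afi, False):
            # Only the NLRI of this AFI in the erroneous UPDATE are withdrawn.
            return "treat-as-withdraw", {afi}
        # Disabled for this AFI: the shared session is reset, taking down
        # every AFI/SAFI on it, including those with treat-as-withdraw enabled.
        return "session-reset", set(self.afis)


s = Session(afis={"VPNv4", "RTC"}, taw_enabled={"VPNv4": True, "RTC": False})
print(s.handle_error("VPNv4"))  # ('treat-as-withdraw', {'VPNv4'})
```

With treat-as-withdraw enabled for VPNv4 but disabled for RTC, an erroneous RTC UPDATE still resets the shared session and so takes down the VPNv4 routes as well, which is exactly the case any per-AFI text would need to address.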
Thanks,
r.
_______________________________________________
GROW mailing list
https://www.ietf.org/mailman/listinfo/grow