-----Original Message-----
From: Robert Raszuk [mailto:[email protected]]
Sent: Wednesday, September 21, 2011 5:25 PM
To: George, Wesley
Cc: [email protected]
Subject: Re: [GROW] I-D Action: draft-ietf-grow-diverse-bgp-path-dist-05.txt

Hello Wes,
Other stuff: 2.1 - when discussing overhead and scale concerns for
add paths, perhaps a citation to 4984 would be appropriate?
I would prefer not to mix the growing internet scale concerns with
some of the operational practices/configuration based scale
concerns.
WEG] Understand, but I'm not sure that it's so easy to separate the
two. You'll find me saying the same thing to anyone suggesting a
change that has the net effect of significantly increasing the burn
rate for memory and CPU resources, whether it's a configuration
change or otherwise, because it still exacerbates the overall issue.
(more on that in a moment)
I've made similar comments to the SIDR folks, and I think generally
anything that adds a non-trivial amount of impact to the growth
curve of the routing system needs to consider this.
I think there is a substantial difference between local and global size
increases of the routing system. In this work all of the concerns
relate to the local one.
WEG] Generally, I'm not sure that I'd make so much of a distinction.
While yes, in theory changes of this type only impact the ASN that
chooses to implement it, rather than what it announces to the outside
world, the global scaling problem is due to the intersection between
available resources, their growth curve, and the growth curve of the
routing table. Saying that it only is a concern if it contributes to
the size of the DFZ routing table is oversimplifying the root
problem, because if internal scale problems exhaust the resources
available for both internal and external routes, you still have the
same end state - out of resources. In that case, the only difference
between a local scaling problem and a global problem is the
deployment penetration. If this is widely deployed, it has now
steepened the growth curve noted in 4984, because it still is using
some of the overall available resources. I've said on more than one
occasion that the iBGP routes carried by an SP are as much or more of
a problem than the growth of the global table because they don't have
nearly as much of the aggregation and optimization to reduce their
footprint. The only difference is the level of administrative control
over growth, but that's a fairly limited knob to turn - for lots of
reasons it may not be any more feasible to change things internally
to reduce internal route growth than it is to change global route
growth. Besides, I think that your draft is trying to have it both
ways - you malign Add Paths for having scaling problems, and then
seem content to gloss over a very similar problem created by your
solution simply because it appears to be slightly less severe and
more localized.
4. This asserts that no code changes are necessary to RR clients.
I'm not sure I totally agree with that... If the idea is to have a
primary (best) RR and then N additional paths, the general
assumption is that the N, N1, ... RRs are carrying routes that are
less and less preferred. How does this system avoid the same sort
of inconsistency of best path choice among different routers in the
network if there is no way to identify those paths as secondary? I
think you need some way to determine if the alternate routes are
intended to be ECMP routes or backup routes... You may be able to
cover this without code changes by using alternate configurations
of other BGP preference indicators (MED, Localpref, metric, etc),
perhaps with inbound route policy on the client or outbound on the
RR, but since things like metric may be different based on where
something is in the network, that may lead to inconsistency if used
by itself. Even then, the draft doesn't discuss how this should be
managed.
I stand by the claim that no code change is needed on clients.
Moreover, no additional policy change is required either.
The best way to illustrate this is to compare the presence of additional
BGP paths on the clients with the scenario where clients are
interconnected with a full IBGP mesh or would get all paths with
add-path. In neither case is there a notion of the RR telling the client
which path is best or which is second best, and there are a number of
good reasons for that (one is that the ordering of paths on the RR can be
different than on the client; the other is that if we withdrew any
path that had been advertised with an order, we would need to
re-advertise all remaining paths with the new order, and that amount of
churn is non-negligible).
Each client's BGP best path selection is capable of making a safe (loop
free) autonomous choice of paths in PIC/fast connectivity restoration/ibgp
multipath cases.
WEG] I'm sorry, maybe I'm being thick, but I still don't understand
how this would work in a way that would always avoid routing loops.
Under normal state, you have an RR reflecting its best path to
the client based on the routes it receives from the rest of its
neighbors, meaning that the clients don't have visibility to
candidate alternatives that the RR does, so they're all making the
same choice at least within the local cone of influence of that RR.
You add a second set of RRs (rr') that is announcing a second-best
path as if it was the best path to restore 1 (or more) of the
candidate alternatives to the client. The client receives the best
and 2nd-best path and evaluates them using standard methods. If the
thing that makes one route better than the other is something locally
interesting like metric, and the client's particular place in the
universe means that the metric is different as compared to other
clients, the P routers, and the RRs, it may choose the 2nd-best path
as best, and this may lead to routing loops if it tries to send the
route to another router that has a different belief of what the best
path is. This case is much more likely if the RR and RR' are not
collocated with all of their clients and/or each other. I think that
this may also be the case when the tiebreaker is router-id if you're
not careful of the way that you address your route-reflectors and/or
are not doing next-hop self at the edges. Only in the case where the
2nd-best path is clearly worse to all members of the ASN (lower local
pref, longer AS-path, etc) are you assured of no possibility for two
routers each getting a different result when evaluating those two
different routes. I think that 4.2 covers some part of this case, in
the way that it documents its assumptions and what must be done to
enable deployment, especially the references to ignoring IGP metric,
but IMO it's not clear enough in the explanation why some of these
things must be done - the failure case isn't discussed.
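
To make the failure case above concrete, here is a minimal sketch in
Python. It is not taken from the draft and is not a real BGP
implementation: the Path class, the best_path function, the exit names,
and the metric values are all hypothetical, and the decision process is
reduced to local-pref, AS-path length, IGP metric to the next hop, and a
name-based stand-in for the final tiebreaker. It shows only that two
clients can disagree about which of two otherwise equal paths is best.
Whether that disagreement turns into a forwarding loop depends on the
topology and on how traffic is forwarded, which the sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str      # hypothetical exit router name
    local_pref: int
    as_path_len: int


def best_path(paths, igp_metric, ignore_igp_metric=False):
    """Pick a best path with a heavily simplified BGP decision process.

    Order of comparison (assumed for illustration): higher local-pref,
    shorter AS path, lower IGP metric to the next hop, then lowest
    next-hop name as a stand-in for the router-id tiebreaker.
    """
    def key(p):
        metric = 0 if ignore_igp_metric else igp_metric[p.next_hop]
        return (-p.local_pref, p.as_path_len, metric, p.next_hop)
    return min(paths, key=key)


# Best and 2nd-best paths as delivered by RR and RR'. They tie on
# everything that is identical across the whole AS.
paths = [Path("exit_a", 100, 3), Path("exit_b", 100, 3)]

# Each client sees a different IGP distance to the two exits.
client1_metric = {"exit_a": 10, "exit_b": 20}
client2_metric = {"exit_a": 30, "exit_b": 5}

c1 = best_path(paths, client1_metric)
c2 = best_path(paths, client2_metric)
print(c1.next_hop, c2.next_hop)  # the two clients disagree

# Ignoring IGP metric makes both clients pick the same path.
c1_same = best_path(paths, client1_metric, ignore_igp_metric=True)
c2_same = best_path(paths, client2_metric, ignore_igp_metric=True)

# If the 2nd-best path is clearly worse AS-wide (lower local-pref),
# both clients agree even when IGP metric is considered.
worse = [Path("exit_a", 100, 3), Path("exit_b", 90, 3)]
w1 = best_path(worse, client1_metric)
w2 = best_path(worse, client2_metric)
```

The last two cases correspond to the situations described above: ignoring
IGP metric, or having the alternate path lose on an attribute that every
router in the ASN evaluates identically.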
4.1 Also, there's a definite scaling consideration on the RR
clients that isn't really discussed here - they are now going to be
storing some number of additional routes and paths that is linearly
related to the number of additional planes that are implemented.
The addition of more RR sessions that presumably carry a portion of
the full routing table now drives a non-trivial increase in memory
footprint and processing overhead (and potentially convergence time
for slower boxes). In the simplest case of 2 primary
route-reflectors (for diversity), and 1 2nd-best path RR, you've
added one session. If you want to carry a 3rd-best RR or have
redundant 2nd-best RRs, you've added 4 sessions. It's fair to say
that after a certain number of alternate paths, you start having
fewer routes because there are only so many alternative exits, but
otherwise there is a potentially large problem even if it's not
quite as bad as addpaths. I might recommend that you do some
analysis of the routing table to know where this threshold makes a
difference, based on how many alternate paths an average route
carries. In addition to being a scaling consideration, it also
helps to inform what value of N becomes diminishing returns because
most networks don't have that many backup paths. I envision this
being something like "80% of routes have 4 or fewer paths, so moving
beyond 4 planes may add overhead without much benefit..."
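
The kind of analysis suggested above could be sketched roughly as
follows. This is only an illustration in Python: the function names are
hypothetical, and the sample list of per-prefix path counts is made-up
data, not a measurement of any real routing table. Given the number of
available paths for each prefix, it reports what fraction of prefixes
have at most N paths, and the smallest N that reaches a target fraction.

```python
from collections import Counter


def coverage_by_path_count(paths_per_prefix):
    """Return {n: fraction of prefixes that have at most n paths}."""
    if not paths_per_prefix:
        return {}
    total = len(paths_per_prefix)
    hist = Counter(paths_per_prefix)
    coverage = {}
    running = 0
    for n in range(1, max(hist) + 1):
        running += hist.get(n, 0)
        coverage[n] = running / total
    return coverage


def smallest_n_covering(paths_per_prefix, target):
    """Smallest n such that at least `target` of prefixes have <= n paths."""
    for n, frac in coverage_by_path_count(paths_per_prefix).items():
        if frac >= target:
            return n
    return None


# Made-up sample: number of distinct paths known for each of 10 prefixes.
sample = [1, 1, 2, 2, 2, 2, 3, 4, 6, 8]

cov = coverage_by_path_count(sample)
n80 = smallest_n_covering(sample, 0.8)
print(cov[2], cov[4], n80)
```

With this made-up sample, 80% of prefixes have 4 or fewer paths, so
planes beyond the 4th would carry alternates for only the remaining 20%
of prefixes. Real input would have to come from the operator's own
routing table.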
It is absolutely correct to say that the more paths a client carries, the
more CPU cycles and memory will be used to process and store them.
However, there is one observation to be made: in 99% of the cases I
have seen for distributing more than the best path intra-domain, the
sufficient number of paths per net on each client is 2.
WEG] the document should explicitly state this. That's exactly what I
was getting at when I mentioned analysis above. If nearly all
applications only need one alternate to bring the total paths to two,
and more would be diminishing returns, the document should recommend
this, and note that more are possible if the operator's situation
dictates by simply repeating the deployment more times. I will note
that this guidance as well as the note at the end of 4.2 that "The
additional planes of route reflectors do not need to be fully
redundant as the primary one does" contradicts your example because
it has both RR1' and RR2'.
IMHO the cost of bringing additional paths into the control plane is quite
well understood today. Moreover, it is quite implementation dependent.
One implementation may use X bytes per path while another uses Y
bytes to store the same path. I think a separate BGP scaling
document (even as a BCP) may be equally useful for any technique to
advertise more than the best path. I would prefer to keep this outside of
the solutions work on how to advertise and distribute those
additional paths.
WEG] I'm not looking for a level of detail that requires you to
discuss the number of bytes per path. Simply noting that scaling
issues exist and their general categories is enough. Make the logical
leap for your reader that implementing this solution brings with it
the scaling problems inherent with adding an additional route
reflector (and therefore its additional routes and paths).
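
As a rough, implementation-independent illustration of that logical
leap, the sketch below counts paths rather than bytes. It is Python
written for this note, with hypothetical names and made-up input, and it
assumes that every RR in a plane advertises one path per prefix to the
client, that the client stores each received path separately, and that
plane k can only offer a path for prefixes that actually have at least k
distinct paths.

```python
def client_paths_received(paths_per_prefix, rrs_per_plane):
    """Estimate how many paths a client stores across all RR planes.

    paths_per_prefix: number of distinct paths known for each prefix.
    rrs_per_plane: RR count per plane; index 0 is the best-path plane,
                   index 1 the 2nd-best plane, and so on.
    """
    total = 0
    for plane, rr_count in enumerate(rrs_per_plane):
        # Plane `plane` advertises the (plane + 1)-th best path, which
        # exists only for prefixes with at least plane + 1 paths.
        prefixes_served = sum(1 for n in paths_per_prefix if n >= plane + 1)
        total += prefixes_served * rr_count
    return total


# Made-up sample: number of distinct paths for each of 10 prefixes.
sample = [1, 1, 2, 2, 2, 2, 3, 4, 6, 8]

baseline = client_paths_received(sample, [2])             # redundant primary RRs only
one_alt = client_paths_received(sample, [2, 1])           # plus one 2nd-best RR
redundant_alt = client_paths_received(sample, [2, 2])     # redundant 2nd-best RRs
two_alt_planes = client_paths_received(sample, [2, 1, 1]) # 2nd- and 3rd-best planes
print(baseline, one_alt, redundant_alt, two_alt_planes)
```

The absolute numbers mean nothing for a real network; the point is only
that each added RR session adds its own set of paths on every client,
and that later planes add progressively less because fewer prefixes have
that many alternates.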
It may be appropriate to add a separate scaling considerations
discussion to your deployment considerations (section 6) to
discuss some of the above.
I agree 100%, but as stated above I do not find this specific to
diverse-path. It seems to be a general issue, and I would highly encourage
someone to take a stab at documenting this in IETF/IDR/GROW or maybe in
the NANOG community repository.
WEG] it may not be specific to diverse-path, but diverse-path is
specifically advocating doing something that would otherwise not be
done (adding additional RR<->client BGP peers w/full routes beyond
what is necessary for simple RR redundancy). Therefore I still think
that you need to discuss the specific scaling concerns that this
implementation needs to consider, even if it's at a relatively high
level and the document notes that these are not unique to this
implementation. I agree that a general scaling considerations
document may be appropriate, but since that does not exist and I
don't want this document to be blocked awaiting completion of such, a
brief discussion within this document would help a lot.
There may be additional operational considerations from the
perspective of route analysis - if you have either a homebuilt or
off the shelf set of software that does route analysis for the
purpose of event root-cause analysis, anomaly detection, capacity
planning/failure analysis, etc, it has to be aware of these
additional planes such that it returns the proper response when
evaluating the routing table to determine what the expected
behavior should be in the real network. This is especially
important when it uses the table to determine how traffic will
reroute during different failure scenarios. These tools may act
like a participant in the mesh rather than a client in order to get
a pure view of the table, and that may lead to undesired results if
the multiple planes aren't taken into account. There may also be
considerations for looking glass implementations and the actual
information that is visible on the RRs and RR clients as the result
of standard BGP show commands to aid in troubleshooting and
verification.
Very good point. Two comments on this:
- As to the impact on the tools, I am less worried, as the presence of
additional paths can already be a fact today, as mentioned, with a full
mesh, or where some operators adjust the weight values of a pair of RRs
differently on a per net basis.
WEG] sure, but I don't think that it's valid to assume that all
analysis tools have taken this into account in their implementation,
so it's worth mentioning as an operational consideration. The comment
may be helpful to characterize the level of potential impact.
- The use of "planes" in the draft is more of a conceptual nature.
In practice all paths are still kept in the single table where the normal
best path is calculated. That means that tools like a looking glass
should not observe any change or impact.
WEG] a good clarification to add to the document.