On Tue, Jan 23, 2024 at 10:12:33PM -0800, William Herrin wrote:
> Respectfully Chris, you are mistaken.
>
> https://datatracker.ietf.org/doc/html/rfc4271#section-9.1.2.2
>
> "a) Remove from consideration all routes that are not tied for having
> the smallest number of AS numbers present in their AS_PATH
> attributes."
>
> So literally, the first thing BGP does when picking the best next hop
> is to discard all but the routes with the shortest AS path.
Not true. Read the whole RFC--you've omitted Sections 9.1 and 9.1.1, which
are critical.
Discarding all but the routes with shortest AS path is _not_ literally the
first thing BGP does as you stated above.
The first thing BGP does is to calculate the degree of preference whenever it
receives a new route, withdrawn route or replacement route (See Section 9.1.1).
The determination of the degree of preference is a local matter for each
Autonomous System exercising route policy. It is typically expressed using
LOCAL_PREF, and it applies the configured administrative policy to classify
the incoming routes.
After completion of 9.1.1, Sections 9.1.2 and 9.1.2.2, which you cited, begin
(Phase 2: Route Selection). Route selection under 9.1.2 is only invoked after
the degree of preference is determined (the 'Phase 1' decision), as clearly
described in Section 9.1.
In fact, even 9.1.2.2, which you cited above, clearly states:
    In its Adj-RIBs-In, a BGP speaker may have several routes to the same
    destination that have the same degree of preference.

    [ snip ]

    The following tie-breaking procedure assumes that, for each candidate
    route, all the BGP speakers within an autonomous system can ascertain
    the cost of a path (interior distance) to the address depicted by the
    NEXT_HOP attribute of the route, and follow the same route selection
    algorithm.

    The tie-breaking algorithm begins by considering all equally
    preferable routes to the same destination, and then selects routes to
    be removed from consideration. The algorithm terminates as soon as
    only one route remains in consideration. The criteria MUST be
    applied in the order specified.

    [ snip ]

    a) Remove from consideration all routes that are not tied for
       having the smallest number of AS numbers present in their
       AS_PATH attributes. Note that when counting this number, an
       AS_SET counts as 1, no matter how many ASes are in the set.

So you see, the comparison of AS_PATH, and therefore the route selection
process, can only begin after routes are first resolved by their degree of
preference, typically exercised by LOCAL_PREF across the AS (or by a similar
import knob, such as Cisco's "weight" parameter, which is applied before
LOCAL_PREF and is locally significant to the router where it's been
configured). The route selection process, including the elimination of routes
with inferior AS paths, is a tie-breaker algorithm applied after the degree of
preference is first calculated, which is what we've been trying to tell you.
So no, AS_PATH comparison is not literally the first thing BGP does.
You're ignoring Section 9.1.1 in its entirety, which chronologically comes
before Section 9.1.2.2 (the section you cited). Section 9.1.2.2 itself also
clearly specifies that the route selection process described in it (including
AS_PATH comparison) is a tie-breaking procedure.
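
To make the ordering concrete, here is a minimal sketch of the two steps
discussed above. It is an illustration only, not how any vendor implements
it: the Route class, the function names and the AS numbers are hypothetical,
the degree of preference is modeled as a bare LOCAL_PREF integer, and only
tie-break step (a) is shown.

```python
from dataclasses import dataclass
from typing import List, Union

# One AS_PATH entry: an int for a single AS in an AS_SEQUENCE, or a
# frozenset for a whole AS_SET (assumption made for this sketch).
Segment = Union[int, frozenset]


@dataclass
class Route:
    peer: str                 # hypothetical label for the advertising neighbor
    as_path: List[Segment]
    local_pref: int = 100     # common vendor default, not mandated by the RFC


def as_path_length(path: List[Segment]) -> int:
    # Every entry counts once, so an AS_SET counts as 1 however many
    # ASes it contains (RFC 4271, Section 9.1.2.2 step a).
    return len(path)


def select_candidates(routes: List[Route]) -> List[Route]:
    if not routes:
        return []
    # Keep only the routes with the highest degree of preference.
    best_pref = max(r.local_pref for r in routes)
    preferred = [r for r in routes if r.local_pref == best_pref]
    # Only the equally preferred routes reach tie-break step (a).
    shortest = min(as_path_length(r.as_path) for r in preferred)
    return [r for r in preferred if as_path_length(r.as_path) == shortest]


routes = [
    Route("A", [65001, 65002, 65003], local_pref=200),
    Route("B", [65010]),
    Route("C", [65020, frozenset({65030, 65031})]),
]
winners = select_candidates(routes)
```

Here route A survives despite having the longest AS path, because its
preference is higher. If A were left at the default of 100, all three routes
would tie on preference and B would win on AS path length.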
>
> It also says that BGP implementations are -allowed- to use other
> selection criteria.
Which is immediately followed by this clause:
"BGP implementations MAY use any algorithm that produces the __same results__
as those described here."
And restricted by the following clause in the preceding paragraph:
"The criteria MUST be applied in the order specified."
And clarified by Section 9.1:
"as long as the implementations support the described functionality and they
exhibit the same externally visible behavior."
> And there are many situations where doing so is
> well advised and improves the result. But AS path length is
> unambiguously the default, off which a user has to move it.
So, when a BGP implementation is written into router software, how does the
manufacturer know whether your network is going to need to apply a lot of
degrees of preference, or none? The vendors have no idea, and the RFC also
clarifies that degree of preference is a local policy matter. Therefore, the
default behavior is to assume the same LOCAL_PREF for every route until a
policy is configured, which has typically been '100' across many vendor
implementations. In this instance, since all routes have the same degree of
preference of 100, Section 9.1.2.2, which you cited, then begins to tie-break
the routes of the same preference, starting with the AS_PATH comparison. But
that is by no means the first thing BGP does. The first thing BGP does, as
clearly specified in the RFC, is to determine the degree of preference to meet
local routing policy.
The degree of preference differs greatly depending on what type of network you
run. If you're an edge consumer ASN (such as a multi-homed stub enterprise
running BGP), not providing any downstream IP transit to other BGP
customers, and not peering with other networks (at an IX or otherwise), then
your network probably doesn't have a lot of need to apply administrative policy
to determine a degree of preference, and you can be happy fiddling with just
AS_PATH.
But if you're running a network which provides transit to other ASNs and
peers with other networks, then suddenly, applying administrative policy is
not only desirable, but operationally required. This isn't solely a
revenue/greed problem as some have cynically stated, but it's actually also a
critical service availability and reliability issue, because not having a
degree of preference pursuant to established routing policy in an IP network
completely eliminates the ability to implement a desired predictability in
traffic engineering to meet capacity planning objectives for network
interconnections.
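
As a rough sketch of what such a policy can look like, the snippet below
assigns a degree of preference by neighbor type before AS path length is ever
consulted. The customer/peer/transit ordering and the numeric values are
assumptions chosen for illustration; they come from neither the RFC nor any
particular network's configuration.

```python
# Hypothetical LOCAL_PREF values per neighbor relationship (assumption).
LOCAL_PREF_BY_RELATIONSHIP = {"customer": 300, "peer": 200, "transit": 100}


def degree_of_preference(relationship, default=100):
    # Unknown relationships fall back to the usual default of 100.
    return LOCAL_PREF_BY_RELATIONSHIP.get(relationship, default)


def preferred_exit(offers):
    # offers: list of (relationship, as_path_length) tuples for one prefix.
    # Higher preference wins first; a shorter AS path only breaks ties.
    return max(offers, key=lambda o: (degree_of_preference(o[0]), -o[1]))


offers = [("transit", 1), ("customer", 4), ("peer", 2)]
choice = preferred_exit(offers)
```

With these values the customer route is chosen even though its AS path is the
longest, which is what makes traffic levels on each interconnection
predictable.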
Are there exceptions and pitfalls to this, where poorly designed or thought-out
networks suffer in certain routing situations? Absolutely. But that's the
Internet-- it's not perfect, but it works very well most of the time for most
situations.
Your desired 'policy-free, AS_PATH-only' world may solve your particular
complaint at hand, but it absolutely would break the rest of the Internet, with
no effective ways to implement routing policy for large-scale network
interconnections that make the Internet tick. BGP exists to provide anchors to
apply routing policy into the path selection process at scale. It is wrong to
assume that AS_PATH is the first thing and the only thing which matters in BGP,
through incorrect and out-of-context parsing of the RFC to fit your desired
narrative.
In operational reality, backed by the history and the RFCs themselves, the
single most important and influential knob in BGP is arguably LOCAL_PREF,
more so than AS_PATH. Sadly, most people won't get to experience this until
they've run or dealt with the operational realities of managing a large IP
network. The problem you're complaining about is an exception, primarily
caused by your poor selection of IP transit provider at the data center where
you're running AS11875, and you're demanding that everyone else take
responsibility for the purchasing decision you've made. There are some good
proposals, such as commonly accepted wide communities for commonly encountered
traffic-engineering scenarios to help improve upon this, and make BGP a better
experience for the end-user in situations like the one you're having, but we're
not quite there today, and it's understandably not going to be a quick process.
In the meantime, in the immediate short term, glad to hear that your route
pollution announcement solved the issue for you. In the medium-term, you
should get a new transit provider for AS11875 with better connectivity into
3356. Long-term, perhaps commonly accepted wide communities could become a
standard some day to improve knobs in situations like this.
James