Re: /32s in DC and BGP

Eric Rosen Wed, 12 Feb 2014 08:44:23 -0800

Robert> Could those who claim that sending /32 or /64 or /128 in BGP
Robert> mainly within contained DC zone environment will not scale be a bit
Robert> more precise and kindly indicate what the real problem is ?


I think the issue is really straightforward, and has nothing specifically to
do with BGP.  My interpretation of the draft is that it is trying to meet
the following requirements:

- Allow the number of VMs to increase without bound.

- Give each VM a host address that is location-independent (and hence not
  summarizable within routing).

- Provide optimal routing from any client (presumably anywhere on the
  Internet) to any VM.

I think it is difficult to meet these requirements jointly without running
into some problems of scale.

Perhaps I'm just misunderstanding the requirements that the draft is trying
to meet.  Your mention of "contained DC zone environment" certainly suggests
that I'm overlooking some context.  Similarly, your remark in another
thread:

Robert> Inter-DC or DC to user rather always depends on careful choice and
Robert> when possible aggregation at the gateway.

suggests that summarization at some level is still going to be important for
good scaling.  However, the draft does not talk about that at all, nor does
it have normative references to other drafts that set the context.

I suppose it's possible, as Pedro seems to suggest, that my thinking is
hopelessly trapped in the last century ;-) Perhaps new developments in
hardware and/or virtualization mean that address summarizability is no
longer an important factor in scaling the routing system.  If that's the
assumption, it would be good to state it.

Note that the draft itself calls attention to these scalability issues by
explicitly positioning itself as being more scalable than an L2 solution.
While L3 is certainly more scalable than L2, much of that increased
scalability comes from L3's ability to summarize addresses on a topological
basis.
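
To make the summarization point concrete, here is a minimal sketch using
Python's standard ipaddress module.  The addresses and the helper name
routes_needed are made up for illustration and do not come from the draft.
It compares 256 host addresses assigned topologically (all in one block)
with 256 location-independent addresses scattered across the address space.

```python
import ipaddress

def routes_needed(host_addrs):
    """Return the minimal prefix list covering exactly the given hosts.

    Hypothetical helper: each host starts as a /32 and adjacent
    prefixes are merged wherever the address plan allows it.
    """
    nets = [ipaddress.ip_network(f"{a}/32") for a in host_addrs]
    return list(ipaddress.collapse_addresses(nets))

# 256 hosts numbered topologically, e.g. all behind one rack.
contiguous = [f"10.1.2.{i}" for i in range(256)]

# 256 location-independent hosts: no two addresses are adjacent,
# so nothing can be merged.
scattered = [f"10.{i}.{(i * 7) % 256}.9" for i in range(256)]

print(len(routes_needed(contiguous)))  # one covering prefix
print(len(routes_needed(scattered)))   # one route per host
```

In the first case the 256 hosts are covered by a single /24; in the second,
every host needs its own route, which is the cost of making the addresses
location-independent.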

The second scaling issue has to do with rate of change.  Pedro has presented
some facts indicating that the time to complete a VM move is quite long
compared with routing convergence time.  I don't question that.  However,
that by itself does not imply anything about the rate of VM movement.  It's
not clear to me whether the rate of VM movement is supposed to be able to
increase without bound, or whether realistically this rate is expected to
always remain small compared to what routing can handle.  If the latter,
that would certainly be worth mentioning.

The third scaling issue is specific to the "ARP snooping" scheme, and to the
way that end-user activity (the generation of ARP responses) can lead to the
auto-origination of BGP routes.  Perhaps MVPN has inured us to this sort of
phenomenon.  But RFC 6514 does talk of rate limiting the generation of MVPN
BGP routes, while this draft does not mention rate limiting at all.  A
related area of concern is the feedback loop where ARP responses cause BGP
activity, and BGP activity causes ARP responses. (This feedback loop doesn't
exist if the PEs get their information from the orchestration system, rather
than from snooping ARPs.)
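
As an illustration of what rate limiting the auto-origination could look
like, here is a minimal token-bucket sketch.  It is not taken from the
draft or from RFC 6514; the class name OriginationLimiter and the
parameters rate_per_sec and burst are hypothetical, and a real PE would
also need a policy for what to do with the snooped events it defers.

```python
class OriginationLimiter:
    """Token bucket gating how often a snooped ARP event may
    trigger origination of a BGP host route (hypothetical sketch)."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens added per second
        self.burst = burst            # maximum tokens held
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # time of the previous call

    def allow(self, now):
        """Return True if a route may be originated at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Five ARP responses arrive at the same instant; only the burst passes.
limiter = OriginationLimiter(rate_per_sec=2, burst=3)
decisions = [limiter.allow(now=0.0) for _ in range(5)]
print(decisions)
```

With these assumed values, the first three events originate routes and the
remaining two are held back until the bucket refills.  A limiter of this
kind bounds the BGP churn that end-user ARP activity can cause, but it does
not by itself break the ARP/BGP feedback loop.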

In a previous message you said:

Robert> Just presence of learned entry in the ARP table should not trigger
Robert> the host route auto-generation.

So, where are the detailed rules relating ARP snooping to host route
auto-generation?

I do like Xiaohu's suggestion to de-emphasize the ARP snooping procedure and
to better document its applicability restrictions.  

I didn't mean to start a food fight, or to start yet another religious
discussion about the use of BGP.  I just think that the draft makes claims
of scalability that depend upon certain assumptions, and the assumptions are
not made explicit.  It doesn't bother me if the assumptions are
controversial, but I would like to know what they are.
