I have a few questions and concerns about draft-xu-l3vpn-virtual-subnet-03. 

- Section 3.3:

      PE routers SHOULD be able to discover their local CE hosts and keep
      the list of these hosts up to date in a timely manner so as to ensure
      the availability and accuracy of the corresponding host routes
      originated from them.

   Surely this is a MUST.  I don't see how the scheme can work without a
   responsive and reliable discovery mechanism of some sort.

   Since the draft does not require any particular discovery scheme, perhaps
   it should at least characterize the set of acceptable schemes.

- Is a PE supposed to discover all the local hosts, and originate a host
  route into BGP for each one of them?  Or are host routes originated only
  for a subset of the local hosts?

  I don't see anything in the draft that says how to choose a subset.
  However, it seems like in the intended use case, the hosts are VMs, and
  the draft says that a data center can contain millions of VMs.  Is each PE
  going to originate host routes for millions of VMs?

  If so, I don't understand why the scheme is claimed to be scalable.  A
  solution that relies on millions of BGP-distributed host routes might be
  expected to exhibit some scaling problems having to do with
  routing/forwarding table size.  (Note that section 3.9 proposes to
  distribute host routes not only to other DCs, but to "cloud user sites",
  as well.)

  The draft talks about the increased path optimality that one gains from
  using host routes.  Well, everyone knows that you get more optimal routing
  with host routes, but the Internet doesn't run on host routes because of
  the scaling issues. 

- I wondered originally whether the intention is that host routes are
  distributed only in the exception cases, where a VM moves off its "native"
  subnet.  But the draft doesn't seem to say anything like that.  It seems
  rather to be eliminating the traditional notion of a localized subnet, and
  then discussing how to "fool" the hosts into thinking that the localized
  subnets still exist.  But this raises the question of whether the draft
  discusses everything that might possibly break.  For instance, will DHCP
  still work?

  Probably the answer is going to be "anything that doesn't work any more
  isn't needed in the DC environment".  Maybe the draft just needs to state
  its applicability restrictions more clearly.

- To evaluate scaling, one needs to consider not only the number of VMs,
  but also the rate of movement.  How many VMs per second move from one DC to
  another, how many VMs per second are created, how many destroyed?  These
  rates will have considerable impact on the control plane.  This issue
  isn't even mentioned in the draft.

- If a PE originates a host route, I don't see anything in the draft that
  will cause the host route to time out and be withdrawn if the host
  disappears.  (There is discussion of what to do if the host shows up
  somewhere else, but I didn't see any discussion of what to do if the host
  just disappears altogether.)  Surely a scheme based on host routes for
  movable hosts needs some sort of 'garbage collection'.
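
  To make the 'garbage collection' point concrete, here is a minimal sketch
  of the kind of ageing I have in mind.  Nothing in it comes from the draft:
  the HostRouteTable class, the confirm/expire methods, and the 300-second
  max_age are all hypothetical, and a real PE would tie this to whatever
  discovery scheme it uses.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HostRouteTable:
    """Hypothetical per-VRF table of locally originated host routes."""

    max_age: float = 300.0  # assumed ageing interval in seconds; not from the draft
    last_seen: dict = field(default_factory=dict)  # host IP -> time last confirmed

    def confirm(self, host_ip, now=None):
        """Record that a host was seen locally (e.g. via ARP/ND).

        Returns True if the host is new, i.e. a host route should be originated.
        """
        now = time.monotonic() if now is None else now
        is_new = host_ip not in self.last_seen
        self.last_seen[host_ip] = now
        return is_new

    def expire(self, now=None):
        """Drop hosts not confirmed within max_age and return them.

        The caller would withdraw the host route for each returned address.
        """
        now = time.monotonic() if now is None else now
        stale = sorted(
            ip for ip, seen in self.last_seen.items() if now - seen > self.max_age
        )
        for ip in stale:
            del self.last_seen[ip]
        return stale
```

  The PE would call confirm() whenever discovery sees the host and run
  expire() periodically.  The choice of max_age trades stale routes against
  BGP churn, which is exactly the sort of thing the draft should discuss.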

- The draft suggests that if a PE, say PE1, has originated a host route for
  host H, and then PE1 sees a host route for H from another PE, say PE2,
  PE1 should try to figure out whether H is still local, and withdraw
  the route if it concludes that H is no longer local.

  I believe this presumes that all VRFs have unique RDs; that should be
  stated.  (Otherwise a route reflector might not forward all the routes.)

  Suppose PE1 sees a host route for H from PE2, but PE1 then concludes that
  H is still local.  Is the local route to be considered preferable?  Does
  it install the BGP route from PE2, but not issue the proxy ARP responses?
  The draft should state the procedures for this case.

  What if there is a local BGP route for H (say, from a CE router), but
  the BGP decision process chooses the remote route?
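
  On the unique-RD point above, a toy model shows why it matters.  It
  assumes a route reflector that advertises a single best path per VPN-IPv4
  NLRI (i.e., no add-paths or similar), and it reduces best-path selection
  to "lowest next-hop name" purely for illustration.  The reflect function
  and the RD and prefix values are made up.

```python
def reflect(routes):
    """Model a route reflector that advertises one best path per NLRI.

    routes: list of (rd, prefix, next_hop) tuples.
    Tie-breaking is simplified to 'lowest next_hop string'; the real BGP
    decision process is more involved.
    """
    best = {}
    for rd, prefix, next_hop in routes:
        key = (rd, prefix)  # the RD is part of the NLRI, so it is part of the key
        if key not in best or next_hop < best[key]:
            best[key] = next_hop
    return sorted((rd, prefix, nh) for (rd, prefix), nh in best.items())


# Same RD on both PEs: the two host routes collide and only one is reflected.
shared_rd = reflect([
    ("65000:1", "10.1.1.5/32", "PE1"),
    ("65000:1", "10.1.1.5/32", "PE2"),
])

# Unique RD per VRF: both host routes survive, so PE1 can see PE2's route.
unique_rd = reflect([
    ("65000:101", "10.1.1.5/32", "PE1"),
    ("65000:102", "10.1.1.5/32", "PE2"),
])
```

  With a shared RD, PE1 may never learn that PE2 has also originated a route
  for H, so the "figure out whether H is still local" procedure would never
  be triggered.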

- It seems to me that the scheme does not work at all if a single site is
  attached to two PEs, UNLESS those PEs negotiate some sort of
  primary/secondary relationship.

  The draft does mention this:
  
       "In the scenario where a given VPN site (i.e., a data
       center) is multi-homed to more than one PE router via an
       Ethernet switch or an Ethernet network, Virtual Router
       Redundancy Protocol (VRRP) [RFC5798] is usually enabled on
       these PE routers. In this case, only the PE router being
       elected as the VRRP Master is allowed to perform the
       ARP/ND proxy function."

  But I'm not sure what to make of the "usually".  The draft does not
  say that its applicability is restricted to the cases where either (a) a
  site attaches only to a single PE, or (b) the site attaches to two PEs
  that are running VRRP with each other.  So we need to examine what will
  happen if the site attaches to two PEs that are not running VRRP.

  Suppose Site-1 has Host H-1, and attaches to PE-11 and PE-12.  Site-2 has
  host H-2, and attaches to PE-2.  Suppose further that H-1 and H-2 have
  addresses "in the same subnet".  PE-2 discovers the presence of H-2, and
  so distributes a host route for it; PE-11 and PE-12 import this route.

  Now H-1 sends an ARP request for H-2.  PE-11 and PE-12 both generate a
  proxy response.  That by itself is probably enough to mess up the
  communication from Site-1 to H-2.  But PE-11 and PE-12 will see each
  other's proxy responses, and hence will both conclude that H-2 is local.
  So they will both generate host routes for H-2 and distribute them to the
  other PEs.  Now all the other PEs will think that H-2 is reachable via
  PE-11, PE-12, and PE-2.  This will certainly screw up any attempts to
  reach H-2 from other sites.

  I think that the draft either needs to state that it is not applicable
  when two PEs attach to a site (unless they use VRRP), or else some
  protocol for choosing the "master PE" at a site needs to be developed.
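
  The failure sequence above can be walked through with a small simulation.
  It assumes that local hosts are discovered by snooping ARP replies on the
  site LAN, which the draft does not mandate, and the PE class and simulate
  function are hypothetical names used only for this sketch.

```python
class PE:
    def __init__(self, name, remote_routes):
        self.name = name
        self.remote_routes = set(remote_routes)  # host IPs learned via BGP
        self.local_hosts = set()  # host IPs believed to be on the attached site

    def on_arp_request(self, target_ip):
        """Send a proxy reply if the target is known via a remote host route."""
        if target_ip in self.remote_routes:
            return (self.name, target_ip)  # (replier, IP claimed in the reply)
        return None

    def on_arp_reply(self, replier, claimed_ip):
        """Snoop a reply seen on the LAN; anything we did not send looks local."""
        if replier != self.name:
            self.local_hosts.add(claimed_ip)


def simulate(pes, target_ip):
    """Broadcast one ARP request on the site LAN.

    Returns the names of the PEs that now believe target_ip is local and
    would therefore originate a host route for it.
    """
    replies = [r for r in (pe.on_arp_request(target_ip) for pe in pes) if r]
    for pe in pes:
        for replier, ip in replies:
            pe.on_arp_reply(replier, ip)
    return sorted(pe.name for pe in pes if target_ip in pe.local_hosts)


# Site-1 dual-homed to PE-11 and PE-12, both holding PE-2's route for H-2.
dual_homed = simulate([PE("PE-11", {"H-2"}), PE("PE-12", {"H-2"})], "H-2")

# Single-homed site: the lone PE never sees a foreign reply.
single_homed = simulate([PE("PE-11", {"H-2"})], "H-2")
```

  In the dual-homed run both PE-11 and PE-12 end up treating H-2 as local,
  while in the single-homed run nobody does.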

- I don't completely follow some of the procedures for inter-subnet routing.
  From section 3.1.2:

      "Assume host A sends an ARP request for its default gateway
      (i.e., 1.1.1.4) prior to communicating with a destination
      host outside of its subnet. Upon receiving this ARP
      request, PE-1 acting as an ARP proxy returns its own MAC
      address as a response.  Host A then sends a packet for Host
      B to PE-1. PE-1 tunnels such packet towards PE-2 according
      to the default route learnt from PE-2, which in turn
      forwards that packet to GW."

  It seems to me that PE-1 will forward the packet according to the routes
  in its VRF (i.e., PE-1 actually functions as the default gateway), and the
  packet may or may not actually go to PE-2 and then to GW.  If Host B is
  out on the Internet, and there are Internet gateways at several sites, the
  one that actually gets used will not necessarily be the one that Host A is
  configured to use.

  I'm not sure this is a problem; it could be considered to be a feature.
  But it is certainly something that the draft should discuss.

- If host discovery is going to be done by snooping ARP traffic, and if host
  discovery is going to cause BGP activity, then we have some scaling and
  security issues that need to be discussed.

  By generating a "bogus" ARP response for host H, one can force a PE to
  originate a host route, and this in turn will cause some amount of traffic
  to H to be delivered to the wrong site.  That is, the effect of a bogus
  ARP Response is not limited to a particular site.  This certainly needs to
  be mentioned in the Security Considerations section.

  Further, by generating an arbitrary number of bogus ARP responses, one can
  cause a PE to originate an arbitrary number of host routes, thus causing
  an excessive amount of BGP activity.  This is an attack vector which also
  needs to be discussed in the Security Considerations.

  So I don't think it's true that the draft introduces "no new security
  considerations".

- The section on multicast mentions tunnels, but I think an important issue
  in multicast is going to be how the PIM Designated Routers at a given site
  do the RPF determination, and this isn't even mentioned.

- What is "VPN Instance Space Scalability"?  (I don't know the term "VPN
  Instance Space".)  


