Hi Joel.
Thanks for the review comments. (And sorry for taking so long to respond!)
"Joel M. Halpern" <[email protected]> writes:
Major issues:
The use of the term "switch" seems confusing. I had first assumed
that it meant an Ethernet switch (which might have a bit of L3 smarts,
or might not. I was trying not to be picky.) But then, in section 6.3
it refers to "core switches ... are the data center gateways to external
networks" which means that those are routers.
The switch vs. router terminology is tricky.
6.3 says:
Core switches connect multiple aggregation switches and are the data
center gateway(s) to external networks or interconnect to different
sets of racks within one data center.
How about I change that to:
Core switches connect multiple aggregation switches and interface
with data center gateway(s) to external networks or interconnect to
different sets of racks within one data center.
I know that is just side stepping this a bit, but Section 6.4 has more
text about the L2/L3 boundaries in various deployments. This document
is walking a bit of a tightrope by trying to be general and not too
specific. If we get too specific, folk start screaming "that's not the
way my data center looks".
Moderate Issue:
The document seems to be interestingly selective in what modern
technologies it chooses to mention. Mostly it seems to be describing
problems with data center networks using technology more than 5 years
old. Since that is the widely deployed practice, that is
defensible.
I think this has to do with how the WG was chartered.
But then the document chooses to mention new work such as OpenFlow,
without mentioning the work IEEE has done on broadcast and multicast
containment for data centers. It seems to me that we need to be
consistent, either describing only the widely deployed technology, or
including a fair mention of already defined and productized solutions
that are not yet widely deployed.
I'd be fine with taking out the references to OpenFlow. I don't think
it adds much to the document.
On a related note, the document assumes that multicast NDs are
delivered to all nodes, while in practice I believe existing techniques
to filter such multicast messages closer to the source are widely
deployed. (Section 5.)
This paragraph has been significantly revised. The current proposed
text is:
Broadly speaking, from the perspective of address resolution,
IPv6's Neighbor Discovery (ND) behaves much like ARP, with a
few notable differences. First, ARP uses broadcast, whereas ND
uses multicast. Specifically, when querying for a target IP
address, ND maps the target address into an IPv6 Solicited
Node multicast address. Using multicast rather than broadcast
has the benefit that the multicast frames do not necessarily
need to be sent to all parts of the network, i.e., only to
segments where listeners for the Solicited Node multicast
address reside. In the case where multicast frames are
delivered to all parts of the network, sending to a multicast
address still has the advantage that most (if not all) nodes will
filter out the (unwanted) multicast query via filters
installed in the NIC rather than burdening host software with
the need to process such packets. Thus, whereas all nodes must
process every ARP query, ND queries are processed only by the
nodes for which they are intended. In cases where multicast
filtering can't effectively be implemented in the NIC (e.g.,
as on hypervisors supporting virtualization), filtering would
need to be done in software (e.g., in the hypervisor's
vSwitch).
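To make the Solicited-Node mapping in that paragraph concrete: per RFC 4291, the target IPv6 address's low-order 24 bits are appended to the prefix ff02::1:ff00:0/104. A minimal sketch using Python's stdlib ipaddress module (the target address below is just a documentation-prefix example, not anything from the draft):

```python
import ipaddress

def solicited_node_multicast(target: str) -> str:
    """Map a target IPv6 address to its Solicited-Node multicast
    address (RFC 4291): ff02::1:ff00:0/104 plus the low 24 bits
    of the target address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return str(ipaddress.IPv6Address(prefix | low24))

print(solicited_node_multicast("2001:db8::1:800:200e:8c6c"))
# ff02::1:ff0e:8c6c
```

Because only the low 24 bits matter, many unicast addresses on a link typically map to distinct Solicited-Node groups, which is what lets NICs filter out queries for other hosts.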
Is that better?
Minor issues:
I presume that section 6.4.2, which describes needing to enable all
VLANs on all aggregation ports, is a description of current practice,
since it is not a requirement of current technologies, either via VLAN
management or orchestration?
Yes.
Section 6.4.4 seems very odd. The title is "overlays". Are there
widely deployed overlays?
I keep hearing yes, but proprietary, so little can be said about them.
If so, it would be good to name the
technologies being referred to here. If this is intended to refer to
the overlay proposal in IETF and IEEE, I think that the characterization
is somewhat misleading, and probably is best simply removed.
Hmm, I didn't actually write this text. It originally came from
draft-karir-armd-datacenter-reference-arch, which was merged into the
problem statement document by the WG.
I agree this section is kind of fuzzy, and I'm on the fence about
what to do. Are there other opinions?
Is the fifth paragraph of section 7.1 on ARP processing and
buffering in the absence of ARP cache entries accurate? I may well be
out of date, but it used to be the case that most routers dropped the
packets, and some would buffer 1 packet deep at most. This description
indicates a rather more elaborate behavior.
RFC 1122 says:
2.3.2.2 ARP Packet Queue
The link layer SHOULD save (rather than discard) at least
one (the latest) packet of each set of packets destined to
the same unresolved IP address, and transmit the saved
packet when the address has been resolved.
RFC 1812 says:
3.3.2 Address Resolution Protocol - ARP
Routers that implement ARP MUST be compliant and SHOULD be
unconditionally compliant with the requirements in [INTRO:2].
The link layer MUST NOT report a Destination Unreachable error to IP
solely because there is no ARP cache entry for a destination; it
SHOULD queue up to a small number of datagrams briefly while
performing the ARP request/reply sequence, and reply that the
destination is unreachable to one of the queued datagrams only when
this proves fruitless.
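The RFC 1122 behavior quoted above (keep at least the latest packet per unresolved destination, transmit it once resolution succeeds) can be sketched in a few lines. This is a hypothetical illustration of the queueing discipline, not any particular router's implementation; all names are made up:

```python
class ArpPacketQueue:
    """One-deep-per-destination queue per RFC 1122 section 2.3.2.2:
    save (rather than discard) the latest packet for each unresolved
    IP address, and hand it back when resolution completes."""

    def __init__(self):
        self.pending = {}  # unresolved IP -> latest queued packet

    def queue(self, ip, packet):
        # A newer packet for the same unresolved address replaces
        # the older one, so at most one packet is held per address.
        self.pending[ip] = packet

    def on_resolved(self, ip):
        # Return the saved packet for transmission, if any.
        return self.pending.pop(ip, None)


q = ArpPacketQueue()
q.queue("192.0.2.1", "datagram-1")
q.queue("192.0.2.1", "datagram-2")  # replaces datagram-1
print(q.on_resolved("192.0.2.1"))   # only the latest survives
```

RFC 1812 relaxes this to "a small number of datagrams," but the key point either way is that a router should not silently drop everything nor report unreachable merely because the ARP cache entry is missing.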
Given that this document says it is a general document about
scaling issues for data centers, I am surprised that the security
considerations section does not touch on the increased complexity of
segregating subscriber traffic (customer A cannot talk to customer B)
when there are very large numbers of customers, and the interaction of
this with L2 scope.
The ARMD WG struggled a bit about scope, and all it was chartered to
do was a problem statement related to address resolution.
Looking at the title of the document "Problem Statement for ARMD", I'd
argue that's not helpful for an RFC given that ARMD will close and
there is no followup WG planned. How about I change the title to
something like:
Address Resolution Problems in Large Data Center Networks
I don't want to add other issues like traffic segregation to the
document at this point. Among other things, the WG really doesn't
have the energy for this... The intro is pretty clear (IMO) about the
limited scope of the document.
Thomas