Nice draft and efforts on an important problem that is naturally in the
scope of ALTO.
Here are some comments. Some points reflect discussions with Xuan
Zhang from Yale tonight. Sorry the comments are a bit long.
First, some high level comments:
======================
- I feel that it can be helpful to have a section to enumerate some key
differences between a CDN setting and a (swarm) P2P setting. Such
differences can drive part of the requirements.
As an example difference, we have that P2P has smart, adaptive clients,
and thus can have lower requirements on fault tolerance and load
balancing. A peer can have a large number of neighbors, and will evolve
the topology and load balance among this large set. On the other hand,
in a CDN setting, the serving set to a client is much smaller, and the
clients are assumed to be dumb, non-adaptive.
- I feel that it can be helpful to define the problem settings first
before delving into technical branches (HTTP Redirect vs DNS redirect).
The basic problem setting is that we have a client Host that needs to
select among a set of CDN nodes {CDN_1, CDN_2, ..., CDN_K}. A
fundamental challenge, I feel, is that ALTO info can be partial and
"colored" (i.e., has a perspective). Thus, it can be helpful to discuss
according to the perspective settings, instead of more technical detail
settings (HTTP Redirect vs DNS redirect). It can be helpful to solve two
fundamental settings:
(S1) Host H, {CDN_i}, and the network connecting them belong to a
single entity. ALTO info is from this single entity. This is a
relatively easy, useful setting.
(S2) Host H belongs to ISP, and {CDN_i} belong to CSP. This setting
can be much more complex, because there are two perspectives:
C^ISP(H<->CDN_i) vs C^CSP(H <-> CDN_i). Or we could introduce a third
perspective, for example, C(H <-> CDN_i) coming from a third measurement
party.
- Since this is CDN, does it make sense to demonstrate, as a use
case, how ALTO info may be integrated into the system of a major CDN?
Let me try an Akamai-like DNS-based system, according to my
understanding from their patent several years ago. Anyone who knows
better public info, please correct/update me. Let's call this system A.
In system A, the first step is to map src Host H to a serving region
represented by a higher level DNS server. To achieve this mapping, we
may use the following steps:
Step 1: Mapping from src address H to SPID, a src PID (e.g., partitioned
according to local DNS server)
This mapping can be provided by ALTO Map, from an ALTO Server.
Step 2: Look up SPID in a cost map
            Region 1   Region 2   ...   Region K
   SPID1
   SPID2
   ...
After this look up, we identify the lowest cost/closest region
(represented by RID).
This mapping can be provided by ALTO Map, from an ALTO Server.
Step 3: For a distributed implementation, the system directs to the
corresponding DNS server for the identified region. This map can be the
format:
Region -> lower level DNS server.
This mapping may not be provided by ALTO.
Note that the three steps can be streamlined into a single hash
implementation.
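To make the three steps concrete, here is a rough sketch in Python. It is only an illustration under assumptions: the prefixes, PID names, region names, cost values, and DNS server names are all made up, and the plain dictionaries stand in for an ALTO Network Map and an ALTO Cost Map. It is not meant to describe how system A is actually implemented.

```python
import ipaddress

# Hypothetical ALTO Network Map: address prefix -> source PID (SPID).
NETWORK_MAP = {
    "192.0.2.0/25": "SPID1",
    "192.0.2.128/25": "SPID2",
}

# Hypothetical ALTO Cost Map: SPID -> {region: cost}.
COST_MAP = {
    "SPID1": {"Region1": 10, "Region2": 30},
    "SPID2": {"Region1": 25, "Region2": 5},
}

# Region -> lower level DNS server. Assumed NOT to come from ALTO.
REGION_DNS = {
    "Region1": "dns1.example.net",
    "Region2": "dns2.example.net",
}


def host_to_spid(host):
    """Step 1: map the source address to its SPID (longest prefix match)."""
    addr = ipaddress.ip_address(host)
    best = None
    for prefix, pid in NETWORK_MAP.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, pid)
    if best is None:
        raise KeyError("no PID covers " + host)
    return best[1]


def spid_to_region(spid):
    """Step 2: pick the lowest-cost region (RID) for this SPID."""
    costs = COST_MAP[spid]
    return min(costs, key=costs.get)


def resolve(host):
    """Steps 1-3: source address -> SPID -> RID -> lower level DNS server."""
    return REGION_DNS[spid_to_region(host_to_spid(host))]


# Streamlined form: steps 2 and 3 precomputed into one table keyed by SPID.
SPID_TO_DNS = {spid: REGION_DNS[spid_to_region(spid)] for spid in COST_MAP}
```

With these made-up maps, resolve("192.0.2.10") goes through SPID1 and Region1, and SPID_TO_DNS gives the same answer with a single lookup once the SPID is known.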
At a lower level DNS server:
Offline computation (maybe with some online triggered load balancing
update), using consistent hashing and bin packing, to compute the map:
Dest address (including the bucket/customer info as part of the
destination name) -> a short list of CDN servers.
This map may not come from ALTO. But after the selection, there can be a
fine-grained tuning according to source address (otherwise, why deep
deployment, instead of clustering/data centers). The fine-tuning can use
a local, fine-grained ALTO Map, but needs to be careful not to break load
balancing.
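A rough sketch of the lower level step, again in Python and again under assumptions. It shows only the consistent hashing part (the bin packing is left out), and the fine-tuning merely reorders the short list by a hypothetical per-source ALTO cost, so the set of servers chosen for load balancing is unchanged. The class name, the server names, and the vnodes parameter are made up for illustration.

```python
import bisect
import hashlib


def _h(key):
    """Stable hash of a string onto the ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class Ring:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=50):
        # Each server is placed on the ring at `vnodes` points.
        self._points = sorted(
            (_h(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]
        self._n = len(set(servers))

    def short_list(self, dest_name, k=2):
        """Dest name (including bucket/customer info) -> k distinct servers."""
        k = min(k, self._n)
        i = bisect.bisect(self._keys, _h(dest_name))
        chosen = []
        # Walk clockwise from the hash point, collecting distinct servers.
        while len(chosen) < k:
            server = self._points[i % len(self._points)][1]
            if server not in chosen:
                chosen.append(server)
            i += 1
        return chosen


def fine_tune(candidates, cost_to_source):
    """Reorder the short list by ALTO cost to the source; never add or drop
    servers, so the load-balancing decision itself is preserved."""
    return sorted(candidates, key=lambda s: cost_to_source.get(s, float("inf")))
```

For example, ring = Ring(["cdn1", "cdn2", "cdn3", "cdn4"]) followed by fine_tune(ring.short_list("customer42.example.com"), costs) returns the same two servers the ring selected, with the one that is cheaper for this source listed first.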
Some detailed comments:
==================
- Section 4: it can be helpful to clarify the setting: the wording
implies a single ALTO Server, which I assume, by default, is giving the
perspective from the CDN network. Section 4.4 touches upon this issue.
Moving it a bit forward can be helpful.
- Section 4: " ... intercepts an HTTP GET request (1)": add a reference
to Figure 1.
- First para below Figure 1: I do not understand why you need to
disambiguate PIDs containing only hosts from PIDs containing CDN nodes.
It may be helpful to elaborate more.
- First/second para below Figure 1: Why do you have to enforce only
costs from host to CDN? As an example, Akamai streaming uses multiple
levels of CDN nodes (Entry points, reflectors, Edge Servers). Knowing
info between these inter-CDN nodes can be helpful when computing
redirection.
- Second para below Figure 1: How to determine the CDN PID from the
hostname (domain name) of a URL?
Is this sentence trying to address the issue: "Therefore the IP
addresses contained in the cost maps may need to be correlated to domain
names a priori."? But this is still not fully clear yet. From the big
picture, it seems that the process is: (1) map from URL to a list of IP
addresses, and (2) look up in the Map for direction.
- Second para below Figure 1: For the last sentence, it can be helpful
to make it clear that the selection algorithm can be quite flexible and
customizable. For example, a standard algorithm I cover in my class
(from an Akamai patent application) is to use consistent hashing + bin packing.
- GAP-1: I have no problem adding PID attributes. But the motivation, in
the context of the document, is not fully clear, as it is not made
explicit later how it could be used (did I miss it? if so, it can be
helpful to add a forward reference)
- top of page 7: "a appropriate" -> "an appropriate"
- top of page 7: "The issue of default cost if one of importance." if -> is?
- I like it that the document presents two approaches in Sections 4.3
and 4.4 respectively. I feel that Section 4.3 is conceptually simpler.
For Section 4.4, there is the issue of converting application info
(CDN node load) to ALTO info. You may be forced to use fine-grained PIDs in
order to distinguish different CDN servers; or you can use some
averaging of load of servers at a given location and add it to the ALTO
costs; but this can be less effective in achieving load balancing. Also
note that Section 4.4 will force the ALTO info to be application
dependent during conversion.
- First para after Figure 3: why is partitioning recommended?
- Second/third para of Section 5: mixed use of Proxy and DNS Proxy.
- Section 6 and others: it may not be necessary to be limited to
selection based on the cost of CDN outgoing traffic; in some settings,
the selection can be based on incoming cost, for example, for UGN.
- Figure 5: note that a general case can be more complex: at the P2Pi
meeting, a major issue we were trying to address was that there can be
multiple ISPs between a subscriber and the CDN. I hope that the
"flattening-of-the-Internet" makes this less of a problem.
- GAP-6 and GAP-7: I am not sure there is a need for defining an
explicit Border Router Attribute.
Richard
On 6/4/2010 12:09 PM, Reinaldo Penno wrote:
We posted a new Internet Draft on ALTO and CDNs
http://www.ietf.org/id/draft-penno-alto-cdn-00.txt
Regards,
Reinaldo
_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto