Hi Haibin,

On Sat, Jun 12, 2010 at 5:29 AM, Haibin Song <[email protected]> wrote:
> Hi Richard,
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Richard Alimi
>> Sent: Thursday, June 10, 2010 11:46 PM
>> To: Song Haibin
>> Cc: [email protected]
>> Subject: Re: [alto] ALTO with CPID extension
>>
>> Hi Haibin,
>>
>> On Fri, Jun 4, 2010 at 3:42 AM, Song Haibin
>> <[email protected]> wrote:
>> > Hi all,
>> >
>> > I would like to make a clarification on how CPID works and in which
>> > way it will benefit.
>> >
>> > First the ALTO server collects network topology and policy
>> information
>> > from network equipments. And then constructs a cost/priority matrix
>> > among the network aggregation points. And after a mathematical
>> > decomposition, the original cost/priority matrix turns into the
>> > multiply products of two matrixes. And each cost/priority
>> value in the
>> > original matrix is the multiply product of two vectors in
>> the latter
>> > two matrixes. And these vectors become the CPIDs for peers
>> under those
>> > aggregation points. There are two kinds of CPIDs, CPIDsource and
>> > CPIDdestination in the case the priority/cost from one area
>> to another
>> > area is not equal to the cost vice versa. A small map indicates the
>> > priority from the current ISP/AS to other ISP/AS (one to
>> many) will also be needed.
>>
>> Is the mechanism for interdomain specified in any more detail
>> somewhere?  In particular, how does an ALTO Client determine
>> that a peer's CPID is not "compatible" with its own (i.e., it
>> is not generated by the same ALTO Service Provider)?  What is
>> the structure of this map and why is it small?
>>
>
> There could be an identity for each ISP, I would assume this could be
> assigned by IANA. Clients can use this identity for the differentiation. The
> size of the map is linear to the number of ISPs. The CPID draft will be
> updated and you will see more details.

I'm not sure it is quite that simple. For example, there are many ISPs
that span large geographical areas (e.g., AT&T, Verizon, Comcast in
the U.S.) and they may have peering points between them in multiple
locations.  Then, the question "what is the cost from a customer in
AT&T to a customer in Comcast?" is *not* as simple as saying "What is
the cost from AT&T to Comcast?".  The real question is "what is the
cost from this location in the AT&T network to another location in the
Comcast network?".

This is one reason why we have both a Network Map (on the basis of IP
Prefixes) and a Cost Map in the ALTO Protocol today.

>
>> >
>> > Second, the CPID for each area and the the very small map could be
>> > delivered to the local DHCP server in that area, and the
>> end user will
>> > get their CPID through the DHCP server. The ALTO server
>> does not have
>> > to process any user requests. The end user does not need to
>> know more
>> > information than that got from the DHCP server.
>>
>> This is of course presuming that the ALTO Client can access
>> any DHCP options? Looking briefly on my linux machine, this
>> appears to require installing hooks (as root).  The situation
>> appears to be somewhat easier on Windows, which has a
>> DhcpRequestParams() function.
>>
>
> The client is required to support a new DHCP option if it wants to use this
> way to retrieve its CPID, this is bad, but the benefit is that it is not
> required to change anything to do ALTO server discovery.

That's fine, I'm simply pointing out one of the properties of the solution.

>
>> > Third, as each end user or resource provider all have their
>> own CPIDs.
>> > The end user will choose peers according to the source and
>> destination
>> > CPIDs and priority from its ISP/AS to others.
>>
>> It may also be good to enumerate assumptions that CPID has on
>> the Resource Consumer and Resource Providers so that one can
>> more explicitly judge in which deployment scenarios it might
>> apply. There are a couple that I can think of off the top of my head:
>> - Both Resource Consumer and Resource Provider must have ALTO Clients
>
> That's because we hope CPID could be the basic atrribute for each client,
> somewhat like IP address. The cost between two clients could be calculated
> by the "cost attribute" of these two clients.

That's fine, but note that it doesn't apply to particular deployment
scenarios such as mirror selection (mentioned in the charter).

>
>> - Both Resource Consumer and Resource Provider must have the
>> same ALTO Service Provider
>
> No. We also have interdomain traffic optimization. But if you want to
> totally cut off the "small map" I mentioned above, that would be possible if
> there is an third party to get all the cost among ISPs and turns them into a
> portion of each CPID vector.
>
>>(my initial interpretation is that
>> the "small map"
>> you mentioned above looks a lot like the Network Map and Cost
>> Map currently in the protocol -- is that right?)
>
> No Network Map. Somewhat similar to the Cost Map with only cost information
> from the current ISP to other ISPs.

See my comment above about the "small map" approach.

>> - The application protocol between the Resource Consumer and
>> Resource Provider can be changed
>>
> I think either way that you can let the resource consumer to get the CPID of
> the resource provider is okay.

In certain deployment scenarios, it also adds queries to an ALTO
Server (similar to queries to the existing Endpoint Cost Service).
You can probably argue that it still alleviates some load on the ALTO
Server since many P2P applications may have the luxury of augmenting
the protocol.

However, if the protocol cannot be changed, and a Resource Consumer
needs to query for the CPID of the Resource Provider, note that:
(1) There is still load on the ALTO Server roughly the same as would
be used by the Endpoint Cost Service.
(2) The benefits to P2P Privacy discussed in the current version of
the draft are gone (since you're sending IP addresses of peers to the
ALTO Server).

Again, I don't necessarily have a problem if a particular scheme
requires changing the application protocol.  I simply wanted to be
explicit about the deployment considerations.

>
>> > I would agree that the network map and cost/priority map could be
>> > re-calculated by efforts if an application has millions of
>> users and
>> > wants to collect this kind of information (CPID and IP addresses).

Just a side note: if you allow queries for a Resource Provider's CPID,
you no longer have the requirement that to learn a CPID there must be
an active client within that CPID.

>>
>> If users can recompute such information, that is fine.
>> However, I strongly object to making any claims as to any
>> increase in ISP privacy beyond the level currently offered by
>> the Network Map and Cost Map approach.  In addition, such
>> considerations should be stated clearly so that ISPs don't
>> fall into a false sense of privacy.
>>
>> > But ALTO server
>> > does not have to process user requests (offload to local
>> DHCP server,
>> > no burst load on the ALTO server, you even do not need ALTO server
>> > discovery in this case).
>>
>> Okay.
>>
>> > And the end user does not have to search the big map to
>> find all the
>> > PIDs for destination peers and then search the costs in the
>> cost map.
>>
>> As I mentioned before, I think the computational overhead for
>> a single network map lookup (i.e., a longest-prefix match)
>> and a single matrix lookup (indexing into a 2-dimensional
>> array) is extremely minor especially for an ALTO Client
>> deployed at an end-user's machine.
>
> CPID is mainly based on the calculation with the source CPID and destination
> CPID, not lookup.

I understand that.  I'm comparing to the Network Map / Cost Map
approach which requires a longest prefix match and indexing into the
row and column of a matrix.  For CPID, you need to index into the row
of a matrix twice and compute an a inner product.

My claim is that at the rate that a P2P client (presumably where this
would be deployed) is performing either of these operations, it isn't
going to matter.  I think it is useless to try and optimize at this
point to calculate a cost a few microseconds faster.

My only point for discussing the benchmarks below was to show that the
Network Map and Cost Map computations were indeed of that order.

>
>>
>> Our benchmarks for a single peer join and peer selection at a
>> tracker with 1,000,000 registered peers (which included a few
>> longest-prefix matches, 50 lookups into a peering weight
>> matrix the size of a cost map, as well as various other
>> bookkeeping and lookup operations) took under 40
>> microseconds.  The burden on an ALTO Client would presumably
>> be much less in the use case you mentioned above.
>>
>
> Okay. That is for a single selection.

Let me add a few more details of the methodology. The benchmark began
with the tracker empty. We then allowed 1,000,000 peers to join as
fast as they could by calling our addPeer() and selectPeers()
procedures in the code for each new peer.  We were simply benchmarking
the code and did not have any network messages in the experiment.

Note that addPeer() does the lookup into the Network Map for each ISP
(since this is a tracker) and we store the PID.  selectPeers() does
the indexing into the peering weight matrices.

> How many user requests per second are
> there in the tracker you mentioned above?

The effective load on the tracker at the end of the experiment was
25,000 peer joins per second (1 second / 40 microseconds).  I hesitate
to say "requests per second" since the benchmark did not include any
networking (but that isn't what is being debated here).

> What is the average number of
> peers in each swarm?

There are 1,000,000 in the swarm at the end.   The time measurement of
40 microseconds for a single peer is drawn from benchmark towards the
end of the experiment when when there are close to 1,000,000 already
in the swarm.

Again, a "peer join" in this benchmark included much more than a
single longest-prefix match and a single indexing into the matrix.  As
I recall, a non-negligible amount of time was actually drawn from
generating random numbers used in the peer selection (50 times in the
benchmark above).

If I get a chance in the next few days I'll be happy to setup an
experiment which only benchmarks a single longest-prefix match and
matrix lookup if you think it would be helpful to convince you that
looking up into a Network Map and Cost Map is actually a pretty cheap
operation.

Rich

>
> BR,
> Haibin
>
>> Thanks,
>> Rich
>>
>> >
>> >
>> >
>> > My two cents,
>> > Haibin
>> >
>> >
>> >
>> > _______________________________________________
>> > alto mailing list
>> > [email protected]
>> > https://www.ietf.org/mailman/listinfo/alto
>> >
>>
>
_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto

Reply via email to