[alto] Fw: Re: [Dyncast] CAN BoF issues #1 #5 #6

[email protected] Wed, 18 May 2022 06:41:38 -0700

Hi ALTO WG,

A Computing-Aware Networking (CAN) BoF was held in the RTG area at IETF 113. 
CAN aims to steer traffic among multiple edge sites, considering both network 
and computing resource status. The progress was also presented briefly in the 
ALTO WG meeting.


In the BoF, some people asked about the relationship between CAN and ALTO. We 
collected this issue and got a response from the proponents, and would like to 
post the clarification here to see if there are more comments from the WG. 
Thanks!

#5 What is the relation between CAN and ALTO? (issue #5)
 
The ALTO architecture has a central ALTO server that pulls network status 
periodically to help manage the deployment of applications and computing 
resources. But it is difficult for the ALTO server to promptly assist many 
ingress nodes in choosing the optimal path based on dynamic traffic conditions 
and computing resources at multiple locations because: 1) the server is a 
single bottleneck for all ingress routers querying application status; 
2) ingress routers need time to get responses from the ALTO server when flows 
arrive; 3) according to RFC7971, the ALTO server may not know the instantaneous 
congestion status of the network, all link bandwidths, all information about 
the actual routing, or whether the candidate endpoint itself is overloaded.
CAN is to identify various measurements for service instances, including the 
hosting environment, and normalize them together with network metrics so that 
ingress nodes can choose among the service instances.
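
To make the idea of normalizing computing and network metrics concrete, here 
is a minimal Python sketch of how an ingress node might rank service 
instances. It is only an illustration: the metric names (cpu_load, 
network_delay_ms), the 50 ms delay bound, the equal weighting and the instance 
names are all assumptions, not something defined by CAN or by any draft.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    cpu_load: float          # fraction of CPU in use, 0.0 to 1.0 (assumed metric)
    network_delay_ms: float  # delay seen from this ingress node (assumed metric)


def normalize(value, worst):
    """Map a raw measurement onto 0..1, where 1 is the worst acceptable value."""
    return min(max(value / worst, 0.0), 1.0)


def combined_cost(inst, max_delay_ms=50.0, compute_weight=0.5):
    """Weighted sum of normalized computing and network metrics (lower is better)."""
    compute = normalize(inst.cpu_load, 1.0)
    network = normalize(inst.network_delay_ms, max_delay_ms)
    return compute_weight * compute + (1.0 - compute_weight) * network


def choose_instance(instances):
    """Pick the service instance with the lowest combined cost."""
    return min(instances, key=combined_cost)


# Hypothetical measurements for three edge sites.
instances = [
    Instance("edge-a", cpu_load=0.9, network_delay_ms=5.0),
    Instance("edge-b", cpu_load=0.3, network_delay_ms=20.0),
    Instance("edge-c", cpu_load=0.2, network_delay_ms=45.0),
]
best = choose_instance(instances)
print(best.name)
```

With these made-up numbers, the closest site (edge-a) is too loaded and the 
least loaded site (edge-c) is too far away, so the combined metric picks 
edge-b. Neither metric alone would have made that choice.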

Regards,
Peng


[email protected]
 
From: Linda Dunbar
Date: 2022-05-18 05:46
To: [email protected]; dyncast
CC: rtgwg
Subject: Re: [Dyncast] CAN BoF issues #1 #5 #6
Peng, 
 
The resolution for Issue 2 “Relation to ALTO” could say more about why ALTO 
“can’t really help to the service request”. How about the following? 
 
Relation to ALTO (issue #5)
 
The ALTO architecture has a central ALTO server that pulls network status 
periodically to help manage the deployment of applications and computing 
resources. But it is difficult for the ALTO server to promptly assist many 
ingress nodes in choosing the optimal path based on dynamic traffic conditions 
and computing resources at multiple locations because: 1) the server is a 
single bottleneck for all ingress routers querying application status; 
2) ingress routers need time to get responses from the ALTO server when flows 
arrive; 3) according to RFC7971, the ALTO server may not know the instantaneous 
congestion status of the network, all link bandwidths, all information about 
the actual routing, or whether the candidate endpoint itself is overloaded.
CAN is to identify various measurements for service instances, including the 
hosting environment, and normalize them together with network metrics so that 
ingress nodes can choose among the service instances. This is almost the 
reverse of ALTO. 
 
My two cents, 
Linda 
 
From: Dyncast <[email protected]> On Behalf Of [email protected]
Sent: Monday, May 16, 2022 6:24 AM
To: dyncast <[email protected]>
Cc: rtgwg <[email protected]>
Subject: [Dyncast] CAN BoF issues #1 #5 #6
 
Dear All,
 
Based on the categories of the CAN BoF issues, here are the responses to 
issues #1, #5 and #6, which clarify the relationship to ITU-CNC, 3GPP-UPF and 
ALTO. Any comments are welcome. 
 
We will post responses to more of the issues raised in the BoF for further 
comments (https://github.com/CAN-IETF/CAN-BoF-ietf113/issues).  You can also 
add your comments to any of them. Thanks!
 
1. What is ITU-CNC and what is its relationship with CAN? #1
 
CNC focuses on the vision, scenarios, requirements, architecture and network 
function enhancements for the future mobile core network and the converged 
fixed, mobile and satellite telecom network, but not on the internet or routing 
area. CAN aims at computing and network resource optimization by steering 
traffic to appropriate computing resources, considering not only routing 
metrics but also computing resource metrics and service affiliation.
 
2. Relation to ALTO #5
 
ALTO has the potential to help with the deployment of applications and 
computing resources but can't really help to the service request because, 
according to RFC7971, the ALTO service may not know the instantaneous 
congestion status of the network, all link bandwidths, all information about 
the actual routing, or whether the candidate endpoint itself is overloaded. 
Moreover, ALTO is an indirection-based method, in contrast to the on-path 
solution advocated by CAN. 
 
3. Relation to 3GPP UPF #6
 
The CAN dyncast work relies on network devices, rather than the UPF, to steer 
traffic. Virtualized UPFs in 5G have a similar issue: multiple UPF instances 
can serve a group of gNB nodes. Selecting the UPF instance requires not only 
the UPF load condition but also network conditions.
 
Regards,
Peng
 


[email protected]
 
From: Linda Dunbar
Date: 2022-05-11 06:11
To: [email protected]
Subject: [Dyncast] Categories of the CAN BoF issues
CAN BoF proponents:
 
Many thanks for creating the CAN BoF issue tracker on GitHub: 
https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/created_by/CAN-IETF?page=1&q=is%3Aopen+is%3Aissue+author%3ACAN-IETF
 
I went through the issues captured on GitHub and categorized them into 
groups. Some issues can be lumped together for the discussion. There are quite 
a few issues related to the requirements, which need to be clarified.
 
Best Regards, Linda
 
 
Issues associated with Applications vs. Underlay networks:
·         Consider not to load underlay network with application details. #35
·         We have multiple upper layer applications. Do we have additional 
needs for routing (e.g. WG?) or are we using those applications and won't need 
such a new WG? #30
·         It needs application information too, so it can't just make a 
decision at the network layer. #23
·         This does not seem to be a routing problem; it's all service 
discovery that can be done in higher layers. #21
·         3GPP and URSP solve this based on UPF selection. It uses both 
endpoint + application. #20
·         One overlay plane per application. Resources/metric specific to the 
plane. #19
·         How does the application layer or the transport layer learn the 
network status to steer traffic? #16
 
Need clearer requirements for CAN (to be addressed by 
draft-liu-dyncast-ps-usecases):
·         Need to understand if there are requirements to avoid extra messages 
or 1ms of latency #36
·         Regarding the flow affinity, is it from network perspective or from 
application/computation perspective? #33
·         How to effectively compute paths? Shall we take CPUs into account? #32
·         What happens when the user moves? If so we also need to move 
application context. #25
·         It can only move the services around as fast as it can update the 
routing plane, which comes back to the point about service discovery (waiting 
for convergence/distribution as opposed to just updating the SD server). #24
·         Whether the interests of the organization deploying the application 
and the organization providing the network connectivity are aligned. Google 
doesn't worry about this because they are both. #17
o    The question is more what the scope and semantics are of the information 
that will need to cross organizational boundaries. This needs further study, in 
particular when assuming a stakeholder division between service and network 
provider.
·         It seems impossible to satisfy that requirement simultaneously with 
the latency requirement. #15
·         It wasn't clear how hard a requirement session persistence is. #13
o    A session usually creates ephemeral state. If execution changes from one 
(e.g., virtualized) service instance to another, the state/context needs to be 
transferred to the new instance. This required transfer makes it desirable to 
have session persistence (or instance affinity) as the default, removing the 
need for explicit context transfer, while also supporting an explicit 
state/context transfer (e.g., when metrics change significantly).
·         Should it select the UPF based on the application? Is steering done 
per user or per application? #9
·         This seems to assume conventional non-distributed applications just 
running at the edge. What about modern frameworks like Sapphire and Ray? #7
o    It would be good to understand the multi-site requirements of such 
frameworks, which I have understood to mainly run in single DCs.
·         Relation to 3GPP UPF #6
·         Relation to ALTO #5
·         Are the mobility issues and associated protocols also in scope? 
There are scenarios where routing alone would not be sufficient. #4
·         What is the position in the edge location with regard to the UPF? #3
·         Is there some sort of authorization model so that an edge can 
indicate whether or not it will provide compute services? #2
·         What is CNC and the relationship with CAN #1
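
The note under #13 (instance affinity as the default, with an explicit 
state/context transfer only when metrics change significantly) can be sketched 
in a few lines of Python. This is a hedged illustration only: the class name 
AffinityTable, the switch_margin threshold and the cost values are hypothetical 
and do not come from any CAN or dyncast draft.

```python
class AffinityTable:
    """Keeps each flow on its current instance unless another one is much better."""

    def __init__(self, switch_margin=0.3):
        # Assumed threshold: how much worse the current instance may be
        # than the best one before the flow is moved.
        self.switch_margin = switch_margin
        self.bindings = {}  # flow id -> instance name

    def select(self, flow_id, costs):
        """costs maps instance name -> current cost (lower is better).

        Returns (instance name, whether a state/context transfer is needed).
        """
        best = min(costs, key=costs.get)
        current = self.bindings.get(flow_id)
        if current in costs and costs[current] - costs[best] <= self.switch_margin:
            return current, False  # keep affinity, no transfer needed
        self.bindings[flow_id] = best
        # A transfer is only needed if the flow was already bound elsewhere.
        return best, current is not None


table = AffinityTable()
first = table.select("flow-1", {"edge-a": 0.2, "edge-b": 0.4})
second = table.select("flow-1", {"edge-a": 0.4, "edge-b": 0.3})
third = table.select("flow-1", {"edge-a": 0.9, "edge-b": 0.3})
print(first, second, third)
```

In this sketch a small metric change (second call) leaves the flow where it is, 
while a large change (third call) moves it and flags that context has to be 
transferred.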
 
Measurement of the Computing Resources (to be addressed by 
draft-du-computing-resource-representation):
·         It is hard to use existing work to measure the computation, but we 
can optimize the latency through the performance monitoring. We have 
performance/measurement matrix over there. #34
·         Clarifications on the computing resource, its requirements and 
characteristics would be helpful. #27
·         Each application may have a different definition of "resources"; 
these then have to be boiled down into a single topology. Network Aware 
Computing (NAC! :) does scale. #14
·         Is computing resource measurable? #10
o    It is, and how to use the measurement would be solution related. See IFIP 
Networking 2022 paper on how to simply expose “computing capability” and 
achieve better steering with such simple measure.
·         Why is the compute resource different from other resources? #8

Load Balance based solutions:
·         The point is that we need a standardized LB protocol #18
·         The LB as part of the application itself is superior (part of the 
distributed application itself is to obtain and keep updating the "best" 
unicast location to use). #22
·         Is there anything missing from current LBs that would prevent 
their use as-is, other than there being, for market reasons, no interop 
standard between different LBs? #12
·         Should the load balancer learn the network’s status? #11

Dyncast based Solution issues:
·         For Dyncast, when the time is short, is it possible for the router to 
decide the routing? It is too fast. #31
·         Is dyncast proposed to encapsulate? #29
·         Will CAN dyncast impact each and every router? How to avoid loops? #28
·         What's the assumed scale of a D-router? 10^6 sessions? 100^8? 
What's the assumed update rate? 1Gb? 1Tb? #26

_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto
