Dear Tim,

We have just submitted a revision of the Path Vector draft (-19). Below
are the links to the latest revision and the diffs. Please see inline
for pointers to the changes made in response to your comments. In brief:

1. More detailed examples are given in Sec 4.2:
   a. what is the role of ALTO server/client in the scenario;
   b. what an ANE represents and what information can be provided;
   c. how the ALTO client can use the information.
2. Clarification text for "domain" is added in Sec 6.2.
3. Clarification text is added in Sec 6.4.1 for the role of ALTO
   when the property "max-reservable-bandwidth" is provided.
4. Examples of the initial properties are explained in Sec 6.4.3.

Draft:
https://www.ietf.org/archive/id/draft-ietf-alto-path-vector-19.html

Diffs:
https://www.ietf.org/rfcdiff?url1=draft-ietf-alto-path-vector-17&url2=draft-ietf-alto-path-vector-19

With the revision, we hope the draft is now clearer and easier to follow.
Please feel free to let us know if there are further comments or suggestions.
Thanks!

Best,
Kai


> -----Original Messages-----
&gt; From: "Tim Chown" <[email protected]>
&gt; Sent Time: 2021-09-27 17:40:10 (Monday)
&gt; To: "[email protected]" <[email protected]>
&gt; Cc: "[email protected]" <[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" 
<[email protected]>, "[email protected]" 
<[email protected]>
&gt; Subject: Re: Opsdir last call review of draft-ietf-alto-path-vector-17
&gt; 
&gt; Hi,
&gt; 
&gt; &gt; On 24 Sep 2021, at 07:54, [email protected] wrote:
&gt; &gt; 
&gt; &gt; Hi Tim,
&gt; &gt; 
&gt; &gt; Thanks for the review and suggestion. We agree that more concrete use 
cases and 
&gt; &gt; examples will be helpful and some parts of the document need to be 
better 
&gt; &gt; clarified. We will revise the document accordingly. Please see inline 
for detailed 
&gt; &gt; comments.
&gt; 
&gt; Inline with TC&gt; …
&gt; 
&gt; &gt; Best,
&gt; &gt; Kai
&gt; &gt; 
> > > -----Original Messages-----
> > > From: "Tim Chown via Datatracker" <[email protected]>
> > > Sent Time: 2021-09-09 19:49:56 (Thursday)
> > > To: [email protected]
> > > Cc: [email protected], [email protected], [email protected]
> > > Subject: Opsdir last call review of draft-ietf-alto-path-vector-17
> > >
> > > Reviewer: Tim Chown
> > > Review result: Not Ready
> > >
> > > Hi,
> > >
> > > I have reviewed this document (draft-ietf-opsec-v6-26) as part of the
> > > Operational directorate's ongoing effort to review all IETF documents being
> > > processed by the IESG.  These comments were written with the intent of
> > > improving the operational aspects of the IETF drafts. Comments that are not
> > > addressed in last call may be included in AD reviews during the IESG review.
> > > Document editors and WG chairs should treat these comments just like any other
> > > last call comments.
> > >
> > > This draft proposes an extension to the ALTO protocol to allow the definition
> > > of Abstract Network Elements (ANEs) on a path between two endpoints that can be
> > > considered when orchestrating connectivity between those endpoints, rather than
> > > just computing based on the abstract cost of a path.  A Path Vector allows a
> > > set of such ANEs to be defined for a path.
> > >
> > > Caveat:
> > >
> > > I am generally familiar with the work of the ALTO group.  My work at Jisc, a
> > > national research and education network, includes assisting universities and
> > > research organisations optimise large scale data transfers (up to petabytes of
> > > data).
> > >
> > > Overall:
> > >
> > > I believe the document is generally well written, and the problem space it is
> > > addressing is one for which there is value in defining a solution, but I feel
> > > the document suffers from being too abstract and vague about what it is
> > > defining, and its consideration of practical use cases could be improved.  Thus
> > > I feel at this stage it is Not Ready for publication.
> > >
> > > General comments:
> > >
> > > The use cases defined are quite varied - large scale analytics, mobile and
> > > CDNs.  SENSE and LHC are not specifically data analytics use cases in the usual
> > > sense of the word, rather SENSE is a model for orchestrating network links (and
> > > capacity) between sites, and the LHC provides large scale data sets for four
> > > major experiments that are distributed and computed upon via the WLCG
> > > (worldwide large hadron collider computing grid).
> >
> > KAI:
> > The document was originally written to support the data analytics use case, but
> > later was found to be useful in other scenarios. We will focus on the
> > analytics use case in the next revision.
>
> TC>  OK, that’s fine.  I know from speaking to people in groups such as at the GNA-G
> Data Intensive Science WG that ALTO principles are of interest, but it would take some
> significant effort to adopt them.   So perhaps there’s a future Informational document
> to be written around that use case.
>

KAI: Indeed. Some early studies that investigate using ALTO to provide
resource discovery in data science networks (UNICORN and ReSA) are included in the references.
Another related work is G2 by Reservoir Labs, and we are working with Reservoir Labs to integrate
ALTO into their framework. A talk on the integration will be given at IETF 112.

Regarding the use cases, we have included the following scenarios: 1) exposing network
bottlenecks with ALTO Path Vector and 2) exposing topology/resources of service edges. For
both scenarios, we include figures to show how ALTO is integrated and give examples of what
information can be provided.

> > >
> > > For LHC, QoE is not so much about time to complete; the important point is not
> > > to have data backlogging if performance drops.
> > >
> > > For the WLCG, two networks have evolved over many years to carry the traffic
> > > from the four main experiments; LHCOPN, the optical network, and LHCONE, the
> > > overlay network, both of which are ‘manually’ configured, and with enough
> > > capacity for the traffic thanks to regular network forward look exercises.
> > > While a little complex to administer, other emerging disciplines have expressed
> > > interest in using LHCONE to move data, and some have established agreements
> > > (e.g. SKA, I believe).  While a means to provision capacity on demand would be
> > > attractive, the R&E networks typically have capacity, LHCOPN/LHCONE carry the
> > > LHC traffic, and bottlenecks are in the end sites (hence the evolution of the
> > > Science DMZ principles).
> >
> > KAI:
> > Thanks very much for the clarification. Indeed we intermingled LHC with other
> > data analytics systems, which typically use the coflow abstraction [1] and
> > optimize for job completion time. We will clarify in the next revision that
> > different analytics systems have different QoE objectives and illustrate how
> > the path vector extension can support these use cases respectively.
>
> TC> I think generally the LHCONE overlay is used more to support traffic engineering
> (and to some extent trust) at site ingress/egress borders, e.g. to differentiate the science
> traffic from the ‘day to day’ campus ‘business’ traffic.  This reflects the Science DMZ
> principles later documented by ESnet.
>

KAI: We have separated the data analytics case into two: 1) the network is controlled by a single
network manager, as in the geo-distributed data center case or an SDN network [NOVA], and
2) the network consists of multiple networks [Unicorn/ReSA]. We have also added [G2] as a reference
to demonstrate how the information can be used by the ALTO client.

> > >
> > > Some specific examples of ANEs would be very helpful.  While the document does
> > > contain examples, they are not grounded around a use case I can readily relate
> > > to, such as the orchestration of a large data flow between two sites in
> > > different R&E networks.  Can the doc show some real examples?
> > >
> >
> > KAI:
> > That is a very good suggestion. We will add more examples in the next
> > revision to better motivate the use of ANE.
>
> TC> Great, thank you.
>

KAI: Please see Section 4.2 for the examples.

> > > Section 3 talks of definitions of ANEs being “similar to” Network Elements in
> > > RFC2216, but this is vague.  The topology in Figure 5 is quite simple, as an
> > > example; something more realistic would be interesting.
> >
> > KAI:
> > We will add a more realistic example to motivate the definition of ANE and the
> > initial properties. As Figure 5 is used to illustrate the examples of message
> > formats, we will move it to the example section.
>
> TC> that will also be very useful, thank you.

KAI: Please see Section 4.2 for the examples.

>
> > > Ultimately, if ALTO
> > > clients have the full network topology even then they may not know about the
> > > routing that occurs by default, so implicitly there's an assumption of a
> > > capability to steer traffic to meet a request.
> >
> > KAI:
> > This is not entirely true. With path vector, the routing is already specified
> > for a given source and destination pair. Thus, the client must not assume that
> > the ALTO server has the capability to modify the routing. In fact, for most
> > cases, the network only exposes information about the path and does not provide
> > any control capability inside the network. For certain use cases the network may
> > provide certain levels of control capability, for example, if a network allows
> > clients to reserve bandwidth for end-to-end communication, it may configure an
> > ALTO server to provide the `max-reservable-bandwidth` property. Note this is not
> > an issue specific to the path vector document but to the ALTO framework: ALTO
> > carries the information but how to use the information depends on a higher-layer
> > protocol. We will make this clear in the next revision.
>
> TC> That’s a useful clarification, again thanks.
>

KAI: Clarification text is added in Section 6.4.1. We emphasize that ALTO is only
used for information exposure.

> > > What is the “request” referred to in 5.1.2, for example?
> >
> > KAI:
> > The requests in 5.1.2 are referring to HTTP requests to ALTO services, mostly
> > requests to unified property services or requests to the same path vector resource.
>
> TC> OK.
>

KAI: We have changed "requests" to "requests to other ALTO resources" in Sec 5.1.2.

> > >
> > > It seems that the document argues that ‘bottlenecks’ are typically capacity
> > > based; do ANEs include specific links, rather than routers, firewalls, etc?   A
> > > stateful firewall can be a significant bottleneck on throughput, for example.
> >
> > KAI:
> > ANE can include routers, firewalls and other middleboxes. However, an ALTO
> > server may not want and may not need to distinguish what the bottleneck really
> > is -- it is actually one reason why we use the term "abstract network
> > element". For example, the maximum throughput of a firewall can be considered
> > as the capacity of the ANE exposed to the ALTO clients. We will add the
> > firewall example to illustrate the use of ANE in the next revision.
>
> TC> I think the ‘problem’ is that by keeping the reference/naming “Abstract” it is
> harder to ground the text in a real use case, so examples would help.

KAI: Examples of ANEs are presented both in Sec 4.2 (as part of specific use cases) and
in Sec 5.1 (as a standalone example).
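
To make the firewall case concrete, here is a minimal Python sketch of how a client might locate the bottleneck ANE on a path. It is not taken from the draft: the ANE names, the capacity values, and the helper names `bottleneck` and `shared_anes` are all hypothetical, and the dictionaries only loosely mirror a Path Vector response and its ANE property map.

```python
# Hypothetical data, loosely shaped like a Path Vector response:
# each (source, destination) pair maps to the list of ANEs it traverses.
path_vectors = {
    ("site-a", "site-b"): ["ane-link1", "ane-firewall"],
    ("site-a", "site-c"): ["ane-link1", "ane-link2"],
}

# Hypothetical per-ANE capacity in Gbit/s. The client does not need to know
# whether an ANE is a link, a router, or a firewall; it only sees a capacity.
ane_capacity_gbps = {"ane-link1": 20, "ane-firewall": 8, "ane-link2": 10}


def bottleneck(path):
    """Return (ane_name, capacity) for the lowest-capacity ANE on the path."""
    name = min(path, key=lambda ane: ane_capacity_gbps[ane])
    return name, ane_capacity_gbps[name]


def shared_anes(path_a, path_b):
    """Return the ANEs that two paths have in common, sorted by name."""
    return sorted(set(path_a) & set(path_b))


if __name__ == "__main__":
    for pair, path in path_vectors.items():
        print(pair, bottleneck(path))
    print(shared_anes(*path_vectors.values()))
```

Under these assumptions the client learns that the two transfers share one ANE and that one path is limited by a different ANE; what it does with that knowledge is left to the application, since ALTO only exposes the information.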

>
> In the Science DMZ case, campus firewalls (full stateful devices, with IDS) are often
> a significant bottleneck (for example I saw a case recently where a 20G path only
> achieved 8G for a science flow due to the IDS, even with it configured not to scan
> that traffic).
>
> > >
> > > In 4.2.1 it talks of ALTO client identifying bottlenecks; a little more
> > > discussion and examples of that would be useful, for practical use cases such
> > > as an international R&E data transfer.
> >
> > KAI:
> > We will add more discussions on identifying bottlenecks with path vector. Some
> > pointers are attached below.
>
> TC> OK.

KAI: We have added the pointers in Sec 4.2.

>
> > > The discussion on p.9 about multiple flows is a little odd; in practice in R&E
> > > networks large transfers use tools like GridFTP which uses multiple parallel
> > > TCP flows, such that loss on individual flows does not severely impact
> > > throughput.  Of course, BBR also reduces this concern.
> >
> > KAI:
> > For GridFTP and BBR, the multiple flows are established between the same
> > source and destination. The "multiple flows" in the example, however, represent
> > data transfers between different source and destination pairs that belong to
> > the same task (as in the coflow setting [1]).
> >
> > Handling multiple flows between the same source and destination pair is
> > certainly an important use case. However, it cannot be solved completely by
> > the path vector draft alone. There is an individual draft called "flow cost
> > service" [2] which can potentially provide information for this use case,
> > together with the path vector extension.
>
> TC> OK, thanks.  In the LHC type of use case there are often flows between for
> example worker CPU nodes and remote data transfer nodes, so your example
> would fit that.  But sometimes there are flows between logical DTNs at each site.
>

> > >
> > > Is the use of ALTO designed for single domain, or can it span multiple domains?
> > > It seems the latter, given the definition of ANE domains, but for the latter
> > > there is no specific model for the common definition of ANEs.
> > >
> >
> > KAI:
> > The extension specified in this document is designed for a single administrative
> > domain. The term "ANE domain" might be misinterpreted: the domain here does not
> > refer to a network domain. Rather, it is inherited from the "entity domain"
> > defined in Sec 3.2 in I-D.ietf-alto-unified-prop-new document [3], which is used more
> > in the mathematical sense of "domain": the set of valid objects of a specific type.
> > In the unified property extension, an entity domain is defined by a specific ALTO
> > resource (called defining information resource).
>
> TC> OK, so that would be something that would be very useful to clarify, and probably mention
> early in the document.
>

KAI: Clarification text is added in Sec 6.2.

> > > Given the definition of ANEs and PVs, how is traffic then orchestrated or
> > > optimised?  Some pointers here would be useful.  SENSE may be one example.
> > > From my own discussion with people involved with SENSE (and AutoGOLE which uses
> > > it) there is as yet no use of ALTO (rather SENSE uses its own methods to
> > > orchestrate based on intent-based descriptors), but it is something that may be
> > > considered in the future.
> >
> > KAI:
> > There are different ways to realize the traffic reservation: MPLS tunnels,
> > OpenFlow rules, or end-based traffic control (e.g., Linux tc command). For
> > specific orchestration mechanisms, please see below ([4]-[6]) for some pointers. We
> > will add these pointers to the use cases section.
>
> TC> Thanks.
> >

KAI: We have added the pointers in Sec 4.2 and in Sec 6.4.1.

> > >
> > > What of non-ALTO traffic on the same links; is the approach to reserve x%
> > > capacity of a link for ALTO orchestrated traffic (the SENSE approach, I
> > > believe)?
> >
> > KAI:
> > ALTO is mainly used to expose the capacity information to the client and how the
> > resource reservation is actually achieved is not in the scope of the document.
>
> TC> OK, so again clarifying that is useful (to someone like me not following the
> work in great detail).
>

KAI: Clarification text is added in Sec 4.2.1 and in Sec 6.4.1.

> Overall it’s a good draft, but I think the above extra examples and clarifications would
> be very welcome.
>
> Best wishes,
> Tim
>
> >
> > >
> > > Tim
> > >
> >
> >
> > [1] Chowdhury, M. and Stoica, I. 2012. Coflow: A Networking Abstraction for Cluster
> > Applications. Proceedings of the 11th ACM Workshop on Hot Topics in Networks
> > (New York, NY, USA, 2012), 31–36.
> >
> > [2] https://tools.ietf.org/search/draft-gao-alto-fcs-06
> >
> > [3] https://datatracker.ietf.org/doc/html/draft-ietf-alto-unified-props-new-18
> >
> > [4] Viswanathan, R., Ananthanarayanan, G. and Akella, A. 2016. CLARINET:
> > WAN-Aware Optimization for Analytics Queries. 12th USENIX Symposium on Operating
> > Systems Design and Implementation (OSDI 16) (Savannah, GA, 2016), 435–450.
> >
> > [5] Xiang, Q., Chen, S., Gao, K., Newman, H., Taylor, I., Zhang, J. and Yang,
> > Y.R. 2017. Unicorn: Unified resource orchestration for multi-domain,
> > geo-distributed data analytics. 2017 IEEE SmartWorld, Ubiquitous Intelligence
> > Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud
> > Big Data Computing, Internet of People and Smart City Innovation
> > (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) (Aug. 2017), 1–6.
> >
> > [6] Xiang, Q., Zhang, J.J., Wang, X.T., Liu, Y.J., Guok, C., Le, F., MacAuley, J.,
> > Newman, H. and Yang, Y.R. 2018. Fine-grained, Multi-domain Network Resource
> > Abstraction As a Fundamental Primitive to Enable High-performance, Collaborative
> > Data Sciences. Proceedings of the International Conference for High Performance
> > Computing, Networking, Storage, and Analysis (Piscataway, NJ, USA, 2018),
> > 5:1-5:13.
> >
>
_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto
