Hi Martin and all,

Thanks a lot for these wonderful comments! We are sorry for the late response; it took us a while to digest the comments and discuss them internally. The responses will arrive in multiple emails, each focusing on a different aspect.

Before going into the technical and editorial updates, I would like to provide some context that will hopefully make the document easier to follow, because complexity is definitely not something we want in the IETF.

The ultimate goal is to enable more efficient delivery of ALTO information to clients. Besides generic mechanisms (e.g., upgrading the HTTP version), our design aims to leverage some characteristics of ALTO traffic, summarized below (C1-C4):

C1: ALTO resources are more likely to evolve incrementally (e.g., network map changes triggered by network maintenance, reconfiguration, or failure).
C2: Clients continuously fetch or monitor ALTO resources.
C3: Some clients may make customized queries, but many clients request the same ALTO resources.
C4: Clients are heterogeneous, i.e., they request ALTO information with different timing, frequency, transport objectives, etc.

Given these characteristics, the objectives/requirements of the new transport document are:

1. to support incremental updates mixed with “snapshots” (based on C1 & C2),
2. to enable cacheable updates to leverage web caches for (potentially) faster and more efficient distribution (based on C3),
3. to enable customized update selection and scheduling for more flexible and efficient state synchronization (based on C3 & C4, as different clients may have different local state),
4. to allow more resource-efficient server implementations while leaving room to enhance performance: a server need not store multiple copies of the same data or the complete update history, but it must not violate reliability, i.e., any client joining at any time with any local state should be able to compute the latest available state (based on C3 & C4, as well as server heterogeneity),
5. to reduce latency for applications that demand higher reactivity, e.g., an early I-D by Tencent introduces the case of distributing real-time base station results through ALTO (based on C4),
6.
to maintain backward compatibility (not based on any particular characteristic).

Make sense so far? Then let’s see why RFC 8895 does not work:

1. RFC 8895 already supports a mixed transport of incremental updates and snapshots.
2. In RFC 8895, however, the updates are “unnamed”, so they cannot be shared across different clients using only the ALTO protocol and its extensions.
3. In RFC 8895, updates are scheduled purely by the server, and clients have no control or flexibility.
4. In RFC 8895, a server may share the internal storage for queries to the same ALTO resource(s). However, to guarantee correctness, a server must maintain the state of each client and also store the history starting from the version held by the oldest client. If a server is short on resources, it may need to disrupt the service for clients with older versions to reduce storage overhead.
5. RFC 8895 offers server push capability using Server-Sent Events (SSE). However, this push model is the very reason RFC 8895 fails to address the previous requirements.

Then what about the new transport mechanism specified in this document?

1. This document introduces a graph structure to describe the evolution of an ALTO resource. One can think of the graph as a Git tree with only one branch: each node is a version and each edge is a patch. Note that a snapshot is a patch to the initial (empty) state.
2. With each state versioned, the snapshots and incremental updates now have names and are therefore cacheable. ALTO servers and clients can then benefit from the common HTTP web cache infrastructure that is widely deployed in the Internet today.
3. The graph structure only gives the “metadata” describing which patches are available to transition from one version to another. Each client can then determine whether to synchronize, which state to synchronize to, what the best way to synchronize is, etc., based on its own local state and configuration.
4.
As the scheduling of updates is now determined by clients, an ALTO server has full control over its storage and the updates it makes available, as long as it satisfies the condition that there exists at least one path from the initial state (i.e., a new client) to the most recent version. Note that an existing client can always fall back to being a new client by discarding its own local state.
5. This is probably the only piece that needs to be HTTP-version-specific. The transport mechanism is mainly designed for scalability and flexibility, and it introduces one round of HTTP “RTT” to fetch the “metadata” -- not a big problem for applications/networks that are less sensitive to changes, but not good for applications demanding fast reactions to network changes. Pushing is preferred in the latter scenario. HTTP/1.x does not support native server push functionality. Thus, we use long polling: a client sends a request for the next update, based on the naming convention; once that update exists, the server sends the result. This is not the best option, as the client needs to make another request to receive the next update, which may arrive very soon after the first one, but as far as we know it is probably the best we can do for HTTP/1.x now. For HTTP/2 and /3, we do expect to leverage server push and would like to hear opinions and have further discussions with HTTP experts like you. Note that this is not expected to be a mandatory functionality for any ALTO server/resource, as it requires keeping track of client states and may not scale well on the Internet. However, we do see use cases, such as the 5G edge network, where this functionality can be highly useful.

Based on the design, the document then includes specifications for:

- the ALTO service that provides the new functionality (TIPS service or simply TIPS), including the creation/deletion/...
(Sections 3 & 4)
- the TIPS view (as the root of the update information related to the requested resource) (most of Section 5)
- fetching updates with the update graph, including the URL pattern, data format, URL pattern for the updates, and reliability requirements/invariants (Sections 5.5 & 6)
- receiving updates pushed by servers, including the URL pattern for the receiver set and how to subscribe (Section 7)

We hope that this email conveys the high-level ideas of the new transport mechanism and makes the document easier to follow. While we are having this conversation to reach consensus on the design, we, the authors, are working on the wording changes for the detailed comments and will follow up on the mailing list soon.

Looking forward to your feedback!

Best,
Kai

> -----Original Messages-----
> From: "Martin Thomson via Datatracker" <[email protected]>
> Sent Time: 2023-03-23 10:24:10 (Thursday)
> To: [email protected]
> Cc: [email protected], [email protected]
> Subject: [alto] Httpdir early review of draft-ietf-alto-new-transport-07
>
> Reviewer: Martin Thomson
> Review result: Not Ready
>
> I'm going to level-set here from the outset. I have not given this document as
> thorough a review as it might need to be sure, but only because I was unable to
> understand it in the hour or so that I spent. That's clearly not enough time
> for something this complex, so adjust the finding of "Not Ready" accordingly.
>
> # Introduction
>
> This document aims to describe an ALTO-specific means of providing clients with
> updates to the transport and network information. It expands on previous
> efforts that use SSE, which are generic, but maybe don't take advantage of the
> newer HTTP server push feature.
>
> ## This is ALTO
>
> I like that this doesn't shy away from making its design very specific to the
> application.
> Lots of people get grandiose ideas that their design is wonderful
> and generally applicable and try to build something very generic, in the process
> losing touch with both the needs of their application. They might convince
> (maybe deluding) themselves that others will take on their wonderful ideas and
> apply them to completely different applications.
>
> This document suffers from no such delusions, which is great because this work
> is really very difficult to understand for an outsider. I was involved
> peripherally with the initial ALTO HTTP design, so have some familiarity with
> its goals and structure, but I found this document very difficult to process.
> Maybe more time would help, but I really can't justify spending that time.
>
> So I want to focus on the really high-level stuff.
>
> ## High Level
>
> The first of my issues makes me wonder if this has been implemented at all. And
> as I went through this, I found myself asking that same question again multiple
> times. Has it?
>
> Finally, I feel obligated to point out that expending effort on HTTP server push
> is perhaps unwise given its relative adoption and success. That is, it has not
> really succeeded in browsers and is being removed. Much of the benefit that a
> protocol like ALTO gains is from using common infrastructure, design paradigms,
> tooling, and so forth. ALTO is a small user of HTTP and so benefits from the
> millenia of effort put in to improve HTTP by larger users. Those benefits don't
> extend to server push, so there is a real risk of this work becoming hard or
> impossible to deploy.
>
> ## Caveat, Reprise
>
> To soften this a little, it is entirely possible that some of my criticism is
> rooted in not understanding the details well enough. This is a tower built on
> top of a tower build on top of a tower build on top of a protocol that I know
> reasonably well, but it is a long way from where I stand to the top of that
> topmost tower.
> And it is fair to say that a good review of this document (what
> I an not claiming to provide) would demand that I gain familiarity with the
> entire stack. However, I think that there are several aspects of this document
> that could do with some dedicated editorial effort in order to improve this sort
> of accessibility. I've highlighted a few, but I almost certainly missed others
> because my focus was on HTTP usage primarily (for instance, I did not consider
> whether the security considerations were reasonable or even approximately
> comprehensive).
>
>
> # Issues
>
> ## Server Push Usage
>
> Section 7.1 says "A client can add itself explicitly to the receiver set or add
> itself to the receiver set when requesting the TIPS view." It describes two
> methods for doing this, but neither indicates which request will remain open so
> that the client can receive push promises.
>
> HTTP server push requires that the server send pushes alongside an outstanding
> request, but aside from discussion of streams in Section 7.3.1, I can't work out
> how the client would do that. Section 2.4 also fails to make this clear.
>
> Consequently, I cannot convince myself that the primary feature of this document
> will work.
>
> ## Use of Undefined and Poorly Defined Terms
>
> I'm raising this to the level of a serious issue because this draft is made
> extraordinarily difficult to understand as a result of this. Take Section 2.3,
> which introduces some terms. That same section then includes completely
> different terms in Figure 1; terms that turn out to be critical concepts.
>
> I'll also note, though you might treat this as a separate issue, that while the
> use of template-like URLs as a convention is a powerful explanatory tool, the
> draft doesn't make this clear enough. The use of i and j for instance, are
> introduced in an example, which is easy to skip, only to find that the rest of
> the document critically depends on understanding what those mean.
>
>
> ## DELETE, but not
>
> Section 4.4 describes a use for HTTP's DELETE verb that is novel to say the
> least. If the goal is to use DELETE to remove something and that something is a
> client's membership in a group (a receiver set here), then you should provide
> each client with a URL of their own to delete. Whether you provide a resource
> for the collection (which might be useful for adding clients) or not is up to
> you, but this approach is not consistent with how HTTP is expected to operate
> and will result in surprises.
>
> Of course, the use of one request (here, the DELETE) to stop server push that
> might be happening on another request, is also not how server push is expected
> to work.
>
>
> ## Connections and Clients
>
> I can't pin this one down, but there seems to be some sort of assumption that
> there is a 1:1 correspondence between connections and clients. That is not how
> HTTP works. In HTTP, every request stands on its own. Though there might be
> linkages between requests, those linkages should not affect how HTTP itself
> operates, including server push. (You might detect a common theme here.)
>
> As I noted, I'm not completely certain about raising this issue because of a
> lack of clarity about how the protocol is supposed to operate.
>
>
> ## Specification by Example
>
> I found that this document leaned a little too heavily on examples, to the point
> that it sometimes does not concretely specify expected behaviour at all. The
> content of examples were used to show the general shape of what is being
> considered. As I noted before, examples can be a powerful explanatory tool, but
> it means that the true interoperability requirements are not always directly
> written. Implementers need to infer normative requirements in some cases.
>
> See the use of /<tips-view-uri>/...
> everywhere it appears, Section 2.1.1
> (schema, where the figures are critical to understanding; I also found Figure 2
> very hard to understand, so I had to ignore it, even if it still seems crucial),
> Figure 4, the figures in Section 2.4, Section 8.3.
>
> Probably the biggest hole here is j+1 in examples. I couldn't find a statement
> anywhere that says that increments need to strictly increment by 1 each time
> (separately, why 101 and not just 1?). There seems to be an assumption about
> that, but the directory resources all seem to indicate that the server is
> responsible for numbering increments. Fixing that seems important.
>
> ## Complicated
>
> This design is very complex. Some of the details in the document probably do
> not need to be specified in the level of detail provided. Section 5 for
> instance describes resources that provide clients with data about other
> resources, which isn't really consistent with HTTP principles, but they probably
> aren't necessary either.
>
> As long as servers adhere to the invariants (S5.5), clients can ask for
> incremental resources based on what they know and either get a snapshot or
> increments based on what the server is willing to serve them and what they are
> willing to process. The design here requires additional round trips to gather
> information about the information the client really wants when it could probably
> ask for a resource, use etags to indicate what it has, use Accept to indicate
> what it can handle, and the server could then work out what best to serve up.
>
> A specific problem that this design creates is strong coupling. The client
> needs to know about URI structure in order to use the information in the TIPS
> view.
>
> Similar comments might apply to managing the set of clients that might want
> server push (though again, see the first issue).
>
> # Nits
>
> These are just the ones that really jumped out.
>
> I suggest checking for typos, I saw many.
>
> Please use proper section references when linking. Links in the form `<xref target="RFC1234" section="4.5"/>` will ensure that your HTML is properly
> generated.
>
> Please submit a bug report for whatever is going on with the caption on Figure
> 2. It looks like you have a cross reference in there that xml2rfc is mangling.
>
> "long pull" is not a thing (Section 2.1)
>
> "Connection: Closed" is not a thing (Section 4.2)
>
> Please use real lists (1) especially when the content of the item is long and
> (2) because it makes (a) reading and (b) citing the document easier.
>
> I can't work out why "next-edge" (Section 5.2) is null when server push is
> disabled. Why would a client need to know that only when push is enabled? If
> push is enabled, won't the client be pushed the next increment? On the other
> hand, if push is disabled, won't the client need to know what to request next?
>
>
> ## Section 5.5 is Complicated
>
> Section 5.5 says:
>
> > Continuity: ns -> ne, anything in between ns and ne also exists (implies ni ->
> ni + 1 patch exists), where ns is start-seq and ne is end-seq
>
> This section might be reduced to saying:
>
> > A server needs to ensure that any resource state that it makes available MUST
> be reachable by clients, either directly via a snapshot (that is, relative to 0)
> or indirectly by requesting an earlier snapshot and a contiguous set of
> incremental updates.
>
>
>
> _______________________________________________
> alto mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/alto
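
As an illustration of the update graph described in the message above (each node is a version, each edge is a patch, and a snapshot is a patch from the empty state 0), the following minimal Python sketch shows how a client might decide which patches to fetch. Everything in it is an assumption made for illustration and is not taken from the draft: the function name plan_updates, the representation of edges as (i, j) pairs of sequence numbers, the sample sequence numbers, and the choice of "fewest fetches" as the selection criterion.

```python
from collections import deque


def plan_updates(edges, local_seq, target_seq):
    """Return a shortest list of (i, j) patches from local_seq to target_seq.

    edges: iterable of (i, j) pairs the server currently offers (hypothetical
    representation). If no path starts at local_seq, fall back to the empty
    state 0, i.e. behave like a new client. Returns None if the target is
    unreachable even from 0.
    """
    adjacency = {}
    for i, j in edges:
        adjacency.setdefault(i, []).append(j)

    def bfs(start):
        # Breadth-first search yields a path with the fewest patches.
        if start == target_seq:
            return []
        parent = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, []):
                if nxt in parent:
                    continue
                parent[nxt] = node
                if nxt == target_seq:
                    # Walk back to the start to rebuild the patch sequence.
                    path = []
                    while parent[nxt] is not None:
                        path.append((parent[nxt], nxt))
                        nxt = parent[nxt]
                    return path[::-1]
                queue.append(nxt)
        return None

    path = bfs(local_seq)
    if path is None and local_seq != 0:
        path = bfs(0)  # discard local state and start from a snapshot
    return path


# Hypothetical server state: two snapshots plus a run of incremental patches.
EDGES = [(0, 103), (0, 105), (101, 102), (102, 103),
         (103, 104), (104, 105), (105, 106)]

print(plan_updates(EDGES, 104, 106))  # client slightly behind
print(plan_updates(EDGES, 100, 106))  # stale client, falls back to a snapshot
```

The fallback to version 0 mirrors the point made above that an existing client can always discard its local state and act as a new client. A real client could apply a different criterion, for example weighing patch sizes or the likelihood of a cache hit.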
