Lisa, Thanks for the very detailed review of this draft. More comments in-line.
On 10/17/06, Lisa Dusseault <[EMAIL PROTECTED]> wrote:
It would probably not be useful at this point for me to suggest a resolution to absolutely everything, especially if that involves specific wording. It's much more likely the WG/editors would choose different wording and organization anyway. But when I get back from vacation (1st day of November) and catch up on the mailing list traffic, I will see if there's any place where I can suggest wording to capture what I meant in a way that the WG can agree. I understand there's often a balance between leaving options open for different implementations and extensions, and closing options down so that specific behavior can be depended on, and sometimes there are ways you can have a little of both. High-level comments, summarizing comments - The mechanism for creating a media resource and a media link entry in response to a single POST conflicts with at least one statement elsewhere in the draft, and has no example. This is one of those cases where I personally had some assumptions (that not every media resource had its own media link entry if the media resource had been created manually)
There may be 'other' media resources, but if they don't have an associated Media Link Entry then they are not 'in' the collection.
that weren't ever quite cleared up by the spec. If the client CAN create a media resource without also creating a media link entry, that should be a separate example.
- Overall, the responsibility model needs to be slightly better defined. E.g. we know the server is responsible for choosing a URL for new entries; it's not clear who's responsible for cleaning up linked entries if a user ever needs to clean up historical entries. Atom sometimes seems to split the responsibility, and those are the most complicated cases. More examples below as it's probably more useful to discuss specifics.
That used to be in the spec: http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-06.html#entry-constraints It should go back in.
- To an outsider or newcomer -- including me even though I've been following discussions closely for a while -- there's a part of the Atom model that's subtle but important to understand. Consumers of Atom feeds are supposed to look at the regular feed document, whereas publishers of Atom feeds are supposed to look at other, different resources to see how to edit or create posts. Publishers effectively look at a different feed than users do, one with extra metadata (the rel="edit" links). It's a different model than that of WebDAV or IMAP, because rather than have the client specify which metadata it's interested in, the server offers two choices with different addresses. I believe it would be useful to cover that part of the model upfront in addition to the other useful stuff already there.
Agreed.
Creating resources

Explicit result of POST, section 4. Are there zero, one or more resources created with a POST? There's a line at the top of section 4 which says that "POST is used to create a new, dynamically-named, resource". However, that implies ONE, whereas with media entries, a POST could create TWO resources. I believe a successful POST request as described here MUST either result in one or two resources, never zero, and never 3 or more (in the absence of extensions).
A POST can create any number of resources. In the case of an entry collection it will be at least one. In the case of media entries it will be at least two. Many other resources could be created, but this spec should only concern itself with the ones of interest for the operation of the protocol, otherwise the protocol isn't of much use. For example, if we say that POSTing an entry MUST only create one resource, then how does the associated weblog HTML page get created?
What is the expected behavior of seeing a POST to an entry URL (rather than a collection URL)? I see that this is currently undefined;
Yes, it is undefined. So is the effect of sending it a PROPPATCH, COPY, LOCK, PATCH or MEGAFOO. Agreed, a note to the effect that anything not defined in the spec is, well, un-defined would help. I.e. we aren't holding any other methods on the resources 'in reserve'.
it may be worth stating that to warn clients. (I'm pretty indifferent on this one, as in this case I can't see any obvious harm in different server behaviors existing, if un-warned clients try it intentionally without knowing the results. The only possible harm is if clients got confused, did a POST to an entry URL when a collection URL was intended, and the server does a success response which creates new resources or modifies existing resources in a way the client did not expect. An error response would certainly be harmless for this undefined case but a success response could be real interesting.) Creating entries with multiple media resources It's never explained how a client would go about creating a feed entry with a number of media resources. I imagine that it could be iterative; a client could create any of the resources at any time, and at any time after creating the feed entry, use PUT to update the feed entry to link to new media resources. I assume -- though I didn't see it stated in the document -- that it's the client's responsibility in almost all cases to put links in the feed entry to point to the media resources, otherwise the media resources are unlinked (effectively hidden to readers).
Agreed, that's how it should work and the text and examples should be bolstered to make that clear.
The exception to this general process is if the client first uses POST to create both the media resource and the "Media Link Entry" in one go. In this case, can the "Media Link Entry" (MLE) be transformed to a regular "Member Entry Resource" (MER)? I thought it would be possible, but discussed just a bit with Tim today and he says no, so there you have two different readings of the spec.
I'll agree with Tim on that.
I guess a related question is what would happen if a client does a PUT of media content to an entry resource, or entry content to a media resource.
I think that falls in the "un-defined" territory.
It's not clear to me whether a linked media entry is always listed in the metadata or not. - When one or more "edit-media" link relations appear, who has been responsible for putting them there?
The server.
- When a media resource is deleted, who is responsible for removing the media resource link from the MLE?
The MLE itself should be removed, and this is done by the server.
- Section 4 says that the MLE contains the metadata for a Media Resource, but that seems to only assume a single Media Resource. In the case of multiple Media Resources which the user intends to link into a single post, it's unclear to me whether there's one MLE for every Media Resource, or one MER for all the Media Resources created, or some other situation. Again, in quick discussion with Tim, he says there is one Media Link Entry per Media Resource. I can see how that would work but that was not at all my understanding before the discussion!
Tim is right, it's a 1-1 relationship. Obviously more text and examples are needed.
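To make the 1-1 reading concrete, here is a minimal sketch of a toy in-memory media collection. It assumes one Media Link Entry per media resource, created together by a single POST and removed together by the server. The class name, method names and URL shapes are all hypothetical; nothing here is taken from the draft.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class MediaCollection:
    """Toy in-memory collection; names and URL shapes are hypothetical."""
    media: dict = field(default_factory=dict)    # media URL -> bytes
    entries: dict = field(default_factory=dict)  # MLE URL -> metadata dict
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def post_media(self, body: bytes, content_type: str) -> str:
        """One POST creates two resources: the media resource and its MLE."""
        n = next(self._ids)
        media_url, entry_url = f"/media/{n}", f"/entries/{n}"
        self.media[media_url] = body
        # The server, not the client, fills in the edit and edit-media links.
        self.entries[entry_url] = {
            "edit": entry_url,
            "edit-media": media_url,
            "type": content_type,
        }
        return entry_url  # stands in for the Location header of the response

    def delete(self, url: str) -> None:
        """Deleting either half removes the other half too (1-1 reading)."""
        if url in self.entries:
            media_url = self.entries.pop(url)["edit-media"]
            del self.media[media_url]
        elif url in self.media:
            del self.media[url]
            for entry_url, meta in list(self.entries.items()):
                if meta["edit-media"] == url:
                    del self.entries[entry_url]
        else:
            raise KeyError(url)
```

Under this sketch a post with three images would involve three media resources and three MLEs, which is the situation the extra text and examples would need to spell out.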
This document would benefit greatly from further examples: 1. An example of creating a MLE and MR in a single POST; the request, response and the result (resource URLs) described. 2. An example of modifying a MER to contain a new image or other media resource link: the request(s), possibly the responses (if it's interesting, it may not be), and definitely the result. 3. An example of modifying a MER to change metadata (e.g. category or adding a new link relation element or both); possibly a failed request example would be even more interesting than a successful one.
+1
Can a client modify an entry to contain a link relation element in the following cases: - To result in an "edit" or "edit-media" link relation, where the resource represented does not meet the requirements in section 11.1 or 11.2?
No.
- To result in an "edit" link relation that actually points to a media resource, or a "edit-media" link relation that actually points to a MER? - To point to a resource on a different server entirely?
There is no reason to believe that any of these resources are on the same machine to begin with. I could POST media to machine A and have the MLE created on machine B and the editable media resource itself created on machine C.
- To point to a valid media resource or MER that happen to be in a different collection than the one normally used for this feed? - Will some servers forbid adding a link relation element entirely? Is it important for the client to know that that will always be forbidden for that server -- can it detect the "always forbidden" case separately from the "this particular edit is forbidden" case? Which of these are errors, and if so how is the error handled? Which of these MUST the server allow and handle? I understand there may be some need for flexibility here. Perhaps it's just standardized error messages required here. For example, if there are some servers which allow a given link relation to point to another server, and some servers which do not allow, how would the servers which do not allow respond with a sufficiently specific error, so that the client can avoid trying the same thing again?
I think some text explaining up front that for all intents and purposes the server is in charge and could reject/modify entries as it sees fit is in order.
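As one illustration of "the server is in charge", the sketch below shows a server discarding any client-supplied "edit" or "edit-media" links and writing its own "edit" link before storing an entry. The function name and the policy itself are assumptions for illustration only; a real server could just as well reject the request outright.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SERVER_OWNED_RELS = {"edit", "edit-media"}  # assumed policy, not from the draft


def apply_server_links(entry_xml: str, edit_url: str) -> str:
    """Drop client-supplied edit/edit-media links, then add the server's own."""
    entry = ET.fromstring(entry_xml)
    # findall() returns a list, so removing children while looping is safe.
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") in SERVER_OWNED_RELS:
            entry.remove(link)
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"rel": "edit", "href": edit_url})
    return ET.tostring(entry, encoding="unicode")
```

Other link relations pass through untouched here; whether a server keeps, rewrites or drops those is exactly the flexibility being discussed.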
Multiple formats/langs for media resources

Multiple formats are not sufficiently defined -- e.g. JPG and PNG versions of an image resource. Format negotiation is hard.
I will agree with that last statement and would like to punt on stating anything concrete about creating variants beyond saying it's "un-defined".
I found guidance for how to select among different "edit-media" link relations depending on format and language, but I found no guidance on how to create multiple versions. If there's no guidance to clients or servers how to do it (would the client create multiple resources in different formats? could the server automatically do it as variants? could the server automatically do it as multiple resources, and would all formats be therefore listed?), it's probably worth considering whether there's possible interoperability harm here. I can imagine clients creating alternate-format versions quite successfully because the operations would be explicit, but when I imagine how servers would go about it, I can easily see ways it could go wrong (e.g. creating new URLs for resources that are invisible to the media collection, having multiple URLs in locations where clients expect only one). I think there may be a very basic confusion here -- in my head or in the document or both -- about what the "edit-media" link relation does and is for. When I read the text it seems to offer the possibility for multiple formats for a single media resource, as suggested by the text: "If a client encounters multiple "edit-media" link relations in an entry then it SHOULD choose a link based on the client preferences for type and hreflang". However, when I try to think about how a client would create a post with a totally independent set of JPG images (e.g. one of the Eiffel Tower, one of the Louvre and one of the Arc de Triomphe), the "edit-media" link relation also seems to have relevance. Which is it, or is it both? (and as always, who is responsible for filling it in or removing it when new media resources are created or destroyed?)
Media to MLE is a 1-1 relationship. Other variants, such as PNG, JPEG and GIF of an image might be created for an image at the whim of a server.
Thomas Broyer said in email July 24 that "Having the Content-Location value equal to the Location one tells the client that the response body is a representation of the newly created resource". This is a subtle reading of HTTP and, if it's true, I want to make sure that implementors understand this without having to read the mailing list. The spec reads " the response from the server SHOULD contain a Content-Location header that contains the same character-by-character value as the Location header." If the response from the server does not contain both headers identical, what should the client conclude? I think this is one of those SHOULD recommendations where the consequence of it not being a MUST need to be considered. Possibly the spec needs to say under what conditions the server would do otherwise; possibly the spec should say what the client knows, or does not know, or must do, if the server does otherwise.
Personally I'd rather drop all that verbiage about Content-Location and just point to RFC 2616 and the definition of Location and Content-Location.
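For what it's worth, the client-side check implied by Thomas Broyer's reading is small. The sketch below assumes the response headers are available as a plain dict; the function name is hypothetical, and the fallback suggested afterwards is one possible client policy rather than anything the draft states.

```python
def body_is_new_resource(headers: dict) -> bool:
    """True when Content-Location matches Location character by character."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    return location is not None and lowered.get("content-location") == location
```

When this returns False, a cautious client could simply GET the Location URL rather than trust the response body.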
Deleting Resources

In the case of an entry that points to multiple media resources, can the server delete all those media resources and their MLEs? (I think not). If a client issues a DELETE to a media resource, is its MLE deleted? (The spec covers the opposite case already when a client issues a DELETE to an MLE.)
This is covered by M<->MLE 1-1.
Can collections be DELETEd? It's fine for servers to allow this or not, but if servers don't support it, which error should they use?
"un-defined"
Editing resources Overall, the process for editing a resource is not entirely clear. I find the description of creating a resource (POST), and what the server can accept, ignore or reject, more clear than the description of editing a resource (PUT) . For example, there's normative text in section 9.2.1 (an example) relative to creating resources and handling metadata, but that text isn't duplicated for editing resources or obviously apply to editing resources. Thus: - Can the client change the category? (probably yes; MUST the server allow?) - Can the client change the atom:id? (probably never) - Can the client change the "updated" value to be some time in the future? Some time long ago? Or are there only two non-error changes -- "now" or "the previous value"? MUST the server accept the value if it's the same as the previous value? Or can there be servers that always ignore "updated" values from clients? (and if so, is it important for the client to know that the server does this) - Can the client change the set of link relations? (probably yes; but does that include "edit" and "edit-media" link relations only or also first/previous/next/last link relations?) In general the possible edits need to be covered to consider whether the server MUST allow these kinds of edits, or MAY, and if refused, what error for what reason.
Agreed, like I said, this was in previous versions and should be updated and restored: http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-06.html#entry-constraints
I already threw in a request for an example of modifying a resource above, because one of the cases for editing a resource is to add a media resource to it. The spec says "The value of atom:updated is only changed when the change to a member resource is considered significant. " The use of passive voice obscures who does what here. When the client doesn't suggest a value for "atom:updated", does the server provide one, and if so, how does the server know what is "significant"? I thought it would always be the client suggesting values, but Tim says that the server controls atom:updated which could imply that the client doesn't even need to suggest values. See above about whether the server MUST accept certain values for "updated", or more likely, MUST NOT accept suggested values for "updated" when they're clearly wrong (e.g. this entry was last updated on October 16, 1906). Can a server ever ignore part of an edit and successfully process the rest?
Yes. Think of a client that throws in a slew of random link elements with relations my server implementation doesn't understand. Same with foreign markup. The server is in charge. Here is another example, a server could take in an entry with content that was HTML and clean it up and the next time the entry is accessed the content could be XHTML.
For example, the server receives a PUT request that tries to edit the text of a MER and includes a new category value, the server accepts the new text but silently ignores the category value. I suggest the answer would be MUST NOT silently ignore suggested changes, particularly since there's no way in a PUT response to say "here's what the server actually stored". It may be my opinion differs here from that of the WG. I find silently ignoring input to be scary.
The client can always do a follow-up GET. I believe it's unavoidable and trying to specify it would either become a rat-hole or would end up making the spec impossible to implement.
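A rough sketch of that follow-up check, assuming the client keeps the entry it sent and compares it with what the GET returns. Only category terms are compared here, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def category_terms(entry_xml: str) -> set:
    """Collect the term attribute of every atom:category in an entry."""
    root = ET.fromstring(entry_xml)
    return {cat.get("term") for cat in root.iter(ATOM + "category")}


def dropped_categories(sent_xml: str, stored_xml: str) -> set:
    """Category terms the client sent that the server did not keep."""
    return category_terms(sent_xml) - category_terms(stored_xml)
```

A non-empty result tells the client that the server accepted the PUT but ignored part of it, which is the silent case Lisa is worried about.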
Synchronization I predict that some AtomPub authoring clients will attempt to synchronize: to maintain an offline copy of the feed including all its MERs and media resources, and to keep that offline copy up-to-date. Some will probably even allow offline authoring of new posts, and offer to synchronize when the client next goes online -- because of the possibility of multiple authors, this may mean at times that the client would download new entries created by other authors, upload new entries created offline, and reconcile its offline copy of feed documents. Because authoring clients will attempt to do this based on Last-Modified and ETag -- after all, the functionality is all there in some form or another -- the spec needs a little more clarity on how the client can rely on this working. Otherwise, some servers may omit features that these authoring clients require, or implement them oddly. While I would never suggest repeating all the requirements from other specs (in this case HTTP), there are cases where clarity and interoperability are greatly improved by at least referencing explicitly requirements from HTTP. It's also possible to add new requirements based on features in HTTP, that apply to Atom servers alone.
Agreed, a little verbiage and a link to http://www.w3.org/1999/04/Editing/ would be good.
You can see that I lean more towards the "explain confusing things" and "add more stringent requirements" side than the "it's already written elsewhere" side by peeking at section 8.2 of CalDAV <http://www.ietf.org/internet-drafts/draft-dusseault-caldav-15.txt>. A mostly-explanatory guidelines section helps clients quickly understand a clear path towards synchronization, and new requirements for supporting ETags make things easier for what is, after all, a more limited use case of HTTP (calendaring) than the general case. If HTTP synchronization in authoring cases were clearly defined and had not led to years of arguments since the last HTTP update, I would probably feel differently about just silently relying on the mechanisms in HTTP. In any case, I have very specific brief suggestions to cover synchronization so that it's implemented more successfully than not.
Hey, we used to have that in there too: http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-04.html#collections_model_usage
- Consider adding a brief section offering clients non-normative guidelines on synchronization. It doesn't have to limit server behavior so much as point out with green and red lights where the fairway is (mixing transportation and golfing metaphors in my head) - Make a few requirements of servers to avoid some of those HTTP ambiguities. For example: "The ETag or Last-Modified values for a member resource MUST change when any metadata in the resource changes, as well as text/content, and this includes "next" and "last" link relation values. The ETag or Last-Modified values of a member resource MUST NOT change solely because an associated other resource (e.g. the media resource being an associated resource to the media link entry resource) changed. " More open questions that might be related to synch or might have relevance even for clients that don't do full synch: - What is the relationship, if any, between the "atom:updated" value and the HTTP "Last-Modified" value. Can the "atom:updated" value ever be later (greater) than the "Last-Modified" value? I believe it can be the same or earlier, but the spec doesn't disallow the broken case. - Is it clear whether the client MUST GET the entry after modifying it in order to have an accurate offline cache? (this was mentioned in a post by Broyer Jul 13, but not in the document). I believe this is made clear already for the cases of getting the feed and also for POST/create, but not for PUT/modify. - Am I correct that the general assumption is that id's are there to see what entries are new, and URLs are there to see where to get them? That may mean that URLs could change, for a given ID -- Perhaps a feature to change the slug name of a image after attaching it. Is that theoretically possible?
Yes.
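The ETag requirement Lisa proposes above could be met in many ways. One minimal sketch, assuming the server derives a strong ETag from the stored entry document alone, so it changes with any metadata or content edit and is unaffected by changes to the associated media resource. Hashing the serialization is an assumption for illustration; the If-Match check is simplified to a single tag or "*".

```python
import hashlib


def entry_etag(entry_xml: str) -> str:
    """ETag computed from the entry's own bytes only, never from its media."""
    digest = hashlib.sha256(entry_xml.encode("utf-8")).hexdigest()
    return f'"{digest[:16]}"'


def put_allowed(if_match: str, current_etag: str) -> bool:
    """Simplified If-Match check: one entity tag or the wildcard."""
    return if_match == "*" or if_match == current_etag
```

A synchronizing client would send the ETag it last saw in If-Match, and treat a refused PUT as a signal to re-fetch and reconcile.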
There are also efficiency considerations. - The spec could require that servers MUST return either the ETag, or the Last-Modified value, in any successful POST or PUT response. I personally favour this so that clients can rely on it, though obviously other opinions are valid here. - I really liked the idea of putting ETag in the author's feed, as discussed on the list but not appearing in the document, again for efficiency. Certainly, the spec could ignore these considerations for now. However, I have noted that many client implementations choose a single way to implement their logic that can be relied upon if a reliable approach is available, rather than respond differently to different implementations. Thus, I predict that if some servers implement more-efficient synchronization and others don't, clients will behave as if they're always talking to the less-efficient servers. The more-efficient servers will find it difficult to achieve better scalability through synch efficiency improvements because clients have already implemented reliable but inefficient synchronization and don't have reason to add a second logic path for the more-efficient servers. Internationalization How are categories compared? Case-sensitive, insensitive, according to which language? Would the categories "donné" and "donne" map to the same category as "Donne" and "DONNE"? I believe it's currently up to the server, which means unpredictable behavior from the point of view of clients. See http://www.ietf.org/internet-drafts/draft-newman-i18n-comparator-14.txt, which has passed IESG Evaluation except for some IANA actions. This is a danger area for any draft going before the IESG which looks carefully at i18n these days. How do lang tags inside the document relate to Content-Language information in headers? Does the most granular override the other possible values? What about when client provides to server? Does the server ignore or handle? 
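On the category comparison question, a small sketch shows how much the answer depends on the rule a server picks. The three comparison functions below are illustrative choices, not anything the draft or the comparator registry prescribes, and the function names are hypothetical.

```python
import unicodedata


def exact(a: str, b: str) -> bool:
    """Code point for code point comparison."""
    return a == b


def caseless(a: str, b: str) -> bool:
    """Case-insensitive comparison using Unicode case folding."""
    return a.casefold() == b.casefold()


def strip_accents(s: str) -> str:
    """Decompose, then drop combining marks such as the acute accent."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def caseless_unaccented(a: str, b: str) -> bool:
    """Ignore both case and accents."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()
```

With these rules, "Donne" and "DONNE" match under caseless but not exact, while "donné" only matches "donne" under caseless_unaccented, so two servers can legitimately disagree about how many categories a feed has.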
Other The requirements for the link relations "next", "previous", "first" and "last" aren't as rigorous as for "edit" and "edit-media" link relations. Also they're defined quite separately -- I kind of thought that all the link relation types could be usefully defined in one section but if the editors prefer a different organization that's fine. But; is it OK if the resource pointed to by one of these link relations is on another server, in another feed, is a different kind of resource than you might normally expect, etc... ? I think the normal cases for these link relations are well-understood but not necessarily what a client should do if it encounters abnormal cases. Discovering feed reading URL A very minor feature request for the introspection document: it SHOULD contain the public or published read-only feed URL of the blog (Tim suggests using link rel="alternate" type="application/atom+xml", although I'm not sure that makes it sufficiently clear what it's for). This so that my blog editing tool can show me not only all the entries and media resources (all discoverable from the introspection doc already) but also where the blog is published, so that I can copy that link to my friends when telling them about my blog. Extensions When the client puts extension elements in a MER, MUST the server store those unrecognized extension elements?
No.
I think the answer to this is actually that servers often do not and should not be required to do so. That makes it hard for clients to extend AtomPub's syntax in ways that other clients will understand but servers don't care about. Consider the consequences: when some enterprising client developer decides to do something cool and useful and encounters servers that don't store their metadata in the obvious place, the client developer is going to quickly work around that by storing in some unobvious place. For example in HTML comments in the atom entry content, or microformats, etc. Is that all cool? (Aside: an example of clients working around servers like this is that some WebDAV servers in the very early days didn't actually allow clients to PROPPATCH custom properties as the authors clearly intended. Some client wanted to put extra structured information on a resource when it was locked. Instead of putting it in properties, since that didn't work reliably, the client instead put it in the LOCK entry's "owner" element! Of course that didn't reliably interoperate either because some servers overwrote the "owner" element with authoritative information -- the lock's actual owner as known by the server. So the workaround solution was also harmful to interoperability, only it was discovered after the client had shipped.) Workspaces What are workspaces? I would like to see a definition. I believe I understand that basically, a workspace corresponds to a single published feed; that a workspace contains the collections with the content authored for that feed. I know the WG discussed this so maybe I can suggest wording at some point or simply register my vote for saying what it *is*.
I'll make you a deal, you define what a "web site" is and then I'll define a workspace :) I think this is murky semantic territory best handled by the W3C TAG.
Besides the definition, I also wonder about workspace titles. That seems redundant with the title of the entry collection and possibly also the title of the feed (inside the main feed document). Is there any understanding of some of these values being identical, or any understanding of what different purpose they serve if they're not identical? OPTIONS response HTTP is unclear about where PUT and POST show up in Allow headers. WebDAV ran into this as an interoperability problem -- some clients assumed that if they didn't see PUT in the Allow header for a collection, they couldn't write to that collection (the client might be checking for permissions or policy, having already established that the server was a WebDAV server but not certain if PUT would be allowed to this particular place). Some servers had PUT in the Allow header value for a collection, some servers didn't, based on the literal reading that you couldn't actually PUT straight to a collection URL. Clients had to end up with the OPTIONS Allow: header response being useless in this case. With somebody else's hindsight, Atom doesn't have to leave this ambiguous for the special kinds of resources it defines... Cookie support, sessions, authentication Is there an assumption that clients MUST support cookies? without such a requirement explicitly stated, some clients won't, for reasonable security concerns. Instead, is there an assumption that clients MUST repeat authentication headers with each request? Or will servers effectively end up constantly "reminding" clients (through 401 errors) to authenticate? This might seem obvious but it definitely differs from regular HTTP practice where clients authenticate once and then stop sending authentication information automatically and it just works because of cookies. 
Also we'd experienced this as an interoperability problem in WebDAV interoperability tests where some server implementors insisted that certain WebDAV clients were completely broken in not supporting cookies. Are there assumptions that sessions will be maintained through persistent connections? I believe there should be none. That is, if you're a client implementor thinking that the first request will contain authorization and subsequent requests on the same connection have no authorization, think again.
I've stated my piece on authentication and the IETF requirements. Just let me know the boilerplate that needs to be put in there and I'll do it. I have no more energy for the subject.
ANCHOR sections

It's not clear to me that the RFC Editor will know what to do with all the [[anchor... ]] sections. Most difficult of all, "anchor37" says "incomplete section". For the rest, sometimes the RFC Editor may need to know what to replace with what on publication. I'm sure the doc editors know what they meant but I personally was left guessing.
Agreed, will clean up.
Lisa
Thanks again for the close reading. -joe -- Joe Gregorio http://bitworking.org