Re: AD Evaluation of draft-ietf-atompub-protocol-11

Joe Gregorio Thu, 07 Dec 2006 09:07:02 -0800


Lisa,
  Thanks for the very detailed review of this draft. More comments in-line.


On 10/17/06, Lisa Dusseault <[EMAIL PROTECTED]> wrote:



It would probably not be useful at this point for me to suggest a resolution
to absolutely everything, especially if that involves specific wording.
It's much more likely the WG/editors would choose different wording and
organization anyway.  But when I get back from vacation (1st day of
November) and catch up on the mailing list traffic, I will see if there's
any place where I can suggest wording to capture what I meant in a way that
the WG can agree.  I understand there's often a balance between leaving
options open for different implementations and extensions, and closing
options down so that specific behavior can be depended on, and sometimes
there are ways you can have a little of both.

High-level comments,  summarizing comments

 - The mechanism for creating a media resource and a media link entry in
response to a single POST conflicts with at least one statement elsewhere in
the draft, and has no example.  This is one of those cases where I
personally had some assumptions
(that not every media resource had its own
media link entry if the media resource had been created manually)


There may be 'other' media resources, but if they don't have an associated
Media Link Entry then they are not 'in' the collection.

that
weren't ever quite cleared up by the spec. If the client CAN create a media
resource without also creating a media link entry, that should be a separate
example.

 - Overall, the responsibility model needs to be slightly better defined.
E.g. we know the server is responsible for choosing a URL for new entries;
it's not clear who's responsible for cleaning up linked entries if a user
ever needs to clean up historical entries.  Atom sometimes seems to split
the responsibility, and those are the most complicated cases.  More examples
below as it's probably more useful to discuss specifics.



That used to be in the spec:

http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-06.html#entry-constraints

It should go back in.

 - To an outsider or newcomer -- including me even though I've been
following discussions closely for a while -- there's a part of the Atom
model that's subtle but important to understand.  Consumers of Atom feeds
are supposed to look at the regular feed document, whereas publishers of
Atom feeds are supposed to look at other, different resources to see how to
edit or create posts.  Publishers effectively look at a different feed than
users do, one with extra metadata (the rel="edit" links).  It's a different
model than that of WebDAV or IMAP, because rather than have the client
specify which metadata it's interested in, the server offers two choices
with different addresses.  I believe it would be useful to cover that part
of the model upfront in addition to the other useful stuff already there.


Agreed.


Creating resources

Explicit result of POST, section 4.

Are there zero, one or more resources created with a POST? There's a line at
the top of section 4 which says that "POST is used to create a new,
dynamically-named, resource".  However, that implies ONE, whereas with media
entries, a POST could create TWO resources.  I believe a successful POST
request as described here MUST either result in one or two resources, never
zero, and never 3 or more (in the absence of extensions).


A POST can create any number of resources. In the case of an
entry collection it will be at least one. In the case of media
entries it will be at least two. Many other resources could be created
but this spec should only concern itself with the ones of interest
for the operation of the protocol, otherwise the protocol
isn't of much use. For example, if we say that POSTing an entry
MUST only create one resource then how does the associated weblog
HTML page get created?

What is the expected behavior of seeing a POST to an entry URL (rather than
a collection URL)?  I see that this is currently undefined;


Yes, it is undefined. So is the effect of sending it a PROPPATCH, COPY, LOCK,
PATCH or MEGAFOO.

Agreed, a note is in order to the effect that anything not defined
in the spec is, well, un-defined. I.e. we aren't holding any other
methods on the resources 'in reserve'.

it may be worth
stating that to warn clients.  (I'm pretty indifferent on this one, as in
this case I can't see any obvious harm in different server behaviors
existing,  if un-warned clients try it intentionally without knowing the
results.  The only possible harm is if clients got confused, did a POST to
an entry URL when a collection URL was intended, and the server does a
success response which creates new resources or modifies existing resources
in a way the client did not expect.  An error response would certainly be
harmless for this undefined case but a success response could be real
interesting.)

Creating entries with multiple media resources

It's never explained how a client would go about creating a feed entry with
a number of media resources.  I imagine that it could be iterative; a client
could create any of the resources at any time, and at any time after
creating the feed entry, use PUT to update the feed entry to link to new
media resources.  I assume -- though I didn't see it stated in the document
-- that it's the client's responsibility in almost all cases to put links in
the feed entry to point to the media resources, otherwise the media
resources are unlinked (effectively hidden to readers).


Agreed, that's how it should work and the text and examples should
be bolstered to make that clear.


The exception to this general process is if the client first uses POST to
create both the media resource and the "Media Link Entry" in one go.  In
this case, can the "Media Link Entry" (MLE) be transformed to a regular
"Member Entry Resource" (MER)?  I thought it would be possible, but
discussed just a bit with Tim today and he says no, so there you have two
different readings of the spec.


I'll agree with Tim on that.

 I guess a related question is what would
happen if a client does a PUT of media content to an entry resource, or
entry content to a media resource.


I think that falls in the "un-defined" territory.

It's not clear to me whether a linked media entry is always listed in the
metadata or not.
  - When one or more "edit-media" link relations appear, who has been
responsible for putting them there?


The server.

 - When a media resource is deleted, who is responsible for removing the
media resource link from the MLE?


The MLE itself should be removed, and this is done by the server.

 - Section 4 says that the MLE contains the metadata for a Media Resource,
but that seems to only assume a single Media Resource.  In the case of
multiple Media Resources which the user intends to link into a single post,
it's unclear to me whether there's one MLE for every Media Resource, or one
MER for all the Media Resources created, or some other situation.  Again, in
quick discussion with Tim, he says there is one Media Link Entry per Media
Resource.  I can see how that would work but that was not at all my
understanding before the discussion!


Tim is right, it's a 1-1 relationship. Obviously more
text and examples are needed.
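
As a rough illustration of the 1-1 reading agreed above, here is a minimal in-memory sketch in Python. It is a toy model, not a server: the class name, the URL layout under `/media/` and `/entries/`, and the choice to return the MLE URL from the POST are all assumptions made for the example. It only shows that one POST of media yields two resources, and that deleting either half removes the pair.

```python
import itertools


class MediaCollection:
    """Toy in-memory model of a media collection (hypothetical, not a real server)."""

    def __init__(self, base):
        self.base = base.rstrip("/")
        self.resources = {}   # URL -> stored body or link table
        self.pairs = {}       # MLE URL -> media resource URL (always 1-1)
        self._ids = itertools.count(1)

    def post_media(self, body, content_type):
        """One POST creates two resources: the media resource and its MLE."""
        n = next(self._ids)
        media_url = f"{self.base}/media/{n}"      # assumed URL layout
        mle_url = f"{self.base}/entries/{n}"      # assumed URL layout
        self.resources[media_url] = (content_type, body)
        # The server, not the client, fills in the edit links.
        self.resources[mle_url] = {"edit": mle_url, "edit-media": media_url}
        self.pairs[mle_url] = media_url
        return mle_url

    def delete(self, url):
        """Deleting either half of a pair removes both halves."""
        for mle_url, media_url in list(self.pairs.items()):
            if url in (mle_url, media_url):
                del self.pairs[mle_url]
                del self.resources[mle_url]
                del self.resources[media_url]
                return True
        return False
```

Posting three images to such a collection would produce three MLEs and three media resources; a post that shows all three would link to them from a separate, ordinary entry.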

This document would benefit greatly from further examples:
 1.  An example of creating a MLE and MR in a single POST; the request,
response and the result (resource URLs) described.
 2.  An example of modifying a MER to contain a new image or other media
resource link: the request(s), possibly the responses (if it's interesting,
it may not be), and definitely the result.
 3.  An example of modifying a MER to change metadata (e.g. category or
adding a new link relation element or both); possibly a failed request
example would be even more interesting than a successful one.

+1

Can a client modify an entry to contain a link relation element in the
following cases:
 - To result in an "edit" or "edit-media" link relation, where the resource
represented does not meet the requirements in section 11.1 or 11.2?

No.

 - To result in an "edit" link relation that actually points to a media
resource, or an "edit-media" link relation that actually points to a MER?
 - To point to a resource on a different server entirely?


There is no reason to believe that any of these resources are on
the same machine to begin with. I could POST media to machine A
and have the MLE created on machine B and the editable media
resource itself created on machine C.

 - To point to a valid media resource or MER that happen to be in a
different collection than the one normally used for this feed?
 - Will some servers forbid adding a link relation element entirely?  Is it
important for the client to know that that will always be forbidden for that
server -- can it detect the "always forbidden" case separately from the
"this particular edit is forbidden" case?
Which of these are errors, and if so how is the error handled?  Which of
these MUST the server allow and handle?  I understand there may be some need
for flexibility here.  Perhaps it's just standardized error messages
required here.  For example, if there are some servers which allow a given
link relation to point to another server, and some servers which do not
allow, how would the servers which do not allow respond with a sufficiently
specific error, so that the client can avoid trying the same thing again?


I think some text explaining up front that for all intents and purposes the
server is in charge and could reject/modify entries as it sees fit is in order.

Multiple formats/langs for media resources

Multiple formats are not sufficiently defined -- e.g. JPG and PNG versions
of an image resource.
Format negotiation is hard.


I will agree with that last statement and would like to punt on
stating anything concrete about creating variants beyond
saying it's "un-defined".

I found guidance for how
to select among different "edit-media" link relations depending on format
and language, but I found no guidance on how to create multiple versions.
If there's no guidance to clients or servers how to do it (would the client
create multiple resources in different formats? could the server
automatically do it as variants? could the server automatically do it as
multiple resources, and would all formats be therefore listed?), it's
probably worth considering whether there's possible interoperability harm
here.  I can imagine clients creating alternate-format versions quite
successfully because the operations would be explicit, but when I imagine
how servers would go about it, I can easily see ways it could go wrong (e.g.
creating new URLs for resources that are invisible to the media collection,
having multiple URLs in locations where clients expect only one).

I think there may be a very basic confusion here -- in my head or in the
document or both -- about what the "edit-media" link relation does and is
for.  When I read the text it seems to offer the possibility for multiple
formats for a single media resource, as suggested by the text: "If a client
encounters multiple "edit-media" link relations in an entry then it SHOULD
choose a link based on the client preferences for type and hreflang".
However, when I try to think about how a client would create a post with a
totally independent set of JPG images (e.g. one of the Eiffel Tower, one of
the Louvre and one of the Arc de Triomphe), the "edit-media" link relation
also seems to have relevance. Which is it or both? (and as always, who is
responsible for filling it in or removing it when new media resources are
created or destroyed?)


Media to MLE is a 1-1 relationship. Other variants, such as
PNG, JPEG and GIF of an image might be created for an image
at the whim of a server.
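
Under that reading, several "edit-media" links in one entry are server-made variants of a single media resource, and the client picks among them by type and hreflang. A minimal Python sketch of such a choice follows. The function name, the dict-based link representation, and the rule that type outranks language are assumptions for illustration; the draft does not specify a ranking.

```python
def choose_edit_media(links, preferred_types, preferred_langs):
    """Pick one "edit-media" link using client preferences.

    links: list of dicts with "rel" and "href", plus optional "type"
    and "hreflang". Earlier items in each preference list win; type
    is compared before language (an arbitrary choice for this sketch).
    Returns None when the entry has no "edit-media" link.
    """
    def rank(value, prefs):
        # Unlisted or missing values sort after every listed preference.
        return prefs.index(value) if value in prefs else len(prefs)

    candidates = [link for link in links if link.get("rel") == "edit-media"]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda link: (rank(link.get("type"), preferred_types),
                          rank(link.get("hreflang"), preferred_langs)),
    )
```

When two candidates tie, `min` returns the first one in document order, so a server's ordering acts as the tie-breaker.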

Thomas Broyer said in email July 24 that  "Having the Content-Location value
equal to the Location one tells the client that the response body is a
representation of the newly created resource".   This is a subtle reading of
HTTP and, if it's true,  I want to make sure that implementors understand
this without having to read the mailing list.  The spec reads " the response
from the server SHOULD contain a Content-Location header that contains the
same character-by-character value as the Location header."  If the response
from the server does not contain both headers identical, what should the
client conclude?  I think this is one of those SHOULD recommendations where
the consequences of it not being a MUST need to be considered.  Possibly the
spec needs to say under what conditions the server would do otherwise;
possibly the spec should say what the client knows, or does not know, or
must do, if the server does otherwise.


Personally I'd rather drop all that verbiage about Content-Location and just
point to RFC 2616 and the definition of Location and Content-Location.
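
For implementors who want the check spelled out, here is a small Python sketch of the reading attributed to Thomas Broyer above: the body of a creation response is treated as a representation of the new resource only when Content-Location matches Location character for character. The function name and the plain-dict header representation are assumptions for the example.

```python
def body_represents_new_resource(headers):
    """Return True only if Location and Content-Location match exactly.

    headers: mapping of response header names to values. Header names
    are compared case-insensitively; values are compared byte for byte.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    content_location = lowered.get("content-location")
    return location is not None and location == content_location
```

When this returns False, a cautious client could ignore the response body and GET the Location URL instead; that fallback is a suggestion, not something the draft states.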

Deleting Resources

In the case of an entry that points to multiple media resources, can the
server delete all those media resources and their MLEs?  (I think not).  If
a client issues a DELETE to a media resource, is its MLE deleted? (The spec
covers the opposite case already when a client issues a DELETE to an MLE.)


This is covered by M<->MLE 1-1.

Can collections be DELETEd?  It's fine for servers to allow it or not, but if
servers don't support it, which error should they use?


"un-defined"

Editing resources

Overall, the process for editing a resource is not entirely clear.  I find
the description of creating a resource (POST), and what the server can
accept, ignore or reject, more clear than the description of editing a
resource (PUT).  For example, there's normative text in section 9.2.1 (an
example) relative to creating resources and handling metadata, but that text
isn't duplicated for editing resources, nor does it obviously apply to editing
resources. Thus:
 - Can the client change the category?  (probably yes; MUST the server
allow?)
 - Can the client change the atom:id?  (probably never)
 - Can the client change the "updated" value to be some time in the future?
Some time long ago?  Or are there only two non-error changes -- "now" or
"the previous value"?  MUST the server accept the value if it's the same as
the previous value?  Or can there be servers that always ignore "updated"
values from clients? (and if so, is it important for the client to know that
the server does this)
 - Can the client change the set of link relations? (probably yes; but does
that include "edit" and "edit-media" link relations only or also
first/previous/next/last link relations?)
In general the possible edits need to be covered to consider whether the
server MUST allow these kinds of edits, or MAY, and if refused, what error
for what reason.


Agreed, like I said, this was in previous versions and should be
updated and restored:

http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-06.html#entry-constraints


I already threw in a request for an example of modifying a resource above,
because one of the cases for editing a resource is to add a media resource
to it.

The spec says "The value of atom:updated is only changed when the change to
a member resource is considered significant. "  The use of passive voice
obscures who does what here.  When the client doesn't suggest a value for
"atom:updated", does the server provide one, and if so, how does the server
know what is "significant"?   I thought it would always be the client
suggesting values, but Tim says that the server controls atom:updated which
could imply that the client doesn't even need to suggest values.  See above
about whether the server MUST accept certain values for "updated", or more
likely, MUST NOT accept suggested values for "updated" when they're clearly
wrong (e.g. this entry was last updated on October 16, 1906).

Can a server ever ignore part of an edit and successfully process the rest?


Yes. Think of a client that throws in a slew of random link elements with
relations my server implementation doesn't understand. Same with foreign
markup. The server is in charge.

Here is another example, a server could take in an entry with content
that was HTML and clean it up and the next time the entry
is accessed the content could be XHTML.

For example, the server receives a PUT request that tries to edit the text
of a MER and includes a new category value, the server accepts the new text
but silently ignores the category value.  I suggest the answer would be MUST
NOT silently ignore suggested changes, particularly since there's no way in
a PUT response to say "here's what the server actually stored".   It may be
my opinion differs here from that of the WG.  I find silently ignoring input
to be scary.


The client can always do a follow-up GET.

I believe it's unavoidable and trying to specify it would either
become a rat-hole or would end up making the spec impossible
to implement.
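
The follow-up GET can be turned into a simple report of what the server dropped or rewrote. The Python sketch below assumes the client has already reduced the entry it sent and the entry it fetched to flat dicts; the function name and the field names in the usage note are hypothetical.

```python
def dropped_or_changed(submitted, stored):
    """Compare what the client sent in a PUT with what a later GET returned.

    Returns a dict keyed by field name for every submitted field whose
    stored value differs; a field the server dropped shows stored=None.
    Fields the server added on its own are not reported.
    """
    report = {}
    for field, sent in submitted.items():
        kept = stored.get(field)
        if kept != sent:
            report[field] = {"sent": sent, "stored": kept}
    return report
```

For example, if the client sent `{"content": "new text", "category": "travel"}` and the GET returned only the new content, the report would contain a single `category` key, which is exactly the silently ignored value from the example above.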


Synchronization

I predict that some AtomPub authoring clients will attempt to synchronize:
to maintain an offline copy of the feed including all its MERs and media
resources, and to keep that offline copy up-to-date.  Some will probably
even allow offline authoring of new posts, and offer to synchronize when the
client next goes online -- because of the possibility of multiple authors,
this may mean at times that the client would download new entries created by
other authors, upload new entries created offline, and reconcile its offline
copy of feed documents.

Because authoring clients will attempt to do this based on Last-Modified and
ETag -- after all, the functionality is all there in some form or another --
the spec needs a little more clarity on how the client can rely on this
working.  Otherwise, some servers may omit features that these authoring
clients require, or implement them oddly.  While I would never suggest
repeating all the requirements from other specs (in this case HTTP), there
are cases where clarity and interoperability are greatly improved by at
least referencing explicitly requirements from HTTP.  It's also possible to
add new requirements based on features in HTTP, that apply to Atom servers
alone.


Agreed, a little verbiage and a link to http://www.w3.org/1999/04/Editing/
would be good.


You can see that I lean more towards the "explain confusing things" and
"add more stringent requirements" sides than to the "it's already written
elsewhere" side by peeking at section 8.2 of CalDAV
<http://www.ietf.org/internet-drafts/draft-dusseault-caldav-15.txt>.
 A mostly-explanatory guidelines section helps clients quickly understand a
clear path towards synchronization, and new requirements for supporting
ETags make things easier for what is, after all, a more limited use case of
HTTP (calendaring) than the general case.  If HTTP synchronization in
authoring cases were clearly defined and had not led to years of arguments
since the last HTTP update, I would probably feel differently about just
silently relying on the mechanisms in HTTP.

In any case, I have very specific brief suggestions to cover synchronization
so that it's implemented more successfully than not.


Hey, we used to have that in there too:

http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-04.html#collections_model_usage

 - Consider adding a brief section offering clients non-normative guidelines
on synchronization.  It doesn't have to limit server behavior so much as
point out with green and red lights where the fairway is (mixing
transportation and golfing metaphors in my head)
 - Make a few requirements of servers to avoid some of those HTTP
ambiguities.  For example:
 "The ETag or Last-Modified values for a member resource MUST change when
any metadata in the resource changes, as well as text/content, and this
includes "next" and "last" link relation values.  The ETag or Last-Modified
values of a member resource MUST NOT change solely because an associated
other resource (e.g. the media resource being an associated resource to the
media link entry resource) changed.  "
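
One way a server could satisfy the suggested requirement is to derive the ETag from the member resource's own representation and nothing else. The Python sketch below assumes the entry is available as a JSON-serializable dict; the function name, the dict layout, and the 16-character truncation are choices made for the example, not anything from the draft.

```python
import hashlib
import json


def entry_etag(entry):
    """Compute an ETag over the entry's own fields only.

    Any change to the entry's metadata, links, or content changes the
    canonical form and therefore the tag. The bytes of a linked media
    resource are not part of the input, so editing the media resource
    alone leaves the entry's tag unchanged.
    """
    canonical = json.dumps(entry, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f'"{digest[:16]}"'
```

Because keys are sorted before hashing, two dicts with the same fields in a different order get the same tag.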

More open questions that might be related to synch or might have relevance
even for clients that don't do full synch:
 - What is the relationship, if any, between the "atom:updated" value and
the HTTP "Last-Modified" value.  Can the "atom:updated" value ever be later
(greater) than the "Last-Modified" value?  I believe it can be the same or
earlier, but the spec doesn't disallow the broken case.
 - Is it clear whether the client MUST GET the entry after modifying it in
order to have an accurate offline cache?  (this was mentioned in a post by
Broyer Jul 13, but not in the document).  I believe this is made clear
already for the cases of getting the feed and also for POST/create, but not
for PUT/modify.
 - Am I correct that the general assumption is that id's are there to see
what entries are new, and URLs are there to see where to get them?  That may
mean that URLs could change, for a given ID -- Perhaps a feature to change
the slug name of an image after attaching it.  Is that theoretically
possible?


Yes.
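
On the first open question in the list above, a client or a test suite could at least detect the "broken case" where atom:updated is later than Last-Modified. A minimal Python sketch, assuming atom:updated carries an explicit offset or "Z" and no fractional seconds, and that Last-Modified is a standard HTTP date ending in GMT; the function name is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def updated_not_after_last_modified(atom_updated, last_modified):
    """Return True when atom:updated is the same as or earlier than Last-Modified.

    atom_updated: RFC 3339 timestamp such as "2006-12-07T17:07:02Z".
    last_modified: HTTP date such as "Thu, 07 Dec 2006 17:07:02 GMT".
    """
    # datetime.fromisoformat rejects a trailing "Z" on older Pythons,
    # so rewrite it as an explicit UTC offset first.
    updated = datetime.fromisoformat(atom_updated.replace("Z", "+00:00"))
    modified = parsedate_to_datetime(last_modified)
    return updated <= modified
```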

There are also efficiency considerations.
 - The spec could require that servers MUST return either the ETag, or the
Last-Modified value, in any successful POST or PUT response.  I personally
favour this so that clients can rely on it, though obviously other opinions
are valid here.
 - I really liked the idea of putting ETag in the author's feed, as
discussed on the list but not appearing in the document, again for
efficiency.
Certainly, the spec could ignore these considerations for now.  However, I
have noted that many client implementations choose a single way to implement
their logic that can be relied upon if a reliable approach is available,
rather than respond differently to different implementations.  Thus, I
predict that if some servers implement more-efficient synchronization and
others don't, clients will behave as if they're always talking to the
less-efficient servers. The more-efficient servers will find it difficult to
achieve better scalability through synch efficiency improvements because
clients have already implemented reliable but inefficient synchronization
and don't have reason to add a second logic path for the more-efficient
servers.

Internationalization

How are categories compared? Case-sensitive, insensitive, according to which
language? Would the categories "donné" and "donne" map to the same category
as "Donne" and "DONNE"? I believe it's currently up to the server, which
means unpredictable behavior from the point of view of clients. See
http://www.ietf.org/internet-drafts/draft-newman-i18n-comparator-14.txt,
which has passed IESG Evaluation except for some IANA actions.   This is a
danger area for any draft going before the IESG which looks carefully at
i18n these days.
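
To make the unpredictability concrete, the Python sketch below groups the four example terms under three different comparison rules. The three key functions are illustrative stand-ins written for this example; they are not the comparators defined in draft-newman-i18n-comparator.

```python
import unicodedata


def exact(term):
    """Compare code point for code point."""
    return term


def caseless(term):
    """Ignore case but keep accents."""
    return unicodedata.normalize("NFC", term).casefold()


def caseless_unaccented(term):
    """Ignore both case and combining accents."""
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def group(terms, key):
    """Bucket terms that compare equal under the given key function."""
    buckets = {}
    for term in terms:
        buckets.setdefault(key(term), []).append(term)
    return list(buckets.values())


terms = ["donné", "donne", "Donne", "DONNE"]
for rule in (exact, caseless, caseless_unaccented):
    print(rule.__name__, group(terms, rule))
```

The same four terms form four categories under the first rule, two under the second, and one under the third, so a client cannot predict the outcome unless the server's rule is stated.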

How do lang tags inside the document relate to Content-Language information
in headers? Does the most granular override the other possible values?  What
about when client provides to server? Does the server ignore or handle?

Other

The requirements for the link relations "next", "previous", "first" and
"last" aren't as rigorous as for "edit" and "edit-media" link relations.
Also they're defined quite separately -- I kind of thought that all the link
relation types could be usefully defined in one section but if the editors
prefer a different organization that's fine.  But is it OK if the resource
pointed to by one of these link relations is on another server, in another
feed, is a different kind of resource than you might normally expect, etc...
?  I think the normal cases for these link relations are well-understood but
not necessarily what a client should do if it encounters abnormal cases.

Discovering feed reading URL

A very minor feature request for the introspection document: it SHOULD
contain the public or published read-only feed URL of the blog (Tim suggests
using link rel="alternate" type="application/atom+xml", although I'm not
sure that makes it sufficiently clear what it's for).   This is so that my blog
editing tool can show me not only all the entries and media resources (all
discoverable from the introspection doc already) but also where the blog is
published, so that I can copy that link to my friends when telling them
about my blog.

Extensions

When the client puts extension elements in a MER, MUST the server store
those unrecognized extension elements?

No.

 I think the answer to this is
actually that servers often do not and should not be required to do so.
That makes it hard for clients to extend AtomPub's syntax in ways that other
clients will understand but servers don't care about.  Consider the
consequences: when some enterprising client developer decides to do
something cool and useful and encounters servers that don't store their
metadata in the obvious place, the client developer is going to quickly work
around that by storing in some unobvious place.  For example in HTML
comments in the atom entry content, or microformats, etc.  Is that all cool?


(Aside: an example of clients working around servers like this is that some
WebDAV servers in the very early days didn't actually allow clients to
PROPPATCH custom properties as the authors clearly intended.  Some client
wanted to put extra structured information on a resource when it was locked.
 Instead of putting it in properties, since that didn't work reliably, the
client instead put it in the LOCK entry's "owner" element!  Of course that
didn't reliably interoperate either because some servers overwrote the
"owner" element with authoritative information -- the lock's actual owner as
known by the server.  So the workaround solution was also harmful to
interoperability, only it was discovered after the client had shipped.)

Workspaces

What are workspaces?  I would like to see a definition.  I believe I
understand that basically, a workspace corresponds to a single published
feed; that a workspace contains the collections with the content authored
for that feed.  I know the WG discussed this so maybe I can suggest wording
at some point or simply register my vote for saying what it *is*.


I'll make you a deal, you define what a "web site" is and then
I'll define a workspace :) I think this is murky semantic territory best
handled by the W3C TAG.

Besides the definition, I also wonder about workspace titles.  That seems
redundant with the title of the entry collection and possibly also the title
of the feed (inside the main feed document).  Is there any understanding of
some of these values being identical, or any understanding of what different
purpose they serve if they're not identical?

OPTIONS response

HTTP is unclear about where PUT and POST show up in Allow headers.  WebDAV
ran into this as an interoperability problem -- some clients assumed that if
they didn't see PUT in the Allow header for a collection, they couldn't
write to that collection (the client might be checking for permissions or
policy, having already established that the server was a WebDAV server but
not certain if PUT would be allowed to this particular place).  Some servers
had PUT in the Allow header value for a collection, some servers didn't,
based on the literal reading that you couldn't actually PUT straight to a
collection URL.  Clients had to end up with the OPTIONS Allow: header
response being useless in this case.  With somebody else's hindsight, Atom
doesn't have to leave this ambiguous for the special kinds of resources it
defines...

Cookie support, sessions, authentication

Is there an assumption that clients MUST support cookies?  Without such a
requirement explicitly stated, some clients won't, for reasonable security
concerns.  Instead, is there an assumption that clients MUST repeat
authentication headers with each request?  Or will servers effectively end
up constantly "reminding" clients (through 401 errors) to authenticate?
This might seem obvious but it definitely differs from regular HTTP practice
where clients authenticate once and then stop sending authentication
information automatically and it just works because of cookies.  Also we'd
experienced this as an interoperability problem in WebDAV interoperability
tests where some server implementors insisted that certain WebDAV clients
were completely broken in not supporting cookies.

Are there assumptions that sessions will be maintained through persistent
connections?  I believe there should be none.  That is, if you're a client
implementor thinking that the first request will contain authorization and
subsequent requests on the same connection have no authorization, think
again.
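
A client that makes no assumption about cookies or connection-level sessions simply attaches credentials to every request it builds. The Python sketch below shows that shape using HTTP Basic purely as an example scheme; the draft under review does not mandate it, and the function name and the URL in the usage note are hypothetical. No request is actually sent.

```python
import base64
import urllib.request


def authed_request(url, user, password, method="GET", data=None):
    """Build a request that carries its own Authorization header.

    Every call produces a fully self-contained request, so nothing
    depends on a cookie or on an earlier request over the same
    connection having been authenticated.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Calling `authed_request("http://example.org/entries/1", "joe", "secret", method="PUT", data=b"...")` for each edit keeps the client correct whether or not the server also sets cookies.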


I've stated my piece on authentication and the IETF requirements.
Just let me know the boiler plate that needs to be put in there
and I'll do it. I have no more energy for the subject.

ANCHOR sections

It's not clear to me that the RFC Editor will know what to do with all the
[[anchor... ]] sections.  Most difficult of all, "anchor37" says "incomplete
section".  For the rest, sometimes the RFC Editor may need to know what to
replace with what on publication.  I'm sure the doc editors know what they
meant but I personally was left guessing.


Agreed, will clean up.


Lisa


Thanks again for the close reading.

  -joe

--
Joe Gregorio        http://bitworking.org
