Autodiscovery Draft Issues

Lachlan Hunt Tue, 28 Nov 2006 21:53:40 -0800


Hi,
  This feedback is related to the autodiscovery draft.

Before reading on, I suggest anyone writing a specification of any kind actually learn a little about how to write good conformance criteria.


http://ln.hixie.ch/?start=1140242962&count=1

I do not believe it is at all useful for this spec to continue as either normative or informational. If it were to be published as informational, who would its target audience be? What benefit would it provide to anyone? What purpose would it serve?


James M Snell wrote:

To document best practice as it relates specifically to syndication
feeds.

It's not entirely clear what that actually means. How would it be any different from, or more useful than, existing documentation on the subject that has been around for the past 3 or 4 years?

What we do need is a normative specification that clearly defines both document and user agent conformance requirements, and that really has to be in a normative specification. The only issue that then remains is where this should take place and, for reasons documented later in this e-mail, I strongly believe that HTML5 is the correct place for this to be defined.

For example, HTML5 says nothing about whether the relative order of feed autodiscovery links within a document is significant. The Atom autodiscovery draft, however, defines that the order is significant.

That can be considered a limitation of the HTML5 spec which can be addressed there. In fact, at the time of writing this, you've already raised the issue on the WHATWG list and it looks like it's been resolved.

Note: The rest of this feedback is written as though this spec were still going to be published as a normative RFC, despite the suggestion that it be published as an informational item only or not at all. That's because I had most of it written before that suggestion and it's useful feedback anyway.

Feed autodiscovery should ideally be defined independently of the syndication feed format. It is illogical to have separate autodiscovery specs for Atom [1] and RSS [2]. As far as autodiscovery is concerned, the only difference between these and any other format is the MIME type. But, if this spec is to continue, it should at least be renamed to "Syndication Feed Autodiscovery" or similar.



*Introduction*

The introduction should discuss the use of Atom, RSS and RDF Site Summary because they're all widely used and are relevant to anyone implementing autodiscovery. I suggest it also talk about the generic concept of what a syndication feed is (independent of the syntax) and only refer to Atom, RSS and RDF as examples.



*Notational Conventions*

This section should be titled "Conformance Requirements". It should make a clear distinction between user agent conformance and document conformance, and clearly explain the requirements for each.

If there are separate categories of user agents, then they should be defined here. For instance, a conformance checker would have different requirements from a web browser: a conformance checker must report errors to a user, whereas a web browser isn't required to do so and may recover gracefully, in the way defined by the specification (where applicable).

It should state something like the following to define which sections are normative and non-normative.


  All examples and notes in this specification are non-normative, as are
  all sections explicitly marked non-normative. Everything else in this
  specification is normative.


*Definition of an autodiscovery element*

This should be moved to a separate definitions section (perhaps within the previous conformance requirements section). It does not belong in the Relationship to HTML and XHTML section. The definitions should also include other terms used throughout the spec, which are then used in the conformance requirements (see the writing specifications article linked above).


| An Atom autodiscovery element is a link element, as defined in
| section 12.3 of HTML 4 [W3C.REC-html401-19991224].

Assuming this section is normative, that reference should be normative also. Throughout the spec, it should also refer to it as just an "autodiscovery element" (see above about it not just being for Atom).

I do not agree that only <link> elements should be used for autodiscovery. Since visible meta data is always better than invisible meta data, documents should be allowed to use the <a> element as well.


| As with other types of link elements, an autodiscovery element MAY
| appear within the <head> element of an HTML or XHTML document,

Why is that requirement only stated as a *MAY*? It should be a *MUST* requirement and it should be made clear that this is a document conformance requirement only.


| but it MUST NOT appear within the <body>.

For document conformance, I agree. But, UA conformance requirements also need to be defined. What must a UA do if it finds a link element in the body? This error is actually far more common than you may think.

As part of a study of several billion pages done by Ian Hickson in September this year, it was found that "Parse error: link element start tag out of place." was the 32nd most common error and happened for about 1 in 8 documents, on average. That means in roughly 12.5% of pages, the link element occurred in the body.

The study was similar to the Web Authoring Statistics [3] published by Google in January (also done by Ian Hickson), but with a significantly larger sample and much more data collected.

In HTML5 (which is based upon the way several browsers have already implemented HTML), regardless of where the link tag occurs in the serialisation, each link element is still inserted into the head. So, strictly speaking, it is impossible for a link element to appear in the body in HTML, though the tag could appear anywhere the author put it.

For example, if that were not the case and the link was not inserted into the head regardless of where it occurred, consider the following test case. Is the link considered to be in the head or not?


<head>
<title>Autodiscovery</title>
<script type="text/javascript">
document.write("<p>test<\/p>");
</script>
<link rel="alternate" type="application/atom+xml" href="/feed" />
</head>

The answer actually depends upon whether or not script is enabled. If it's disabled, then the answer is yes. Otherwise, when the p element is written to the serialisation, it implies the end of the head and the beginning of the body, and so the answer is no. That is actually the behaviour of IE7, but not Firefox, Opera or Safari.
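
As a rough illustration of a user agent that tolerates misplaced link tags, here is a minimal Python sketch built on the standard library's html.parser. It is not the HTML5 parsing algorithm and does not run scripts, so it only notices a body when an explicit <body> tag is present; the class and function names are hypothetical and chosen for this example only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record every <link> start tag, wherever it occurs in the markup."""

    def __init__(self):
        super().__init__()
        self.in_body = False   # becomes True once an explicit <body> tag is seen
        self.links = []        # list of (attribute dict, seen_in_body) pairs

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        # Self-closing tags such as <link ... /> also arrive here, because the
        # default handle_startendtag delegates to handle_starttag.
        if tag == "body":
            self.in_body = True
        elif tag == "link":
            self.links.append((dict(attrs), self.in_body))


def collect_links(markup):
    """Return all link elements found in markup, in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links
```

A browser-like client could use every collected link, while a conformance checker could report those whose second field is true as document errors.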



*Relationship to HTML and XHTML*

*Syntax rules inherited from HTML*

This significantly limited, informative list of syntax requirements, if included, should be non-normative. Instead, the spec should normatively refer to an HTML spec which clearly defines the syntax and parsing requirements for conforming user agents.

However, if HTML 4.01 is chosen, then as far as the SGML parsing is concerned, HTML 4.01 cannot be implemented in the real world. No web browser does so and I seriously doubt any existing tools that implement autodiscovery do so either.

I would strongly recommend that you normatively reference HTML5 for the parsing requirements, which is far more relevant than HTML 4.01 is. However, even though the relevant parts of the parsing section are relatively stable, the spec itself is not, which is a problem because it's not usually a good idea to normatively reference a moving target.

To do so would make HTML5 a dependency, but it's difficult to progress any specification that has such unstable dependencies. In other words, this would technically be held up by the progress of HTML5 anyway. So, therefore, it doesn't really make sense for this to be defined separately from HTML, particularly when it actually is an HTML feature itself.



*Syntax rules inherited from XHTML*

Again, this significantly limited, informative list of syntax requirements, if included, should be non-normative. Instead, the spec should normatively reference the XML 1.0 and XHTML 1.0 specs which clearly define the syntax and parsing requirements.



*The rel attribute*

| The rel attribute MUST be present in an Atom autodiscovery element.
| As defined in section 6.12 of HTML 4 [W3C.REC-html401-19991224], the
| value of the rel attribute is a space-separated list of keywords.

That's another example of a normative reference to HTML4. In this case, it's ok to reference HTML4 for the definition of the rel attribute, but HTML5's definition would be better.


| The list of keywords MUST include the keyword "alternate" in
| uppercase, lowercase, or mixed case.

That's a reasonable example of a document conformance requirement, though I'd suggest it be rephrased:


  The list of keywords MUST include the keyword "alternate".  The value
  is case-insensitive.

It is also missing user agent conformance requirements, but they would be covered by a normative reference to the HTML spec that defines how to process the rel attribute. There's also an edge case that should be covered, e.g.


<link rel="alternate stylesheet" type="application/atom+xml" href="/feed">

In HTML, the combination of alternate and stylesheet has special meaning. Yet the type attribute still has the atom MIME type. Does that still represent an autodiscovery link? If so, that should be defined and it is currently an interoperability issue. (Note: This looks like it's also a problem with the HTML5 spec at the moment.)

I believe the feed value should also be specified in this section, as it is in the WHATWG spec. Primarily because a syndication feed isn't necessarily an alternative representation of the page, as clearly demonstrated in the mozilla.org example shown in an earlier post today.
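
To make the rel handling discussed in this section concrete, here is a minimal Python sketch. It assumes keywords are separated by whitespace and compared case-insensitively, and it treats "feed" as an accepted keyword alongside "alternate". Because the draft does not say what "alternate stylesheet" should mean, the sketch exposes that decision as a hypothetical allow_stylesheet flag rather than picking an answer; the function names are also illustrative only.

```python
def rel_keywords(rel_value):
    """Split a rel attribute value into a set of lowercased keywords."""
    return {token.lower() for token in rel_value.split()}


def rel_marks_feed(rel_value, allow_stylesheet=False):
    """Return True if the rel value could mark an autodiscovery link.

    allow_stylesheet is a policy switch (an assumption of this sketch):
    when False, any rel list containing "stylesheet" is rejected.
    """
    keywords = rel_keywords(rel_value)
    if "stylesheet" in keywords and not allow_stylesheet:
        return False
    return "alternate" in keywords or "feed" in keywords
```

Whichever default a spec settles on, stating it explicitly is what removes the interoperability problem; the flag only shows that both behaviours are easy to implement.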


*The type attribute*

The definition of this should also include the value "application/rss+xml". This is because of the above reasons about it not just being for Atom and because UAs already have to support it as well.
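
A format-independent check of the type attribute might look like the following Python sketch. The two MIME types are the ones named in this feedback; ignoring any parameters after a semicolon and comparing case-insensitively are assumptions of the sketch, and the names FEED_TYPES and is_feed_type are hypothetical.

```python
# MIME types named in this feedback; a real client might accept more.
FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


def is_feed_type(type_value):
    """Return True if the type attribute names a known feed MIME type."""
    # Drop parameters such as "; charset=utf-8", then normalise case.
    essence = type_value.split(";", 1)[0].strip().lower()
    return essence in FEED_TYPES
```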


*The href attribute*

The definition of this is very good. It contains both document and user agent conformance requirements. No further comments.


*Multiple autodiscovery elements*

| * Each autodiscovery element SHOULD point to a different Atom feed.

What must a UA do if multiple links point to the same feed?

| * Each autodiscovery element SHOULD include a title attribute that
|   gives a human-readable label for the feed that the element points
|   to.  Clients MAY use these titles to present a list of available
|   Atom feeds to the end user.

That "MAY" in the last sentence should be at least a "SHOULD", but probably not a "MUST" in case a UA has a good reason not to show it.


| * The order of the autodiscovery elements is significant.  The first
|   element SHOULD point to the publisher's preferred feed for the
|   document.
| * Clients who present a list of autodiscovered feeds to the end user
|   SHOULD present them in the same order as the autodiscovery
|   elements appear in the document.
| * Clients who wish to choose exactly one feed without user input
|   SHOULD choose the one pointed to by the first autodiscovery
|   element.

This section is quite good, but there are a few issues. What if the first one is an unsupported format? e.g. the first is RSS, the second is Atom, and the UA only supports Atom. Since it's a SHOULD, UAs can technically skip the first one, but this stuff should be explicitly defined.

What if the link elements have hreflang attributes, indicating alternate languages for the feeds? Say the first is "en", the second "fr", and the user has configured "fr" as their preferred language. UAs should choose the first one provided in the user's preferred language.
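
One possible way to define that selection explicitly is sketched below in Python. It assumes each link has already been reduced to a dict of its attributes in document order, skips formats the client does not support, and prefers the first remaining link whose hreflang exactly matches the user's language (case-insensitively) before falling back to the first supported link. The function name, the dict layout and the exact-match language rule are all assumptions made for this example, not anything the draft defines.

```python
def choose_feed(links, supported_types, preferred_lang=None):
    """Pick one feed link from links (dicts in document order), or None."""
    # Keep only links whose type the client can actually process.
    candidates = [
        link for link in links
        if link.get("type", "").lower() in supported_types
    ]
    # Prefer the first candidate in the user's preferred language, if any.
    if preferred_lang:
        wanted = preferred_lang.lower()
        for link in candidates:
            if link.get("hreflang", "").lower() == wanted:
                return link
    # Otherwise fall back to document order, as the draft suggests.
    return candidates[0] if candidates else None
```

With an RSS link first and two Atom links ("en", then "fr") after it, a client that supports only Atom and prefers "fr" would get the third link, and the same client with no language preference would get the second.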


*Examples*

I left comments about the examples section before [4]. Those comments still stand.

[1] http://www.ietf.org/internet-drafts/draft-snell-atompub-autodiscovery-00.txt
[2] http://www.rssboard.org/rss-autodiscovery
[3] http://code.google.com/webstats/
[4] http://www.imc.org/atom-syntax/mail-archive/msg19103.html

--
Lachlan Hunt
http://lachy.id.au/
