[whatwg] Microdata DOM API issues

Philip Jägenstedt Wed, 11 Nov 2009 18:22:50 -0800

I've been playing with the microdata DOM APIs again, continuing the JavaScript experimental implementation <http://gitorious.org/microdatajs>. It's not small or elegant, but at least some spec issues have come up in the process.

What is the http://www.w3.org/1999/xhtml/microdata# URI? Just leftovers from earlier revisions to the spec?

Why are the algorithms for extracting RDF gone? All that's left is the book example with the equivalent Turtle, but it would be nice if it were actually defined how to extract RDF. The same goes for the JSON stuff; was that no good?



http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items

"Otherwise, if one of the other elements in pending is an ancestor element of candidate, and that element is scope, then remove candidate from pending."

"Otherwise, if one of the other elements in pending is an ancestor element of candidate, and that element also has scope as its nearest ancestor element with an itemscope attribute specified, then remove candidate from pending."

The intention of these requirements seems to be to eliminate redundant elements in pending, but a comment in the spec on the intention of each would be helpful, as it's quite cryptic right now.



http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#microdata-dom-api

itemtype and itemid are both URL attributes, so when getting itemType and itemId relative URLs should be resolved (even if only absolute URLs are valid). Correct?

itemprop and itemref are both "unordered set of unique space-separated tokens", but in HTMLElement only itemProp is a DOMSettableTokenList while itemRef is a DOMString. This doesn't really make sense, so make itemRef a DOMSettableTokenList too? From reading the spec it's not obvious (without following cross-references) that itemProp isn't just a plain string. An example using .itemProp.contains(name) or similar would make this more difficult to miss.
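To illustrate why the distinction matters, here is a minimal sketch. It assumes itemProp behaves like a token list with a contains() method and itemRef is a plain string, as described above. TokenListSketch and the el object are hypothetical stand-ins so the snippet runs outside a browser; they are not the spec's interfaces.

```javascript
// Hypothetical stand-in for a DOMSettableTokenList, for illustration only.
class TokenListSketch {
  constructor(value) {
    this.value = value;
  }
  // Split on whitespace and drop empty tokens.
  get tokens() {
    return this.value.split(/[ \t\n\f\r]+/).filter(Boolean);
  }
  contains(name) {
    return this.tokens.includes(name);
  }
}

// Fake element: itemProp is a token list, itemRef is a plain string,
// mirroring the asymmetry described above.
const el = {
  itemProp: new TokenListSketch('fn  nickname'),
  itemRef: 'a  b',
};

// Token-list style: one call, whitespace handling is built in.
const hasNickname = el.itemProp.contains('nickname');

// Plain-string style: every caller has to split the value by hand.
const refersToB = el.itemRef.split(/\s+/).filter(Boolean).includes('b');
```

If itemRef were a token list as well, the second check would read the same as the first.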



http://www.whatwg.org/specs/vocabs/current-work/#vcard

Having clickable cross-references in this spec would help a lot when reviewing!

Grammar: Let value *be* the result of collecting the first vCard subproperty named value in subitem.

"Let n1 be the value of the first property named family-name in subitem, or the empty string if there is no such property or the property's value is itself an item." Why not use "collecting the first vCard subproperty" here? Not doing so had me trying to work out how the two differ, but I couldn't find any differences, given that the values are later escaped.

There's also the issue of how newlines from textContent values are escaped. Applying the vCard extraction algorithm to the spec example gives:


BEGIN:VCARD
PROFILE:VCARD
VERSION:3.0
SOURCE:http://foolip.org/microdatajs/demo/vcard.html
NAME:vCard demo
FN:Jack Bauer
PHOTO;VALUE=URI:http://foolip.org/microdatajs/demo/jack-bauer.jpg
ORG:Counter-Terrorist Unit;Los Angeles Division
ADR:;;10201 W. Pico Blvd.;Los Angeles;CA;90064;United States
GEO:34.052339;-118.410623
TEL;TYPE=work:+1 (310)\n  597 3781
URL;VALUE=URI:http://en.wikipedia.org/wiki/Jack_Bauer
URL;VALUE=URI:http://www.jackbauerfacts.com/
EMAIL:j.ba...@la.ctu.gov.invalid
TEL;TYPE=cell:+1 (310) 555\n  3781
NOTE:If I'm out in the field\, you may be better off\n contacting Chloe O'B
 rian if it's about\n work\, or ask Tony Almeida if\n you're interested in
 the CTU five-a-side football team we're trying\n to get going.
AGENT;VALUE=VCARD:BEGIN:VCARD\nPROFILE:VCARD\nVERSION:3.0\nSOURCE:http://fo
 olip.org/microdatajs/demo/vcard.html\nNAME:vCard demo\nEMAIL\;VALUE=URI:ma
 ilto:c.obr...@la.ctu.gov.invalid\nfn:Chloe O'Brian\nN:O'Brian\;Chloe\;\;\;
 \nEND:VCARD\n
AGENT:Tony Almeida
REV:2008-07-20T21:00:00+0100
TEL;TYPE=home:01632 960 123
N:Bauer;Jack;;;
END:VCARD

TEL and NOTE have line breaks that are there only because of how the HTML source is formatted. Importing this into Gmail preserves these line breaks, which looks quite broken. Unless we expect text fields to contain meaningful formatting, perhaps simply collapsing all whitespace into a single space is OK? In the best of worlds <br> would be converted to \n, but I'm not sure if it's worth the trouble.
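The whitespace-collapsing idea could look roughly like this. It is a sketch of the suggestion above, not what the spec currently says, and the function name collapseWhitespace is made up for illustration.

```javascript
// Collapse every run of whitespace (including newlines that only come from
// how the HTML source is indented) into one space, then trim the ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// The TEL value from the output above, before escaping.
const rawTel = '+1 (310)\n  597 3781';
const telLine = 'TEL;TYPE=work:' + collapseWhitespace(rawTel);
```

With this, the work number comes out as a single line with single spaces instead of carrying an escaped \n into the address book.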

Finally on vCard, the final part of the extraction algorithm goes to great trouble to guess which is the family name and which is the given name. This guess will be broken for transliterated East Asian names (CJKV that I know of, maybe others too). Just saying. Also, why is it important to explicitly add N:;;;; for organizations?



http://www.whatwg.org/specs/vocabs/current-work/#vevent

"Add an iCalendar line with the type name and the value value to output."

At this point value is undefined.

Given the algorithm for extracting iCal, it seems that dtstart and dtend must be specified using <time datetime="">, as it's only for time elements that the time stamps will be properly formatted (stripping - and :).
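For reference, the stripping step amounts to something like the following sketch. The function name toICalendarStamp is hypothetical, and it only covers dates and UTC ("Z") date-times; a value with a numeric offset such as -08:00 would be mangled by this naive version.

```javascript
// Turn a date or UTC date-time string into iCalendar's basic format
// by removing the "-" and ":" separators.
function toICalendarStamp(datetime) {
  return datetime.replace(/[-:]/g, '');
}

const start = toICalendarStamp('2009-11-11T18:22:50Z'); // "20091111T182250Z"
const day = toICalendarStamp('2009-11-11');             // "20091111"
```

A property value that does not come from a time element skips this step, so it ends up in the output with its separators intact.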

There are some errors in the example. I got it working by applying this diff:


--- vevent.js.orig      2009-11-11 10:52:37.000000000 +0100
+++ vevent.js   2009-11-11 23:54:15.000000000 +0100
@@ -1,3 +1,3 @@
 function getCalendar(node) {
-  while (node && (!node.nodeScope || !node.itemType == 'http://microformats.org/profile/hcalendar#vevent'))
+  while (node && (!node.itemScope || !node.itemType == 'http://microformats.org/profile/hcalendar#vevent'))
     node = node.parentNode;
@@ -26,3 +26,3 @@
       value = value.replace(/;/g, '\\;');
-      value = value.replace(/,/g, \\,');
+      value = value.replace(/,/g, '\\,');
       value = value.replace(/\n/g, '\\n');
@@ -31,3 +31,3 @@
       var name = prop.itemProp[nameIndex];
-      if (!name.match(':') && !name.match('.'))
+      if (!name.match(':') && !name.match('\\.'))
         calendar += name.toUpperCase() + parameters + ':' + value + '\r\n';


Perhaps /\./ would be better to make it clear that it's a regexp.
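To show what goes wrong with the unescaped version: a string passed to match() is compiled as a regular expression, so '.' means "any character". The property names below are made-up examples.

```javascript
const names = ['dtstart', 'http://example.com/prop', 'org.example.prop'];

// '.' as a pattern matches any character, so every non-empty name "contains a dot".
const unescaped = names.map((n) => n.match('.') !== null);

// Escaped in a string literal: the regexp source becomes \. and matches a literal dot.
const escapedString = names.map((n) => n.match('\\.') !== null);

// Regexp literal: same behaviour, and the intent is visible at a glance.
const literal = names.map((n) => /\./.test(n));
```

With the unescaped pattern the !name.match('.') test is false for every non-empty name, so no property ever passes the check.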

Also: if (prop.date && prop.time)

date and time aren't properties on HTMLTimeElement, so I don't know what this is. Is there, or should there be, a DOM API for determining whether a string is a valid date string, other than implementing those algorithms in script?



http://www.whatwg.org/specs/vocabs/current-work/#licensing-works

What's the n in http://n.whatwg.org/work? If this URL is going to stick, it would be nice if there were also something to be seen at that page.

Also, the conversion to RDF section isn't really useful and seems to hide some assumptions about how the properties vocabulary should be prefixed with http://n.whatwg.org/work and how the http://www.w3.org/1999/xhtml/microdata# prefix is supposed to be used.



http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#domtokenlist

The DOM intro box doesn't explain the return value for .toggle(), you have to consult the algorithm to figure it out.
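As I read the algorithm, toggle() returns true when the token ends up in the list and false when it was removed. The sketch below models just that behaviour; TokenSetSketch is a hypothetical stand-in, not the real DOMTokenList.

```javascript
// Hypothetical minimal token list; only models contains() and toggle().
class TokenSetSketch {
  constructor(tokens = []) {
    this.tokens = new Set(tokens);
  }
  contains(token) {
    return this.tokens.has(token);
  }
  // Returns true if the token was added, false if it was removed.
  toggle(token) {
    if (this.tokens.has(token)) {
      this.tokens.delete(token);
      return false;
    }
    this.tokens.add(token);
    return true;
  }
}

const list = new TokenSetSketch(['a']);
const afterAdd = list.toggle('b');    // 'b' was absent, so it is added
const afterRemove = list.toggle('b'); // 'b' was present, so it is removed
```

A one-line note in the intro box saying that the return value reflects whether the token is now present would cover this.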



I'm sure there will be more issues, but that's it for now.

--
Philip Jägenstedt
