[whatwg] Microdata feedback

2010-01-18 Thread Ian Hickson
On Thu, 12 Nov 2009, Philip Jägenstedt wrote:

 I've been playing with the microdata DOM APIs again, continuing the 
 JavaScript experimental implementation 
 http://gitorious.org/microdatajs. It's not small or elegant, but at 
 least some spec issues have come up in the process.
 
 What is the http://www.w3.org/1999/xhtml/microdata# URI?

It provides a way to map microdata property names to URLs in an 
unambiguous way.



 http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items
 
 Otherwise, if one of the other elements in pending is an ancestor 
 element of candidate, and that element is scope, then remove candidate 
 from pending.
 
 Otherwise, if one of the other elements in pending is an ancestor 
 element of candidate, and that element also has scope as its nearest 
 ancestor element with an itemscope attribute specified, then remove 
 candidate from pending.
 
 The intention of these requirements seems to be to eliminate redundant 
 elements in pending, but a comment on the intention of each in the spec 
 would be helpful as it's quite cryptic right now.

Added some brief explanations.
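
The two quoted conditions can be read as a single predicate. The sketch below is only an illustration: it uses hypothetical plain objects with `parent` and `itemscope` fields in place of DOM elements, the helper names are made up, and it covers just these two steps rather than the whole algorithm.

```javascript
// Hypothetical plain-object nodes: { name, parent, itemscope }.
function isAncestor(ancestor, node) {
  for (let p = node.parent; p; p = p.parent) {
    if (p === ancestor) return true;
  }
  return false;
}

// Nearest ancestor carrying an itemscope attribute, or null if there is none.
function nearestItemscopeAncestor(node) {
  for (let p = node.parent; p; p = p.parent) {
    if (p.itemscope) return p;
  }
  return null;
}

// True if either quoted condition says candidate should be removed from pending.
function shouldRemove(candidate, pending, scope) {
  return pending.some((other) =>
    other !== candidate &&
    isAncestor(other, candidate) &&
    (other === scope || nearestItemscopeAncestor(other) === scope));
}

const scope = { name: "scope", parent: null, itemscope: true };
const outer = { name: "outer", parent: scope, itemscope: false };
const inner = { name: "inner", parent: outer, itemscope: false };
const pending = [outer, inner];

console.log(shouldRemove(inner, pending, scope)); // true: outer already covers it
console.log(shouldRemove(outer, pending, scope)); // false
```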



 http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#microdata-dom-api
 
 itemtype and itemid are both URL attributes and therefore when getting
 itemType and itemId relative URLs should be resolved (even if only absolute
 URLs are valid). Correct?

That was a correct interpretation of the spec, but was only intended to 
be the case for itemid. I've corrected the spec to say that itemType is 
just a regular DOMString with no resolution.
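
As a rough illustration of the distinction, the sketch below models the two getters as plain functions. `getItemId` and `getItemType` are hypothetical helpers, not the DOM API itself, and the base URL is made up; only the standard `URL` constructor is used for resolution.

```javascript
// Hypothetical stand-in for the itemId getter: resolve against the document base.
function getItemId(attrValue, baseUrl) {
  return new URL(attrValue, baseUrl).href;
}

// Hypothetical stand-in for the itemType getter: plain string, no resolution.
function getItemType(attrValue) {
  return attrValue;
}

const base = "http://example.com/dir/page.html"; // assumed document address
console.log(getItemId("items/42", base)); // http://example.com/dir/items/42
console.log(getItemType("items/42"));     // items/42
```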


 itemprop and itemref are both unordered sets of unique space-separated
 tokens, but in HTMLElement only itemProp is a DOMSettableTokenList while
 itemRef is a DOMString. This doesn't really make sense, so make itemRef a
 DOMSettableTokenList too?

Fixed. That was an oversight.


 From reading the spec it's not obvious (without following
 cross-references) that itemProp isn't just a plain string. An example using 
 .itemProp.contains(name) or similar would make this more difficult to 
 miss.

Done.
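
To illustrate why the token-list nature matters, here is a small sketch that mimics the behaviour with plain strings. `tokenList` is a hypothetical helper, not the real DOMSettableTokenList; it only assumes that tokens are separated by whitespace and that duplicates are ignored.

```javascript
// Hypothetical helper: split an itemprop value into its unique tokens.
function tokenList(value) {
  const tokens = [...new Set(value.split(/\s+/).filter((t) => t !== ""))];
  return {
    length: tokens.length,
    contains: (name) => tokens.includes(name),
  };
}

const value = "name  fn nickname";
console.log(value === "fn");                  // false: plain string comparison misses it
console.log(tokenList(value).contains("fn")); // true
console.log(tokenList(value).length);         // 3
```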



 http://www.whatwg.org/specs/vocabs/current-work/#vcard
 
 Having clickable cross-references in this spec would help a lot when
 reviewing!

I've put them back in the HTML5 spec, which makes this a moot point.


 Grammar: Let value *be* the result of collecting the first vCard 
 subproperty named value in subitem.

Fixed.


 Let n1 be the value of the first property named family-name in subitem, or
 the empty string if there is no such property or the property's value is
 itself an item. Why not use collecting the first vCard subproperty here?
 Not doing so had me trying to find how the two were different, but I couldn't
 find any differences given that the values are later escaped.

Oops. Fixed.


 There's also the issue of how newlines from textContent values are escaped.
 Applying the vCard extraction algorithm to the spec example gives:
 
 BEGIN:VCARD
 PROFILE:VCARD
 VERSION:3.0
 SOURCE:http://foolip.org/microdatajs/demo/vcard.html
 NAME:vCard demo
 FN:Jack Bauer
 PHOTO;VALUE=URI:http://foolip.org/microdatajs/demo/jack-bauer.jpg
 ORG:Counter-Terrorist Unit;Los Angeles Division
 ADR:;;10201 W. Pico Blvd.;Los Angeles;CA;90064;United States
 GEO:34.052339;-118.410623
 TEL;TYPE=work:+1 (310)\n  597 3781
 URL;VALUE=URI:http://en.wikipedia.org/wiki/Jack_Bauer
 URL;VALUE=URI:http://www.jackbauerfacts.com/
 EMAIL:j.ba...@la.ctu.gov.invalid
 TEL;TYPE=cell:+1 (310) 555\n  3781
 NOTE:If I'm out in the field\, you may be better off\n contacting Chloe O'B
 rian if it's about\n work\, or ask Tony Almeida if\n you're interested in
 the CTU five-a-side football team we're trying\n to get going.
 AGENT;VALUE=VCARD:BEGIN:VCARD\nPROFILE:VCARD\nVERSION:3.0\nSOURCE:http://fo
 olip.org/microdatajs/demo/vcard.html\nNAME:vCard demo\nEMAIL\;VALUE=URI:ma
 ilto:c.obr...@la.ctu.gov.invalid\nfn:Chloe O'Brian\nN:O'Brian\;Chloe\;\;\;
 \nEND:VCARD\n
 AGENT:Tony Almeida
 REV:2008-07-20T21:00:00+0100
 TEL;TYPE=home:01632 960 123
 N:Bauer;Jack;;;
 END:VCARD
 
 TEL and NOTE have line breaks that are just because of how the HTML source is
 formatted. Importing this into Gmail preserves these linebreaks which looks
 quite broken. Unless we expect text fields to contain meaningful formatting,
 perhaps simply collapsing all whitespace into a single space is OK? In the
 best of worlds <br> would be converted to \n, but I'm not sure if it's worth
 the trouble.

We're screwed either way. If we convert newlines to spaces, then we lose 
formatting from <pre>. If we don't convert newlines, we gain spurious 
linebreaks (and spaces). The latter is less destructive, which is why I 
picked it, but it's not ideal, I agree.
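
The trade-off can be seen by running both strategies on a value taken from the example above. This is a sketch only: `escapeText` follows the escaping visible in that output (backslash, comma, semicolon and newline), `collapseWhitespace` is the alternative Philip suggests, and both function names are made up for illustration.

```javascript
// Escape a text value the way the output above appears to: \ , ; and newlines.
function escapeText(text) {
  return text
    .replace(/\\/g, "\\\\")
    .replace(/([,;])/g, "\\$1")
    .replace(/\r\n|\r|\n/g, "\\n");
}

// Alternative: collapse every run of whitespace into one space first.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

const tel = "+1 (310)\n  597 3781"; // the line break comes from the HTML source layout
console.log(escapeText(tel));                     // +1 (310)\n  597 3781
console.log(escapeText(collapseWhitespace(tel))); // +1 (310) 597 3781
```

Collapsing first gives the cleaner phone number, but it would equally flatten line breaks that were intentional, such as those in preformatted text.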

I'd like at some point to introduce some sort of semantic textContent 
that handles <br>, <pre>, <bdo>, dir="", <img alt>, <del>, space-collapsing,
and newline elimination, but there hasn't been much enthusiasm 
around the idea, and it's not clear what else 

Re: [whatwg] Microdata feedback

2010-01-18 Thread Jeremy Keith

Hixie wrote:

Finally on vCard, the final part of the extraction algorithm goes to
great trouble to guess what is the family name and what is the given
name. This guess will be broken for transliterated east Asian names
(CJKV that I know of, maybe others too). Just saying. Also, why is it
important to explicitly add N: for organizations?


This is intended to be compatible with Microformats vCard, which has
these weird rules. If you think we should remove them, please at least
first speak to Tantek and see what he thinks.


The fn optimisation pattern isn't intended to catch 100% of cases,  
just the situation Firstname Lastname or Firstname Middlename  
Lastname. So if you just use fn (formatted name) and don't use n  
(name), the name will be extracted/guessed using the optimisation  
pattern.


In cases where the pattern doesn't work (e.g. Anne van Kesteren, or  
east Asian names) you can still explicitly specify the family name and  
given name, over-riding the fn optimisation pattern. If you do this,  
you need to explicitly state this is the name (n) as well as the  
formatted name (fn).


Similarly, for organisations, you don't have to explicitly set n  
(name) if you apply both fn (formatted name) and org (organisation  
name) to a string. This time, the optimisation pattern assumes that  
the fn is the name of the organisation.


Technically, the n property is *always* required but if you use either  
of those two optimisation patterns, the n is inferred from fn.
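
Read as code, the two patterns described above might look like the following. This is a simplified sketch of the behaviour as summarised in this message, not the hCard or microdata algorithm itself; `inferN` and its argument shape are hypothetical, and returning empty components for the organisation case is an assumption.

```javascript
// Returns the vCard N value (family;given;additional;prefix;suffix),
// or null when n cannot be inferred and has to be given explicitly.
function inferN(card) {
  if (card.n) return card.n;                            // explicit n always wins
  if (card.org && card.fn === card.org) return ";;;;";  // fn names the organisation
  const parts = card.fn.trim().split(/\s+/);
  if (parts.length === 2) return `${parts[1]};${parts[0]};;;`;
  if (parts.length === 3) return `${parts[2]};${parts[0]};${parts[1]};;`;
  return null;
}

console.log(inferN({ fn: "Jack Bauer" })); // Bauer;Jack;;;
console.log(inferN({ fn: "Anne van Kesteren", n: "van Kesteren;Anne;;;" }));
console.log(inferN({ fn: "Counter-Terrorist Unit", org: "Counter-Terrorist Unit" }));
```

Without the explicit n, "Anne van Kesteren" would be guessed as family name "Kesteren" with "van" as a middle name, which is the kind of case where the override is needed.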


HTH,

Jeremy

--
Jeremy Keith

a d a c t i o

http://adactio.com/




[whatwg] bidi embedding for block-level elements

2010-01-18 Thread fantasai

On 01/14/2010 12:49 AM, Simon Montagu wrote:

On 01/11/2010 11:35 PM, fantasai wrote:

On 11/26/2009 10:54 PM, Simon Montagu wrote:


I assume your Gecko example is using a very recent version of Gecko,
such as a nightly build or a beta of Firefox 3.6? I fixed this issue
only a few months ago.

The HTML standard does specify what to do in this case, see
http://www.w3.org/TR/REC-html40/struct/dirlang.html#style-bidi:

When a block element that does not have a dir attribute is transformed
to the style of an inline element by a style sheet, the resulting
presentation should be equivalent, in terms of bidirectional formatting,
to the formatting obtained by explicitly adding a dir attribute
(assigned the inherited value) to the transformed element.

In practice, however, since browsers are not consistent, authors will
have to use CSS properties to achieve the expected results.


Does this mean applying unicode-bidi: embed to all block-level
elements?
Because that seems like it would fulfill those requirements.


I was thinking in terms of applying unicode-bidi: embed ad hoc
whenever applying display: inline to a specific element, but applying
it wholesale to all block-level elements will also work, of course.


In that case, I suggest that we add it to the sample default style sheet for
HTML 4 in the CSS2.1 appendix, and recommend the HTMLWG add some wording
about block-level elements defining bidi embedding boundaries to the HTML5
spec (and perhaps using CSS's unicode-bidi: embed rule as an example).

~fantasai


Re: [whatwg] about:blank synchronicity

2010-01-18 Thread Boris Zbarsky

On 1/15/10 5:05 AM, Henri Sivonen wrote:

I've located a Mozilla test case that seems to depend on the event loop task 
mapping of data: URL loads 
(http://mxr.mozilla.org/mozilla-central/source/layout/base/tests/chrome/test_bug533845.xul).


Er... it does?  Where?


Does anyone happen to have data on whether the Web already depends on data: 
URLs that don't block the parser loading as a single event loop task?


I don't think the web depends on data: URLs at all, really, so I would 
guess no.


-Boris


Re: [whatwg] about:blank synchronicity

2010-01-18 Thread Boris Zbarsky

On 1/13/10 4:56 PM, Ian Hickson wrote:

The spec currently distinguishes between the initial about:blank load
(creation of a new browsing context), which actually doesn't involve
navigation, and navigating to about:blank.

It seems like simply making the first one synchronous, but making the
latter asynchronous, would satisfy your use case. Would other vendors be
ok with this?


In case it wasn't clear from the relevant Gecko thread, I would 
personally be fine with this.  That said, would "initial about:blank 
load" only include <iframe/> (no src at all), or also <iframe src=""/> 
or also <iframe src="about:blank"/>?  I suspect it doesn't matter that 
much, actually, but would welcome confirmation.



Would it have other problems? Are there cases other than navigation where
about:blank being synchronous is detectable? (I couldn't find any.)


I'm not sure what you're asking here...

-Boris


Re: [whatwg] Microdata feedback

2010-01-18 Thread Aryeh Gregor
On Mon, Jan 18, 2010 at 7:58 AM, Ian Hickson i...@hixie.ch wrote:
 I've made it redirect to the spec.

Could you say that the URL *should* provide human-readable information
about the vocabulary?  We all know the problems with having
centrally-stored machine-readable data about your specs, but
encouraging the URL to provide human-readable info seems helpful.  (If
they aren't supposed to be dereferenced, why use HTTP?)

 Graphs are intended to be supported in v2, using a mechanism

You seem to have left this sentence unfinished.


Re: [whatwg] Microdata feedback

2010-01-18 Thread Julian Reschke

Aryeh Gregor wrote:

On Mon, Jan 18, 2010 at 7:58 AM, Ian Hickson i...@hixie.ch wrote:

I've made it redirect to the spec.


Could you say that the URL *should* provide human-readable information
about the vocabulary?  We all know the problems with having
centrally-stored machine-readable data about your specs, but
encouraging the URL to provide human-readable info seems helpful.  (If
they aren't supposed to be dereferenced, why use HTTP?)
...


"SHOULD return human-readable information" is good, if you also add "SHOULD 
NOT automatically dereference".


BR, Julian


Re: [whatwg] about:blank synchronicity

2010-01-18 Thread Ian Hickson
On Mon, 18 Jan 2010, Boris Zbarsky wrote:
 On 1/13/10 4:56 PM, Ian Hickson wrote:
  The spec currently distinguishes between the initial about:blank load
  (creation of a new browsing context), which actually doesn't involve
  navigation, and navigating to about:blank.
  
  It seems like simply making the first one synchronous, but making the
  latter asynchronous, would satisfy your use case. Would other vendors be
  ok with this?
 
 In case it wasn't clear from the relevant Gecko thread, I would personally be
 fine with this.  That said, would "initial about:blank load" only include
 <iframe/> (no src at all), or also <iframe src=""/> or also <iframe
 src="about:blank"/>?  I suspect it doesn't matter that much, actually, but
 would welcome confirmation.

It would include any browsing context creation, including, e.g. 
window.open(), <object> pointing to an HTML file before the HTML file is 
loaded, etc.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] about:blank synchronicity

2010-01-18 Thread Boris Zbarsky

On 1/18/10 6:02 PM, Ian Hickson wrote:

In case it wasn't clear from the relevant Gecko thread, I would personally be
fine with this.  That said, would "initial about:blank load" only include
<iframe/> (no src at all), or also <iframe src=""/> or also <iframe
src="about:blank"/>?  I suspect it doesn't matter that much, actually, but
would welcome confirmation.


It would include any browsing context creation, including, e.g.
window.open(), <object> pointing to an HTML file before the HTML file is
loaded, etc.


That wasn't quite my question.

If I have an <iframe src="about:blank"/> in my source, would there be a 
sync about:blank document creation followed by an about:blank load?  Or 
would the @src value just get ignored if it's about:blank?


-Boris


Re: [whatwg] img copyright attribute

2010-01-18 Thread Ian Hickson
On Sat, 9 Jan 2010, will surgent wrote:

 It would be nice if there was a copyright attribute for the HTML 5 img 
 tag. This would make it easy for users and search engines to filter out 
 images that cannot be used for certain purposes.

On Sun, 10 Jan 2010, Jonny Barnes wrote:
 
 Or maybe a license attribute instead, that would include copyrighted 
 work and stuff licensed under some CC or alternative.

On Sat, 9 Jan 2010, Aryeh Gregor wrote:
 
 This is one of the things microdata/RDFa are meant to do.

On Sun, 10 Jan 2010, Philip Jägenstedt wrote:
 
 http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#examples-4

On Sun, 10 Jan 2010, will surgent wrote:

 Hmm I didn't know about that. Thanks!

On Sun, 10 Jan 2010, Dawid Czyzewski wrote:

 And why img only? This would also be good for audio and video.

On Mon, 11 Jan 2010, will surgent wrote:

 That sounds like a good idea (about the audio and video elements being
 included as well). I just thought of it because Google does not allow one to
 specify the copyright or license in an image search as far as I know. Having
 a license attribute would make it intuitive for developers to add the
 license the same way title and alt attributes are specified.

On Tue, 12 Jan 2010, timeless wrote:
 
 external metadata on copyright is a disaster. it gets lost immediately.
 
 GIF and friends have supported embedding (c) into images for decades.
 
 As google is fully capable of caching images (and obviously does so), I 
 question how adding a tag to html will solve a problem which is already 
 solved by the native image formats themselves.
 
 For lack of a more useful reference about comment fields, i'll just 
 point to one application which is aware of them (although at the time of 
 the posting it only supported them for certain image types): 
 http://www.group42.com/ts-wi04.htm

Based on the above comments, I haven't changed anything -- the work 
vocabulary pretty much already addresses this use case in HTML, and 
addressing it in other formats is a problem for another working group.
