Re: [whatwg] Microdata feedback
On Thu, 08 Dec 2011 22:04:41 +0100, Ian Hickson i...@hixie.ch wrote:
> I changed the spec as you suggest.

Thanks!

--
Philip Jägenstedt
Core Developer
Opera Software
Re: [whatwg] Microdata feedback
On Sat, 9 Jul 2011, Philip Jägenstedt wrote:
> On Sat, 09 Jul 2011 01:19:02 +0200, Ian Hickson i...@hixie.ch wrote:
>> On Sat, 9 Jul 2011, Philip Jägenstedt wrote:
>>> Step 11 is "If current has an itemprop attribute specified, add it to results." but should be "If current has one or more property names, add it to results." Property names are defined in http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#property-names
>>>
>>> Why? If you start with <div itemprop="foo">, then div.itemProp.remove("foo") would give you <div itemprop="">. It'd be weird if the element still showed up in the properties collection after removing the only property name.
>>
>> The .properties attribute must return an HTMLPropertiesCollection rooted at the Document node, whose filter matches only elements that have property names, which further filters the results of the algorithm. Similarly, everything that uses the algorithm here does things for each property name, so if itemprop="" doesn't have any tokens, nothing happens and it doesn't matter that the algorithm returns it.
>
> Ah, I see my misunderstanding. Purely editorial: It would, IMO, be more clear if that check were in the algorithm itself. That's the way it's going to be (has been) implemented since there's no reason to do the filtering as a separate step. Do as you wish.

I changed the spec as you suggest. I agree that it's cleaner. I checked and I don't think it'll have any negative side-effects, though it does change the precise number of conformance errors in some invalid documents (not a truly practical concern since conformance checkers are only required to report zero errors if there are none and at least one error if there are any).

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
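Editorial illustration: the revised step 11 discussed in this message ("one or more property names" rather than "has an itemprop attribute") can be sketched roughly as below. This is a minimal sketch and not the spec's wording; the function names `property_names` and `belongs_in_results` are hypothetical, and the attribute value is modelled as a plain string (or `None` when the attribute is absent).

```python
import re

# HTML space characters: space, tab, line feed, form feed, carriage return.
_HTML_SPACES = re.compile(r"[ \t\n\f\r]+")


def property_names(itemprop):
    """Return the unique space-separated tokens of an itemprop value, in order."""
    if itemprop is None:  # attribute not specified at all
        return []
    names = []
    for token in _HTML_SPACES.split(itemprop):
        if token and token not in names:  # skip empty strings and duplicates
            names.append(token)
    return names


def belongs_in_results(itemprop):
    """Revised check: at least one property name, not mere attribute presence."""
    return len(property_names(itemprop)) > 0
```

Under this sketch, an element whose attribute has been emptied (the itemprop="" case from the thread) is no longer added to the results, while itemprop="foo" still is.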
Re: [whatwg] Microdata feedback
On Thu, 2011-07-07 at 22:33 +0000, Ian Hickson wrote:
> The JSON algorithm now ends the crawl when it hits a loop, and replaces the offending duplicate item with the string "ERROR".
>
> The RDF algorithm preserves the loops, since doing so is possible with RDF. Turns out the algorithm almost did this already, looks like it was an oversight.

It seems to me that this approach creates an incentive for people who want to do RDFesque things to publish deliberately non-conforming microdata content that works the way they want for RDF-based consumers but breaks for non-RDF consumers. If such content abounds and non-RDF consumers are forced to support loopiness by extending the JSON conversion algorithm in ad hoc ways, part of the benefit of microdata over RDFa (treeness) is destroyed and the benefit of being well-defined would be destroyed, too, for non-RDF consumption cases.

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
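Editorial illustration: the JSON behaviour quoted above (stop crawling at a loop and substitute the string "ERROR") can be sketched as follows. This is a toy model under stated assumptions, not the spec's algorithm: items are plain Python dicts with a `properties` mapping, nested items are nested dicts, and `item_to_json` is a hypothetical name.

```python
import json


def item_to_json(item, ancestors=()):
    """Convert a toy item to a JSON-ready dict, cutting loops with "ERROR"."""
    path = ancestors + (id(item),)  # items currently being crawled
    properties = {}
    for name, values in item["properties"].items():
        converted = []
        for value in values:
            if isinstance(value, dict):  # the value is itself an item
                if id(value) in path:
                    converted.append("ERROR")  # would re-enter an ancestor
                else:
                    converted.append(item_to_json(value, path))
            else:
                converted.append(value)
        properties[name] = converted
    result = {"properties": properties}
    if "type" in item:
        result["type"] = item["type"]
    return result


# Hypothetical data: a -> b -> a forms a loop.
a = {"type": "http://example.com/A", "properties": {"name": ["A"]}}
b = {"properties": {"back": [a]}}
a["properties"]["next"] = [b]
print(json.dumps(item_to_json(a)))
```

Only a repetition along the current path is replaced. An item that is merely shared by two parents is converted twice, so the output stays a tree.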
Re: [whatwg] Microdata feedback
On Tue, 12 Jul 2011 09:41:18 +0200, Henri Sivonen hsivo...@iki.fi wrote:
> On Thu, 2011-07-07 at 22:33 +0000, Ian Hickson wrote:
>> The JSON algorithm now ends the crawl when it hits a loop, and replaces the offending duplicate item with the string "ERROR".
>>
>> The RDF algorithm preserves the loops, since doing so is possible with RDF. Turns out the algorithm almost did this already, looks like it was an oversight.
>
> It seems to me that this approach creates an incentive for people who want to do RDFesque things to publish deliberately non-conforming microdata content that works the way they want for RDF-based consumers but breaks for non-RDF consumers. If such content abounds and non-RDF consumers are forced to support loopiness by extending the JSON conversion algorithm in ad hoc ways, part of the benefit of microdata over RDFa (treeness) is destroyed and the benefit of being well-defined would be destroyed, too, for non-RDF consumption cases.

I don't have a strong opinion, but note that even before this change the algorithm produced a non-tree for the Avenue Q example [1] where the adr property is shared between two items using itemref. (In JSON, it is flattened.) If we want to ensure that RDF consumers don't depend on non-treeness, then this should change as well.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#examples-4

--
Philip Jägenstedt
Core Developer
Opera Software
Re: [whatwg] Microdata feedback
On Tue, 12 Jul 2011, Henri Sivonen wrote:
> On Thu, 2011-07-07 at 22:33 +0000, Ian Hickson wrote:
>> The JSON algorithm now ends the crawl when it hits a loop, and replaces the offending duplicate item with the string "ERROR".
>>
>> The RDF algorithm preserves the loops, since doing so is possible with RDF. Turns out the algorithm almost did this already, looks like it was an oversight.
>
> It seems to me that this approach creates an incentive for people who want to do RDFesque things to publish deliberately non-conforming microdata content that works the way they want for RDF-based consumers but breaks for non-RDF consumers. If such content abounds and non-RDF consumers are forced to support loopiness by extending the JSON conversion algorithm in ad hoc ways, part of the benefit of microdata over RDFa (treeness) is destroyed and the benefit of being well-defined would be destroyed, too, for non-RDF consumption cases.

The problem here is that RDF and microdata have different data models, and RDF cannot represent microdata's data model with fidelity. For example, consider how this converts to RDF and compare it to the microdata equivalent:

   <div itemscope itemtype="http://example.com/" itemid="http://example.com/1">
    <span itemprop="a">x</span>
   </div>
   <div itemscope itemtype="http://example.com/" itemid="http://example.com/1">
    <span itemprop="b">x</span>
   </div>

There are other things RDF can't represent easily, e.g. it cannot easily represent the order of the values in this item:

   <div itemscope itemtype="http://example.com/">
    <span itemprop="a">1</span>
    <span itemprop="a">2</span>
   </div>

As such, I suggest we not worry about the itemref="" loop case, or that we try to fix all these cases together (not sure how we'd fix them).

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
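Editorial illustration: the two mismatches described in this message can be shown with a toy model. This is a sketch under stated assumptions, not the spec's RDF conversion: an item is a dict holding an `id` and an ordered list of (name, value) pairs, an RDF graph is modelled as a Python set of (subject, predicate, object) tuples, and `to_triples` and the blank-node label `_:item` are hypothetical.

```python
def to_triples(items):
    """Flatten toy items into an unordered set of triples."""
    triples = set()
    for item in items:
        for name, value in item["properties"]:
            triples.add((item["id"], name, value))
    return triples


# Two separate items that share one itemid ...
first = {"id": "http://example.com/1", "properties": [("a", "x")]}
second = {"id": "http://example.com/1", "properties": [("b", "x")]}
# ... versus a single item carrying both properties.
merged = {"id": "http://example.com/1", "properties": [("a", "x"), ("b", "x")]}
print(to_triples([first, second]) == to_triples([merged]))  # True

# The same property with two values in different orders.
ordered = {"id": "_:item", "properties": [("a", "1"), ("a", "2")]}
swapped = {"id": "_:item", "properties": [("a", "2"), ("a", "1")]}
print(to_triples([ordered]) == to_triples([swapped]))  # True
```

In both comparisons the triple sets are equal even though the microdata inputs differ, which is the loss of fidelity the message refers to.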
Re: [whatwg] Microdata feedback
On Sat, 09 Jul 2011 01:19:02 +0200, Ian Hickson i...@hixie.ch wrote:
> On Sat, 9 Jul 2011, Philip Jägenstedt wrote:
>> Step 11 is "If current has an itemprop attribute specified, add it to results." but should be "If current has one or more property names, add it to results." Property names are defined in http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#property-names
>>
>> Why? If you start with <div itemprop="foo">, then div.itemProp.remove("foo") would give you <div itemprop="">. It'd be weird if the element still showed up in the properties collection after removing the only property name.
>
> The .properties attribute must return an HTMLPropertiesCollection rooted at the Document node, whose filter matches only elements that have property names, which further filters the results of the algorithm. Similarly, everything that uses the algorithm here does things for each property name, so if itemprop="" doesn't have any tokens, nothing happens and it doesn't matter that the algorithm returns it.

Ah, I see my misunderstanding. Purely editorial: It would, IMO, be more clear if that check were in the algorithm itself. That's the way it's going to be (has been) implemented since there's no reason to do the filtering as a separate step. Do as you wish.

--
Philip Jägenstedt
Core Developer
Opera Software
Re: [whatwg] Microdata feedback
On Fri, 08 Jul 2011 00:33:14 +0200, Ian Hickson i...@hixie.ch wrote: On Wed, 8 Jun 2011, Tomasz Jamroszczak wrote: I've been looking into Microdata specification and it struck me, that crawling algorithm is so complex, when it comes to expressing simple ideas. I think that foremost the algorithm should be described in the specification with explanation what it's supposed to do, before steps of what exactly is to be done are written. Yeah. Turns out the algorithms involved here are quite badly broken. It was intended to expose the microdata graph as completely as possible while dropping anything that would introduce a loop, at the point where the first repetition would start (so A-B-C=A would break at the =), in the API, in the JSON, and in the conformance rules. I didn't do a good job speccing that, though! I've fixed the algorithms to make sense (I hope). http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-properties-of-an-item I had a look at this to verify that it is black-box-equivalent to what Opera has implemented, and only discovered one issue: div itemprop= should not be added to the .properties collection, because it has no properties. My bad for suggesting that the criteria should be the presence of an itemprop attribute, it should be an itemprop attribute containing at least one token. Can you update the spec to match? (I implemented the spec'd algorithm pedantically in https://gitorious.org/microdatajs/microdatajs/commit/217cc34e7e679e2e4ea3e670a0dcdd155a7b9800 for verification, it passes the unit tests with said modification.) On Wed, 29 Jun 2011, Philip Jägenstedt wrote: Note also that other algorithms defined in terms of items and their properties need to handle loopiness in some way. That's currently RDF, vCard and iCal conversion. Perhaps something like loopy item could be defined and those algorithms could skip loopy items wherever they occur? Simply failing is also an acceptable solution, IMO. 
I fixed vCard with a patch that just outputs AGENT;TYPE=VCARD:ERROR in the case of a loop. (Can only happen if the input is non-conforming, so it doesn't matter if the output is non-conforming.) WFM The vEvent stuff was already loop-safe. The JSON algorithm now ends the crawl when it hits a loop, and replaces the offending duplicate item with the string ERROR. WFM The RDF algorithm preserves the loops, since doing so is possible with RDF. Turns out the algorithm almost did this already, looks like it was an oversight. WFM, but note step 3: Add a mapping from the item item to the subject subject in memory, if there isn't one already. Step 1 guarantees that there is no entry for item, so step 3 can be unconditional. On Wed, 29 Jun 2011, Philip Jägenstedt wrote: Indeed, multiple types doesn't work at all if you want to mix different types. I was assuming that the use case was to extend types, kind of like http://schema.org/Person/Governor. However, it doesn't work all that well even in that case, since there's no way to know which type is the extension of the other and which properties exist only on the extended type. I don't really understand this use case. Can you elaborate on the problem that needs solving here? It's whatever problem http://schema.org/docs/extension.html is trying to solve, which is something like allow people to geek out with more specific vocabularies without interfering with search results. I whined a bit in http://groups.google.com/group/schemaorg-discussion/browse_thread/thread/6de3a1761b115271, the short story being: * extensibility encoded with a microsyntax in the URL, making it not-so-opaque * such URLs make the DOM API less useful Perhaps bending Microdata to accommodate for this is not the best idea. 
If I were schema.org, I would just encourage people to do this:

   <div itemscope itemtype="http://schema.org/Person">
    <div id="wrapper">
     <div itemprop="name">Arnold</div>
     <div itemscope itemtype="http://example.com/Governor" itemref="wrapper">
      <div itemprop="state">California</div>
     </div>
    </div>
   </div>

Making extensions unsightly is probably a good thing, to discourage people from going too crazy with it. This way it's also clear which properties only apply to the extended type.

--
Philip Jägenstedt
Core Developer
Opera Software
Re: [whatwg] Microdata feedback
On Fri, 08 Jul 2011 21:31:49 +0200, Ian Hickson i...@hixie.ch wrote: On Fri, 8 Jul 2011, Philip Jägenstedt wrote: On Fri, 08 Jul 2011 00:33:14 +0200, Ian Hickson i...@hixie.ch wrote: On Wed, 8 Jun 2011, Tomasz Jamroszczak wrote: I've been looking into Microdata specification and it struck me, that crawling algorithm is so complex, when it comes to expressing simple ideas. I think that foremost the algorithm should be described in the specification with explanation what it's supposed to do, before steps of what exactly is to be done are written. Yeah. Turns out the algorithms involved here are quite badly broken. It was intended to expose the microdata graph as completely as possible while dropping anything that would introduce a loop, at the point where the first repetition would start (so A-B-C=A would break at the =), in the API, in the JSON, and in the conformance rules. I didn't do a good job speccing that, though! I've fixed the algorithms to make sense (I hope). http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-properties-of-an-item I had a look at this to verify that it is black-box-equivalent to what Opera has implemented, and only discovered one issue: div itemprop= should not be added to the .properties collection, because it has no properties. My bad for suggesting that the criteria should be the presence of an itemprop attribute, it should be an itemprop attribute containing at least one token. Can you update the spec to match? What needs updating? As far as I can tell, what you describe is what the spec requires. Step 11 is If current has an itemprop attribute specified, add it to results. but should be If current has one or more property names, add it to results. Property names are defined in http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#property-names Why? If you start with div itemprop=foo, then div.itemProp.remove(foo) would give you div itemprop=. 
It'd be weird if the element still showed up in the properties collection after removing the only property name. On Wed, 29 Jun 2011, Philip Jägenstedt wrote: Indeed, multiple types doesn't work at all if you want to mix different types. I was assuming that the use case was to extend types, kind of like http://schema.org/Person/Governor. However, it doesn't work all that well even in that case, since there's no way to know which type is the extension of the other and which properties exist only on the extended type. I don't really understand this use case. Can you elaborate on the problem that needs solving here? It's whatever problem http://schema.org/docs/extension.html is trying to solve, which is something like allow people to geek out with more specific vocabularies without interfering with search results. That doesn't seem to be a problem. I don't really understand what problem this is solving. Neither do I. If the problem is just I want to annotate data that isn't defined in this vocabulary, that's already possible using URL property names. If I were schema.org, I would just encourage people to do this: div itemscope itemtype=http://schema.org/Person; div id=wrapper div itemprop=nameArnold/div div itemscope itemtype=http://example.com/Governor; itemref=wrapper div itemprop=stateCalifornia/div /div /div /div That's a bit weird. Why not just:? div itemscope itemtype=http://schema.org/Person; div itemprop=nameArnold/div div itemprop=http://example.com/Governor/state;California/div /div Yeah, that's better, at least when the number of additional attributes is small. It's hard to know without knowing what concrete user problem we're trying to solve here. I'll leave this discussion to the schema.org sponsors and just hope that the method in http://schema.org/docs/extension.html doesn't catch on. -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Microdata feedback
On Mon, 18 Jan 2010 16:24:46 +0100, Jeremy Keith jer...@adactio.com wrote:
> Hixie wrote:
>>> Finally on vCard, the final part of the extraction algorithm goes to great trouble to guess what is the family name and what is the given name. This guess will be broken for transliterated east Asian names (CJKV that I know of, maybe others too). Just saying. Also, why is it important to explicitly add N: for organizations?
>>
>> This is intended to be compatible with Microformats vCard, which has these weird rules. If you think we should remove them, please at least first speak to Tantek and see what he thinks.
>
> The fn optimisation pattern isn't intended to catch 100% of cases, just the situation "Firstname Lastname" or "Firstname Middlename Lastname". So if you just use fn (formatted name) and don't use n (name), the name will be extracted/guessed using the optimisation pattern. In cases where the pattern doesn't work (e.g. Anne van Kesteren, or east Asian names) you can still explicitly specify the family name and given name, over-riding the fn optimisation pattern. If you do this, you need to explicitly state this is the name (n) as well as the formatted name (fn).

This is going to break badly whenever a template uses vCard microdata and its author either doesn't know the family name and given name (because the data was never collected) or doesn't even consider that the vCard conversion does this funny guesswork. If a social network site or similar does this, then Anne van Kesteren and Zhang Min (fictional name) will have their names messed up with no way of fixing it. At least I haven't seen a site which asks users to both fill in their full name and each component, which is what you need to get this right.

> Similarly, for organisations, you don't have to explicitly set n (name) if you apply both fn (formatted name) and org (organisation name) to a string. This time, the optimisation pattern assumes that the fn is the name of the organisation.
>
> Technically, the n property is *always* required but if you use either of those two optimisation patterns, the n is inferred from fn.

If this is just a technical problem with some software requiring N to be present, would it be OK to just output an empty N like for organizations?

--
Philip Jägenstedt
Core Developer
Opera Software
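Editorial illustration: the failure mode raised in this message can be seen in a deliberately simplified version of the fn guess. This is a sketch covering only the "Firstname Lastname" and "Firstname Middlename Lastname" shapes that Jeremy describes; it is not the spec's extraction algorithm or the Microformats rules, and `guess_n_from_fn` is a hypothetical name.

```python
def guess_n_from_fn(fn):
    """Guess name components from a formatted name (simplified assumption)."""
    parts = fn.split()
    if len(parts) == 2:
        given, family = parts
        return {"given-name": given, "family-name": family}
    if len(parts) == 3:
        given, additional, family = parts
        return {
            "given-name": given,
            "additional-name": additional,
            "family-name": family,
        }
    return None  # no guess for other shapes


print(guess_n_from_fn("Anne van Kesteren"))
print(guess_n_from_fn("Zhang Min"))
```

With this guess, "van" is treated as a middle name instead of part of the family name, and "Zhang" is treated as the given name, which is wrong if the name follows the usual family-name-first order. A template that only has the full name has no way to correct either result.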
Re: [whatwg] Microdata feedback
On Mon, 18 Jan 2010 13:58:16 +0100, Ian Hickson i...@hixie.ch wrote: I'd like at some point to introduce some sort of semantic textContent that handles br, pre, bdo, dir=, img alt, del, space- collapsing, and newline elimination, but there hasn't been much enthusiasm around the idea, and it's not clear what else it would be good for. I've changed the example, at least, to have it work ok, and added a comment in the example about it. OK. Won't hold my breath for semantic textContent, but it sounds like a good solution. On Thu, 19 Nov 2009, Philip Jägenstedt wrote: In a (slightly edited) Jack Bauer example [1], Chrome, Firefox and presumably Safari has the meta elements moved to head. This will severely break script-based implementation of microdata, which are likely to be used for the time being until the DOM API is implemented natively. I can't see any workaround for this, so I suggest that meta simply not be used for microdata, preferably by making it non-conforming and removing it from the definitions/algorithms. This is a short-term problem that only affects scripted implementations that are shipped with the pages, so the workaround is simple: don't use meta and link. Any implementations outside of the page can just fix their parser to be HTML5-compatible. OK, fair enough. Thanks for all the other fixes, still reviewing the algorithm change... -- Philip Jägenstedt Core Developer Opera Software
Re: [whatwg] Microdata feedback
On Mon, 18 Jan 2010, Aryeh Gregor wrote:
> On Mon, Jan 18, 2010 at 7:58 AM, Ian Hickson i...@hixie.ch wrote:
>> I've made it redirect to the spec.
>
> Could you say that the URL *should* provide human-readable information about the vocabulary? We all know the problems with having centrally-stored machine-readable data about your specs, but encouraging the URL to provide human-readable info seems helpful. (If they aren't supposed to be dereferenced, why use HTTP?)

Why indeed. Is there something else we could use instead?

>> Graphs are intended to be supported in v2, using a mechanism
>
> You seem to have left this sentence unfinished.

...using a mechanism intended for that purpose. Nothing to see here. :-)

On Mon, 18 Jan 2010, Julian Reschke wrote:
> SHOULD return human-readable information is good, if you also add SHOULD NOT automatically dereference.

I've added something akin to that SHOULD NOT, but the spec doesn't have a specification conformance class, so there's nothing to apply the SHOULD to. So I haven't added it. (I don't generally think specifications being conformance classes really makes much sense.)

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] Microdata feedback
Hixie wrote:
>> Finally on vCard, the final part of the extraction algorithm goes to great trouble to guess what is the family name and what is the given name. This guess will be broken for transliterated east Asian names (CJKV that I know of, maybe others too). Just saying. Also, why is it important to explicitly add N: for organizations?
>
> This is intended to be compatible with Microformats vCard, which has these weird rules. If you think we should remove them, please at least first speak to Tantek and see what he thinks.

The fn optimisation pattern isn't intended to catch 100% of cases, just the situation "Firstname Lastname" or "Firstname Middlename Lastname". So if you just use fn (formatted name) and don't use n (name), the name will be extracted/guessed using the optimisation pattern. In cases where the pattern doesn't work (e.g. Anne van Kesteren, or east Asian names) you can still explicitly specify the family name and given name, over-riding the fn optimisation pattern. If you do this, you need to explicitly state this is the name (n) as well as the formatted name (fn).

Similarly, for organisations, you don't have to explicitly set n (name) if you apply both fn (formatted name) and org (organisation name) to a string. This time, the optimisation pattern assumes that the fn is the name of the organisation.

Technically, the n property is *always* required but if you use either of those two optimisation patterns, the n is inferred from fn.

HTH,
Jeremy

--
Jeremy Keith
a d a c t i o
http://adactio.com/
Re: [whatwg] Microdata feedback
On Mon, Jan 18, 2010 at 7:58 AM, Ian Hickson i...@hixie.ch wrote:
> I've made it redirect to the spec.

Could you say that the URL *should* provide human-readable information about the vocabulary? We all know the problems with having centrally-stored machine-readable data about your specs, but encouraging the URL to provide human-readable info seems helpful. (If they aren't supposed to be dereferenced, why use HTTP?)

> Graphs are intended to be supported in v2, using a mechanism

You seem to have left this sentence unfinished.
Re: [whatwg] Microdata feedback
Aryeh Gregor wrote:
> On Mon, Jan 18, 2010 at 7:58 AM, Ian Hickson i...@hixie.ch wrote:
>> I've made it redirect to the spec.
>
> Could you say that the URL *should* provide human-readable information about the vocabulary? We all know the problems with having centrally-stored machine-readable data about your specs, but encouraging the URL to provide human-readable info seems helpful. (If they aren't supposed to be dereferenced, why use HTTP?)
> ...

SHOULD return human-readable information is good, if you also add SHOULD NOT automatically dereference.

BR, Julian
Re: [whatwg] Microdata feedback
On Wed, 14 Oct 2009 13:53:46 +0200, Ian Hickson i...@hixie.ch wrote: On Fri, 21 Aug 2009, Philip Jägenstedt wrote: Shouldn't namedItem [6] be namedItems? Code like .namedItem().item(0) would be quite confusing. [6] http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#dom-htmlpropertycollection-nameditem I don't understand what this is referring to. I was incorrectly under the impressions that .namedItem on other collections always returned a single element and arguing that since HTMLPropertyCollection.namedItem always returns a PropertyNodeList namedItems in plural would make more sense. Now I see that some other namedItem methods aren't as simple as I'd thought, so I'm not sure what to make of it. Is there a reason why HTMLPropertyCollection.namedItem unlike some other collections' .namedItem don't return an element if there is only 1 element in the collection at the time the method is called? Perhaps this is legacy quirks that we don't want to replicate? On Tue, 25 Aug 2009, Philip Jägenstedt wrote: There's something like an inverse relationship between simplicity of the syntax and complexity of the resulting markup, the best balance point isn't clear (to me at least). Perhaps option 3 is better, never allowing item+itemprop on the same element. That would preclude being able to make trees. Given that flat items like vcard/vevent are likely to be the most common use case I think we should optimize for that. Child items can be created by using a predefined item property: itemprop=com.example.childtype item. The value of that property would then be the first item in tree-order (or all items in the subtree, not sure). This way, items would have better copy-paste resilience as the whole item element could be made into a top-level item simply by moving it, without meddling with the itemprop. That sounds kinda confusing... More confusing than item+itemprop on the same element? 
In many cases the property value is the contained text, having it be the contained item node(s) doesn't seem much stranger. Based on the studies Google did, I'm not convinced that people will find the nesting that complicated. IMHO the proposal above is more confusing, too. I'm not sure this is solving a problem that needs solving. If the parent-item (com.example.blog) doesn't know what the child-items are, it would simply use itemprop=item. I don't understand this at all. This was an attempt to have anonymous sub-items. Re-thinking this, perhaps a better solution would be to have each item behave in much the same way that the document itself does. That is, simply add items in the subtree without using itemprop and access them with .getItems(itemType) on the outer item. How would you do things like agent in the vEvent vocabulary? Comparing the current model with a DOM tree, it seems odd in that a property could be an item. It would be like an element attribute being another element: outer foo=inner//. That kind of thing could just as well be outerfooinner//foo/outer, outerinner type=foo//outer or even outerinner//outer if the relationship between the elements is clear just from the fact that they have a parent-child relationship (usually the case). Microdata's datamodel is more similar to JSON's than XML's. It's only in the case where both itemprop and item have a type that an extra level of nesting will be needed and I expect that to be the exception. Changing the model to something more DOM-tree-like is probably going to be easier to understand for many web developers. I dunno. People didn't seem to have much trouble getting it once we used itemscope= rather than just item=. People understand the JSON datamodel pretty well, why would this be different? After http://blog.whatwg.org/usability-testing-html5, the recent syntax changes, the improved DOM API and the passage of time I'm not very worried about the things I was worrying about above. 
If there's any specific point that seems valid after another review I'll send separate feedback on it. Thanks for all the other fixes! -- Philip Jägenstedt Opera Software