[whatwg] Microdata feedback

2013-08-06 Thread Ian Hickson
On Wed, 13 Feb 2013, Ed Summers wrote:
 
 I am looking for some guidance about the use of multiple itemtypes in 
 microdata [1], specifically the phrase "defined to use the same 
 vocabulary" in:
 
 
 The item types must all be types defined in applicable specifications
 and must all be defined to use the same vocabulary.
 
 
 For example, does this mean that I can't say:
 
 <div itemscope itemtype="http://acme.com/Foo http://zenith.com/Bar"> ... 
 </div>

It depends on what http://acme.com/Foo and http://zenith.com/Bar are. If 
they use the same vocabulary, then you can do it. If they're separate 
vocabularies, then no.


 The reason I ask is that there is some desire over in the schema.org 
 community [2] to provide a mechanism for schema.org to be specialized. 
 For example, in the case of an audiobook:
 
 <div itemscope itemtype="http://schema.org/Book
 http://www.productontology.org/id/Audiobook"> ... </div>
 
 The idea being not to overload schema.org with more vocabulary, and to 
 let vocabularies grow a bit more organically.

If they're the same vocabulary -- that is, the properties on this .../Book 
vocabulary and this .../Audiobook vocabulary don't clash -- properties 
mean the same thing in both -- then it's fine.


 This schema.org group is currently thinking of using a one off property 
 additionalType that would be used like so:
 
 <div itemscope itemtype="http://schema.org/Book">
   <link itemprop="additionalType"
         href="http://www.productontology.org/id/Audiobook">
   ...
 </div>
 
 I personally find this to be kind of distasteful since it replicates the 
 mechanics that microdata's itemtype already offers.

It's essentially equivalent, yes.


 So, my question: is it the case that itemtype cannot reference types in 
 different vocabularies like the example above? If so, I'm curious to 
 know what the rationale was, and if perhaps it could be relaxed.

If they're different vocabularies (i.e. the same terms are used to mean 
different things), then you wouldn't know which was meant, so it would be 
ambiguous. There's an open bug about this topic with an open question:

   https://www.w3.org/Bugs/Public/show_bug.cgi?id=13527


On Thu, 14 Feb 2013, Ed Summers wrote:
 
 In John's email [1] he proposed limiting multiple types to being from 
 the same origin domain, not the same vocabulary as is stated in the 
 Microdata spec. It sounds like an obvious question, but is there a 
 precise definition of what is meant by "same vocabulary"? Or is it just 
 a hand-wavy way of talking about what humans understand when putting the 
 itemtype URLs in their browsers, reading, and understanding that they 
 are types that are part of some larger coherent whole?

Vocabulary means the set of properties that are defined. There's some 
non-normative text in the HTML spec that talks about this:

# The type gives the context for the properties, thus selecting a
# vocabulary: a property named "class" given for an item with the type
# "http://census.example/person" might refer to the economic class of
# an individual, while a property named "class" given for an item with
# the type "http://example.com/school/teacher" might refer to the
# classroom a teacher has been assigned. Several types can share a
# vocabulary. For example, the types
# "http://example.org/people/teacher" and
# "http://example.org/people/engineer" could be defined to use the
# same vocabulary (though maybe some properties would not be
# especially useful in both cases, e.g. maybe the
# "http://example.org/people/engineer" type might not typically be
# used with the "classroom" property). Multiple types defined to use
# the same vocabulary can be given for a single item by listing the
# URLs as a space-separated list in the attribute's value. An item
# cannot be given two types if they do not use the same vocabulary,
# however.
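
The rule quoted above can be sketched as a check over the tokens of an
itemtype value. This is only an illustration: the VOCABULARY_OF table is
a hypothetical stand-in for whatever the applicable specifications say
about each type, and neither the table nor the function name comes from
the spec.

```python
# Hypothetical registry: the vocabulary each item type is defined to use.
# A real consumer would derive this from the specifications that define
# the types; these entries are made up for illustration.
VOCABULARY_OF = {
    "http://example.org/people/teacher": "http://example.org/people/",
    "http://example.org/people/engineer": "http://example.org/people/",
    "http://census.example/person": "http://census.example/",
}


def types_share_vocabulary(itemtype: str) -> bool:
    """Return True if every type listed in itemtype uses one vocabulary."""
    tokens = itemtype.split()  # itemtype is a space-separated list of URLs
    vocabularies = {VOCABULARY_OF.get(token) for token in tokens}
    # Unknown types map to None and are rejected rather than guessed at.
    return bool(tokens) and None not in vocabularies and len(vocabularies) == 1
```

With this table, the teacher and engineer types can be combined on one
item, while the teacher type and the census person type cannot.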


On Tue, 19 Feb 2013, Judson Lester wrote:

 There was an email from last year suggesting that the values of input 
 elements be derived from their value attributes - the purpose there 
 being to be able to control the form via the microdata interface.  I've 
 only been able to read it in the archives - the brief exchange was 
 between Igor Nikolev and Ian Hickson, who was curious about use cases.
 
 Conversely, it would be useful to be able to use input elements to 
 contain item values, and at the moment, since their values would be 
 derived from their textContent, they're useless for that.  
 Specifically, it's often reasonable to present a representation as the 
 default values in a form and allow for updates simply by posting the 
 changed values.  It seems unwieldy to need to replicate that information 
 in e.g. data elements.
 
 While it would be simple to treat the defaultValue as the item property 
 value for elements (and for radio inputs, let the representation mark 
 the selected input as the itemprop), it seems counter to the spirit of 
 the proposal.  The alternative would be to do something like excluding 
 unsuccessful input elements during 

Re: [whatwg] Microdata feedback

2011-12-09 Thread Philip Jägenstedt

On Thu, 08 Dec 2011 22:04:41 +0100, Ian Hickson i...@hixie.ch wrote:


I changed the spec as you suggest.


Thanks!

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Microdata feedback

2011-12-08 Thread Ian Hickson
On Sat, 9 Jul 2011, Philip Jägenstedt wrote:
 On Sat, 09 Jul 2011 01:19:02 +0200, Ian Hickson i...@hixie.ch wrote:
  On Sat, 9 Jul 2011, Philip Jägenstedt wrote:
   
   Step 11 is "If current has an itemprop attribute specified, add it 
   to results." but should be "If current has one or more property 
   names, add it to results." Property names are defined in 
   http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#property-names
   
   Why? If you start with <div itemprop="foo">, then 
   div.itemProp.remove("foo") would give you <div itemprop="">. It'd be 
   weird if the element still showed up in the properties collection 
   after removing the only property name.
  
  The .properties attribute must return an HTMLPropertiesCollection 
  rooted at the Document node, whose filter matches only elements that 
  have property names, which further filters the results of the 
  algorithm. Similarly, everything that uses the algorithm here does 
  things for each property name, so if itemprop="" doesn't have any 
  tokens, nothing happens and it doesn't matter that the algorithm 
  returns it.
 
 Ah, I see my misunderstanding.
 
 Purely editorial: It would, IMO, be more clear if that check were in the 
 algorithm itself. That's the way it's going to be (has been) implemented 
 since there's no reason to do the filtering as a separate step. Do as 
 you wish.

I changed the spec as you suggest. I agree that it's cleaner. I checked 
and I don't think it'll have any negative side-effects, though it does 
change the precise number of conformance errors in some invalid documents 
(not a truly practical concern since conformance checkers are only 
required to report zero errors if there are none and at least one error if 
there are any).
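
The difference between "has an itemprop attribute" and "has one or more
property names" can be sketched as follows. Elements are modelled here
as plain dicts, which is an assumption made for the example; the
function names are also hypothetical and are not taken from the spec or
from any implementation.

```python
def property_names(itemprop_value):
    """Split an itemprop value into its unique tokens, in order."""
    if itemprop_value is None:  # attribute absent
        return []
    names = []
    for token in itemprop_value.split():
        if token not in names:
            names.append(token)
    return names


def collect_properties(elements):
    """Keep only the elements that have at least one property name."""
    return [el for el in elements if property_names(el.get("itemprop"))]


# Hypothetical elements: one with a name, one with itemprop="", one without.
ELEMENTS = [
    {"id": "a", "itemprop": "foo"},
    {"id": "b", "itemprop": ""},
    {"id": "c"},
]
```

Here collect_properties(ELEMENTS) keeps only the element with id "a";
the element with an empty itemprop is filtered out inside the check
itself rather than in a later step.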

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Microdata feedback

2011-07-12 Thread Henri Sivonen
On Thu, 2011-07-07 at 22:33 +0000, Ian Hickson wrote:
 The JSON algorithm now ends the crawl when it hits a loop, and replaces 
 the offending duplicate item with the string "ERROR".
 
 The RDF algorithm preserves the loops, since doing so is possible with 
 RDF. Turns out the algorithm almost did this already, looks like it was an 
 oversight.

It seems to me that this approach creates an incentive for people who
want to do RDFesque things to publish deliberately non-conforming
microdata content that works the way they want for RDF-based consumers
but breaks for non-RDF consumers. If such content abounds and non-RDF
consumers are forced to support loopiness by extending the JSON
conversion algorithm in ad hoc ways, part of the benefit of microdata
over RDFa (treeness) is destroyed and the benefit of being well-defined
would be destroyed, too, for non-RDF consumption cases.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: [whatwg] Microdata feedback

2011-07-12 Thread Philip Jägenstedt

On Tue, 12 Jul 2011 09:41:18 +0200, Henri Sivonen hsivo...@iki.fi wrote:


On Thu, 2011-07-07 at 22:33 +0000, Ian Hickson wrote:

The JSON algorithm now ends the crawl when it hits a loop, and replaces
the offending duplicate item with the string "ERROR".

The RDF algorithm preserves the loops, since doing so is possible with
RDF. Turns out the algorithm almost did this already, looks like it was
an oversight.


It seems to me that this approach creates an incentive for people who
want to do RDFesque things to publish deliberately non-conforming
microdata content that works the way they want for RDF-based consumers
but breaks for non-RDF consumers. If such content abounds and non-RDF
consumers are forced to support loopiness by extending the JSON
conversion algorithm in ad hoc ways, part of the benefit of microdata
over RDFa (treeness) is destroyed and the benefit of being well-defined
would be destroyed, too, for non-RDF consumption cases.


I don't have a strong opinion, but note that even before this change the  
algorithm produced a non-tree for the "Avenue Q" example [1] where the  
"adr" property is shared between two items using itemref. (In JSON, it is  
flattened.) If we want to ensure that RDF consumers don't depend on  
non-treeness, then this should change as well.


[1]  
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#examples-4


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Microdata feedback

2011-07-12 Thread Ian Hickson
On Tue, 12 Jul 2011, Henri Sivonen wrote:
 On Thu, 2011-07-07 at 22:33 +0000, Ian Hickson wrote:
  The JSON algorithm now ends the crawl when it hits a loop, and 
  replaces the offending duplicate item with the string "ERROR".
  
  The RDF algorithm preserves the loops, since doing so is possible with 
  RDF. Turns out the algorithm almost did this already, looks like it 
  was an oversight.
 
 It seems to me that this approach creates an incentive for people who 
 want to do RDFesque things to publish deliberately non-conforming 
 microdata content that works the way they want for RDF-based consumers 
 but breaks for non-RDF consumers. If such content abounds and non-RDF 
 consumers are forced to support loopiness by extending the JSON 
 conversion algorithm in ad hoc ways, part of the benefit of microdata 
 over RDFa (treeness) is destroyed and the benefit of being well-defined 
 would be destroyed, too, for non-RDF consumption cases.

The problem here is that RDF and microdata have different data models, 
and RDF cannot represent microdata's data model with fidelity.

For example, consider how this converts to RDF and compare it to the 
microdata equivalent:

   <div itemscope itemtype="http://example.com/" itemid="http://example.com/1">
    <span itemprop="a">x</span>
   </div>
   <div itemscope itemtype="http://example.com/" itemid="http://example.com/1">
    <span itemprop="b">x</span>
   </div>

There are other things RDF can't represent easily, e.g. it cannot easily 
represent the order of the values in this item:

   <div itemscope itemtype="http://example.com/">
    <span itemprop="a">1</span>
    <span itemprop="a">2</span>
   </div>

As such, I suggest we not worry about the itemref="" loop case, or that we 
try to fix all these cases together (not sure how we'd fix them).
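
Both fidelity problems above can be seen in a toy conversion that treats
RDF as an unordered set of (subject, predicate, object) triples. This is
a deliberately simplified sketch and not the conversion algorithm from
the spec: items are plain dicts, property names are used directly as
predicates, and the "_:itemN" blank-node labels are invented for the
example.

```python
def to_triples(items):
    """Flatten items into an unordered set of triples."""
    triples = set()
    for index, item in enumerate(items):
        # Items without an itemid get a made-up blank node label.
        subject = item.get("itemid") or f"_:item{index}"
        for name, values in item["properties"].items():
            for value in values:
                triples.add((subject, name, value))
    return triples


# Two distinct microdata items that share one itemid.
SAME_ID = [
    {"itemid": "http://example.com/1", "properties": {"a": ["x"]}},
    {"itemid": "http://example.com/1", "properties": {"b": ["x"]}},
]

# One item whose "a" property has two values, in two different orders.
ORDERED = [{"properties": {"a": ["1", "2"]}}]
REVERSED = [{"properties": {"a": ["2", "1"]}}]
```

In this model the two SAME_ID items collapse into a single subject, and
to_triples(ORDERED) equals to_triples(REVERSED), so the order of the
values is no longer recoverable.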

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Microdata feedback

2011-07-09 Thread Philip Jägenstedt

On Sat, 09 Jul 2011 01:19:02 +0200, Ian Hickson i...@hixie.ch wrote:


On Sat, 9 Jul 2011, Philip Jägenstedt wrote:


Step 11 is "If current has an itemprop attribute specified, add it to
results." but should be "If current has one or more property names, add
it to results." Property names are defined in
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#property-names

Why? If you start with <div itemprop="foo">, then
div.itemProp.remove("foo") would give you <div itemprop="">. It'd be
weird if the element still showed up in the properties collection after
removing the only property name.


The .properties attribute must return an HTMLPropertiesCollection rooted
at the Document node, whose filter matches only elements that have
property names, which further filters the results of the algorithm.
Similarly, everything that uses the algorithm here does things for each
property name, so if itemprop="" doesn't have any tokens, nothing happens
and it doesn't matter that the algorithm returns it.


Ah, I see my misunderstanding.

Purely editorial: It would, IMO, be more clear if that check were in the  
algorithm itself. That's the way it's going to be (has been) implemented  
since there's no reason to do the filtering as a separate step. Do as you  
wish.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Microdata feedback

2011-07-08 Thread Philip Jägenstedt

On Fri, 08 Jul 2011 00:33:14 +0200, Ian Hickson i...@hixie.ch wrote:


On Wed, 8 Jun 2011, Tomasz Jamroszczak wrote:


I've been looking into Microdata specification and it struck me, that
crawling algorithm is so complex, when it comes to expressing simple
ideas.  I think that foremost the algorithm should be described in the
specification with explanation what it's supposed to do, before steps of
what exactly is to be done are written.


Yeah. Turns out the algorithms involved here are quite badly broken.

It was intended to expose the microdata graph as completely as possible
while dropping anything that would introduce a loop, at the point where
the first repetition would start (so A->B->C=>A would break at the "=>"),
in the API, in the JSON, and in the conformance rules. I didn't do a good
job speccing that, though!

I've fixed the algorithms to make sense (I hope).


http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-properties-of-an-item

I had a look at this to verify that it is black-box-equivalent to what  
Opera has implemented, and only discovered one issue:


<div itemprop=""> should not be added to the .properties collection,  
because it has no properties. My bad for suggesting that the criteria  
should be the presence of an itemprop attribute, it should be an itemprop  
attribute containing at least one token. Can you update the spec to match?


(I implemented the spec'd algorithm pedantically in  
https://gitorious.org/microdatajs/microdatajs/commit/217cc34e7e679e2e4ea3e670a0dcdd155a7b9800  
for verification, it passes the unit tests with said modification.)





On Wed, 29 Jun 2011, Philip Jägenstedt wrote:


Note also that other algorithms defined in terms of items and their
properties need to handle loopiness in some way. That's currently RDF,
vCard and iCal conversion. Perhaps something like "loopy item" could be
defined and those algorithms could skip loopy items wherever they occur?
Simply failing is also an acceptable solution, IMO.


I fixed vCard with a patch that just outputs "AGENT;TYPE=VCARD:ERROR" in
the case of a loop. (Can only happen if the input is non-conforming, so
it doesn't matter if the output is non-conforming.)


WFM


The vEvent stuff was already loop-safe.

The JSON algorithm now ends the crawl when it hits a loop, and replaces
the offending duplicate item with the string ERROR.


WFM


The RDF algorithm preserves the loops, since doing so is possible with
RDF. Turns out the algorithm almost did this already, looks like it was
an oversight.


WFM, but note step 3: "Add a mapping from the item item to the subject  
subject in memory, if there isn't one already." Step 1 guarantees that  
there is no entry for item, so step 3 can be unconditional.





On Wed, 29 Jun 2011, Philip Jägenstedt wrote:


Indeed, multiple types doesn't work at all if you want to mix different
types. I was assuming that the use case was to extend types, kind of
like http://schema.org/Person/Governor. However, it doesn't work all
that well even in that case, since there's no way to know which type is
the extension of the other and which properties exist only on the
extended type.


I don't really understand this use case. Can you elaborate on the problem
that needs solving here?


It's whatever problem http://schema.org/docs/extension.html is trying to  
solve, which is something like "allow people to geek out with more  
specific vocabularies without interfering with search results". I whined a  
bit in  
http://groups.google.com/group/schemaorg-discussion/browse_thread/thread/6de3a1761b115271,  
the short story being:


 * extensibility encoded with a microsyntax in the URL, making it
   not-so-opaque

 * such URLs make the DOM API less useful

Perhaps bending Microdata to accommodate for this is not the best idea. If  
I were schema.org, I would just encourage people to do this:


<div itemscope itemtype="http://schema.org/Person">
  <div id="wrapper">
    <div itemprop="name">Arnold</div>
    <div itemscope itemtype="http://example.com/Governor" itemref="wrapper">
      <div itemprop="state">California</div>
    </div>
  </div>
</div>

Making extensions unsightly is probably a good thing, to discourage people  
from going too crazy with it. This way it's also clear which properties  
only apply to the extended type.


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Microdata feedback

2011-07-08 Thread Philip Jägenstedt

On Fri, 08 Jul 2011 21:31:49 +0200, Ian Hickson i...@hixie.ch wrote:


On Fri, 8 Jul 2011, Philip Jägenstedt wrote:

On Fri, 08 Jul 2011 00:33:14 +0200, Ian Hickson i...@hixie.ch wrote:
 On Wed, 8 Jun 2011, Tomasz Jamroszczak wrote:
 
  I've been looking into Microdata specification and it struck me,
  that crawling algorithm is so complex, when it comes to expressing
  simple ideas.  I think that foremost the algorithm should be
  described in the specification with explanation what it's supposed
  to do, before steps of what exactly is to be done are written.

 Yeah. Turns out the algorithms involved here are quite badly broken.

 It was intended to expose the microdata graph as completely as
 possible while dropping anything that would introduce a loop, at the
 point where the first repetition would start (so A->B->C=>A would
 break at the "=>"), in the API, in the JSON, and in the conformance
 rules. I didn't do a good job speccing that, though!

 I've fixed the algorithms to make sense (I hope).

http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#the-properties-of-an-item

I had a look at this to verify that it is black-box-equivalent to what
Opera has implemented, and only discovered one issue:

<div itemprop=""> should not be added to the .properties collection,
because it has no properties. My bad for suggesting that the criteria
should be the presence of an itemprop attribute, it should be an
itemprop attribute containing at least one token. Can you update the
spec to match?


What needs updating? As far as I can tell, what you describe is what the
spec requires.


Step 11 is "If current has an itemprop attribute specified, add it to  
results." but should be "If current has one or more property names, add it  
to results." Property names are defined in  
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#property-names


Why? If you start with <div itemprop="foo">, then  
div.itemProp.remove("foo") would give you <div itemprop="">. It'd be weird  
if the element still showed up in the properties collection after removing  
the only property name.





 On Wed, 29 Jun 2011, Philip Jägenstedt wrote:
 
  Indeed, multiple types doesn't work at all if you want to mix
  different types. I was assuming that the use case was to extend
  types, kind of like http://schema.org/Person/Governor. However, it
  doesn't work all that well even in that case, since there's no way
  to know which type is the extension of the other and which
  properties exist only on the extended type.

 I don't really understand this use case. Can you elaborate on the
 problem that needs solving here?

It's whatever problem http://schema.org/docs/extension.html is trying
to solve, which is something like "allow people to geek out with more
specific vocabularies without interfering with search results".


That doesn't seem to be a problem. I don't really understand what problem
this is solving.


Neither do I.

If the problem is just "I want to annotate data that isn't defined in
this vocabulary", that's already possible using URL property names.



If I were schema.org, I would just encourage people to do this:

<div itemscope itemtype="http://schema.org/Person">
 <div id="wrapper">
   <div itemprop="name">Arnold</div>
   <div itemscope itemtype="http://example.com/Governor" itemref="wrapper">
     <div itemprop="state">California</div>
   </div>
 </div>
</div>


That's a bit weird. Why not just:?

 <div itemscope itemtype="http://schema.org/Person">
  <div itemprop="name">Arnold</div>
  <div itemprop="http://example.com/Governor/state">California</div>
 </div>


Yeah, that's better, at least when the number of additional attributes is  
small.



It's hard to know without knowing what concrete user problem we're trying
to solve here.


I'll leave this discussion to the schema.org sponsors and just hope that  
the method in http://schema.org/docs/extension.html doesn't catch on.


--
Philip Jägenstedt
Core Developer
Opera Software


[whatwg] Microdata feedback

2011-07-07 Thread Ian Hickson
On Wed, 8 Jun 2011, Tomasz Jamroszczak wrote:
 
 I've been looking into Microdata specification and it struck me, that 
 crawling algorithm is so complex, when it comes to expressing simple 
 ideas.  I think that foremost the algorithm should be described in the 
 specification with explanation what it's supposed to do, before steps of 
 what exactly is to be done are written.

Yeah. Turns out the algorithms involved here are quite badly broken.

It was intended to expose the microdata graph as completely as possible 
while dropping anything that would introduce a loop, at the point where 
the first repetition would start (so A->B->C=>A would break at the "=>"), 
in the API, in the JSON, and in the conformance rules. I didn't do a good 
job speccing that, though!

I've fixed the algorithms to make sense (I hope).


 Let's see, what are the properties of Microdata item from HTML element 
 with id=up from following HTML:
 
 <div itemscope id="up" itemprop="prop0">
   <div itemscope id="down" itemprop="prop1" itemref="up"></div>
 </div>

The element id=up has one property, prop1, whose value is an item on the 
element id=down. The element id=down has one property, prop0, whose value 
is the item on the element with id=up. If you crawl from id=up, my intent 
was to have the prop0 be dropped from the graph. If you crawl from 
id=down, my intent was to have prop1 be dropped from the graph. In 
addition, the document is intended to be non-conforming. If you serialise 
it for JSON, my intent was for the item on id=up to be the top one, and 
for it to have one property whose value is the item on id=down, which 
would itself have no values.

Note that the above would be non-conforming on its own because there are 
no top-level microdata items in the above snippet.
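
The loop-breaking intent described above can be sketched with a small
recursive crawl. The data model is an assumption made for illustration:
items live in a dict keyed by element id, and a nested item is written
as {"item": id}. By default the property that would re-enter an item
already on the current path is dropped, as described in this message;
the optional loop_marker argument is a hypothetical knob that leaves a
marker string in place of the repeated item instead.

```python
# Hypothetical model of the id=up / id=down example: each item maps
# property names to lists of values; {"item": id} denotes a nested item.
ITEMS = {
    "up": {"prop1": [{"item": "down"}]},
    "down": {"prop0": [{"item": "up"}]},
}


def crawl(item_id, items, path=(), loop_marker=None):
    """Serialise an item, breaking loops where a repetition would start."""
    path = path + (item_id,)
    properties = {}
    for name, values in items[item_id].items():
        kept = []
        for value in values:
            if isinstance(value, dict):  # a nested item reference
                target = value["item"]
                if target in path:
                    # Following this value would revisit an ancestor.
                    if loop_marker is not None:
                        kept.append(loop_marker)
                    continue
                kept.append(crawl(target, items, path, loop_marker))
            else:
                kept.append(value)
        if kept:
            properties[name] = kept
    return {"properties": properties}
```

Crawling from "up" yields one property, prop1, whose value is an item
with no properties; crawling from "down" gives the mirror image with
prop0. Calling crawl("up", ITEMS, loop_marker="ERROR") keeps prop0 on
the inner item with the string "ERROR" as its only value.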


 I can imagine good usages of loops of Microdata items, for example John 
 knows Amy, Amy knows John:
 
 <div itemscope id="john" itemprop>
   <div itemprop="friends" itemref="fred1 jenny2 amy1"></div>
 </div>

 <div itemscope id="amy1" itemprop>
   <div itemprop="friends" itemref="john"></div>
 </div>
 
 There's a loop:  john->amy1->john->... .

itemref="" doesn't reference items for property values. It just references 
an element to get a list of properties for an item.

The example above is non-conforming because itemref="" can only be 
specified on an itemscope="" element, itemprop="" is not valid without a 
value, and there are no top-level items.

The right way to do what you describe above is (provided the vocabulary 
is defined in a way that supports this):

 <div itemscope itemid="http://example.com/john" itemtype="...">
   <meta itemprop="friends"
         content="http://example.com/fred1 http://example.com/jenny2 http://example.com/amy1">
 </div>

 <div itemscope itemid="http://example.com/amy1" itemtype="...">
   <meta itemprop="friends"
         content="http://example.com/john">
 </div>


 If the loop is to be excluded, and thus recursion, the same data could 
 be written as:
 
 <div itemscope>
   <div itemprop="addressbook_id">1</div>
   <div itemprop="name">John</div>
   <div itemprop="knows">2</div>
 </div>

 <div itemscope>
   <div itemprop="addressbook_id">2</div>
   <div itemprop="name">Amy</div>
   <div itemprop="knows">1</div>
 </div>.

That's another way to do it, yes.


 maybe with some <meta> instead of <div> or more verbosely:
 
 <p itemscope itemid="#john" id="#john">John knows <a 
 itemprop="http://xmlns.com/foaf/0.1/knows" href="#amy">Amy</a>.</p>

 <p itemscope itemid="#amy" id="#amy">Amy knows <a 
 itemprop="http://xmlns.com/foaf/0.1/knows" href="#john">John</a>.</p>

That works too.


 The problem I'm addressing revolves around meaning of link between 
 itemref and id attributes.  Is it meant to be a part of Microdata data 
 model?

No, it's just syntactic sugar to allow pages to use microdata without 
having to twist their markup into a pretzel to make it work.


 Or maybe it is introduced to cope with the fact that Microdata graph is 
 defined on top of existing data, which is something completely 
 different, and is meant to be rendered to the user (that is on top of 
 HTML tree)?

Right.


 So the meaning of itemref attribute should also hint interpretation of 
 it inside the specification.

Done.


On Fri, 10 Jun 2011, Philip Jägenstedt wrote:
 
 I don't think the spec needs to be giving suggestions for efficient 
 implementation for live collections, because we inevitably won't 
 implement exactly that algorithm anyway.

The aim wasn't to give suggestions for efficient implementations. The aim 
was to give algorithms for which an efficient implementation existed, 
rather than requiring something nigh on impossible to implement 
efficiently. The aim wasn't reached, though, in that the algorithm in the 
spec was just completely bogus. Sorry about that.


On Tue, 28 Jun 2011, Tomasz Jamroszczak wrote:
 
 For sure the itemRef attribute of Microdata has to stay, because it makes 
 possible separation of data (the Microdata item properties, the 
 semantics) and view (where contents of those properties should be laid 
 out for browser user). Without itemRef, Microdata becomes Picodata.


[whatwg] Microdata feedback: please state that property value ordering is in the data model, and give usage guidelines

2011-06-08 Thread Dan Brickley
Hello,

Reading 
http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#microdata

Section '5.2.3 Names: the itemprop attribute' states something
important about Microdata's data model,

"Within an item, the properties are unordered with respect to each
other, except for properties with the same name, which are ordered in
the order they are given by the algorithm that defines the properties
of an item."

... and gives an example "In the following example, the "a" property
has the values "1" and "2", in that order,  ..."
<div itemscope itemref="x">
 <p itemprop="b">test</p>
 <p itemprop="a">2</p>
</div>
<div id="x">
 <p itemprop="a">1</p>
</div>

However '5.2.1 The microdata model' does not mention anything of this
data model feature. If property values are ordered (for some specific
property/item context), this should be mentioned when introducing the
data model; if only by copying or linking the above sentence ("Within
an item, ...").

Is the expectation that Microdata vocabulary authors can decide
whether such ordering is meaningful, when they define / describe their
properties?

For example, in academic publishing where they care about being first
named author, the ordering of 'itemprop=author' might seem to
matter. 5.2.3 suggests that the ordering information is at least
preserved in Microdata's data model. If someone creates an 'author'
property for Microdata, should they state that property ordering is
meaningful, or is that not their decision?

Thanks,

Dan


[whatwg] Microdata Feedback: A Server Side implementation of a Microdata Consumer library.

2011-02-11 Thread Emiliano Martinez Luque
Hi everybody, I originally intended to send this message to the
implementors list but seeing in the archives that there hasn't been
much activity there for the last couple of months, I'm sending this to
the general list. Well, basically I just wanted to announce that I've
just released ( http://github.com/emluque/MD_Extract ) a library for
server side Microdata consuming. There are some known issues (
particularly with non-ASCII-extending character encodings, also the
text extraction mechanism from a tree of nodes is very basic, etc. )
but I still felt it was sensible to release it to showcase the
possibilities of the Microdata specification.

I based the implementation on the Algorithm provided by the WhatWG but
there are some variations, the most notable one being that I'm
constructing an intermediate results data structure while traversing
the Html tree rather than storing them in a list and then sorting them
later in tree order as the spec says. I did take Tab's suggestion of
doing a first pass through the Html tree and storing a list of
references to elements with ids ( which was a great suggestion, it
makes the code way clearer and it completely changed the way I was
thinking about the problem ).
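
The first-pass id index described above could look roughly like the
following. MD_Extract itself is written in PHP; this is a separate
Python sketch using the standard library's html.parser, and the class
and function names are invented for the example rather than taken from
the library.

```python
from html.parser import HTMLParser


class IdIndexer(HTMLParser):
    """First pass: remember the first element seen for each id."""

    def __init__(self):
        super().__init__()
        self.ids = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # valueless attributes map to None
        element_id = attributes.get("id")
        # On duplicate ids, the first element in tree order wins.
        if element_id and element_id not in self.ids:
            self.ids[element_id] = (tag, attributes)


def index_ids(html):
    """Return a dict mapping each id to (tag name, attributes)."""
    indexer = IdIndexer()
    indexer.feed(html)
    indexer.close()
    return indexer.ids
```

A second pass that resolves itemref tokens can then look each token up
in the returned dict instead of searching the tree again.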

To test this:

1. Make sure you have PHP 5 with Tidy (
http://www.php.net/manual/en/tidy.installation.php ) and MB_String (
http://ar.php.net/manual/en/mbstring.installation.php ) support.
2. Download the folder, uncompress it and move it to an apache dir. (
or clone it from github: git clone
https://github.com/emluque/MD_Extract.git )
3. Access the /examples folder with your browser.

Other than that, it reports most common errors ( like an element
marked up with itemscope not having child nodes, or an img element
marked with itemprop and not having an src attribute ). I believe that
apart from the known issues, and thinking just about microdata syntax,
it's 100% compliant with the latest microdata spec (Though there might
be some edge cases I might not be considering).

I'm hoping that it gets tested, this time I made it so that all it
takes (other than having the appropriate configuration of PHP) is
downloading and uncompressing the folder, please do, you will like it.
And please file any bug reports through the github interface or
through the contact form at my personal page at
http://www.metonymie.com .

Again thank you for a great spec,

-- 
Emiliano Martínez Luque
http://www.metonymie.com


Re: [whatwg] Microdata feedback

2010-01-20 Thread Philip Jägenstedt
On Mon, 18 Jan 2010 16:24:46 +0100, Jeremy Keith jer...@adactio.com wrote:



Hixie wrote:

Finally on vCard, the final part of the extraction algorithm goes to
great trouble to guess what is the family name and what is the given
name. This guess will be broken for transliterated east Asian names
(CJKV that I know of, maybe others too). Just saying. Also, why is it
important to explicitly add N: for organizations?


This is intended to be compatible with Microformats vCard, which has
these weird rules. If you think we should remove them, please at least
first speak to Tantek and see what he thinks.


The fn optimisation pattern isn't intended to catch 100% of cases, just  
the situation Firstname Lastname or Firstname Middlename Lastname.  
So if you just use fn (formatted name) and don't use n (name), the name  
will be extracted/guessed using the optimisation pattern.


In cases where the pattern doesn't work (e.g. Anne van Kesteren, or  
east Asian names) you can still explicitly specify the family name and  
given name, over-riding the fn optimisation pattern. If you do this, you  
need to explicitly state this is the name (n) as well as the formatted  
name (fn).


This is going to break badly whenever a template uses vCard microdata and  
its author either doesn't know the family name and given name (because the  
data was never collected) or doesn't even consider that the vcard  
conversion does this funny guesswork. If a social network site or similar  
does this, then Anne van Kesteren and Zhang Min (fictional name) will have  
their names messed up with no way of fixing it. At least I haven't seen a  
site which asks users to both fill in their full name and each component,  
which is what you need to get this right.
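
A minimal sketch of the failure mode described above. It assumes the
guesswork simply splits fn on whitespace and reads two words as
given/family and three as given/middle/family; the function name guessN
and that exact splitting rule are illustrative assumptions, not the
spec's algorithm.

```javascript
// Hypothetical helper: guess the vCard N components from a formatted name.
// Assumption: two words mean "Firstname Lastname" and three mean
// "Firstname Middlename Lastname"; anything else is left unguessed.
function guessN(fn) {
  const words = fn.trim().split(/\s+/).filter(Boolean);
  if (words.length === 2) {
    return { given: words[0], additional: "", family: words[1] };
  }
  if (words.length === 3) {
    return { given: words[0], additional: words[1], family: words[2] };
  }
  return null; // the pattern does not apply
}

console.log(guessN("Jack Bauer"));        // family "Bauer", given "Jack"
console.log(guessN("Anne van Kesteren")); // "van" ends up as a middle name
console.log(guessN("Zhang Min"));         // family "Min", wrong if the name is written family-name first
```

A template that only has the full name available cannot correct the last
two results, which is the problem raised here.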


Similarly, for organisations, you don't have to explicitly set n (name)  
if you apply both fn (formatted name) and org (organisation name) to a  
string. This time, the optimisation pattern assumes that the fn is the  
name of the organisation.


Technically, the n property is *always* required but if you use either  
of those two optimisation patterns, the n is inferred from fn.


If this is just a technical problem with some software requiring N to be  
present, would it be OK to just output an empty N like for organizations?
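
A sketch of what "output an empty N" could look like, assuming the usual
five-part N structure of vCard 3.0 (family; given; additional; prefixes;
suffixes). The helper name nLine is hypothetical, and value escaping is
left out to keep the example short.

```javascript
// Hypothetical helper: build the N line of a vCard 3.0 entry.
// Assumption: when no components are known, emit an empty N rather than
// guessing them from the formatted name.
function nLine(parts) {
  const p = parts || {};
  const fields = [p.family, p.given, p.additional, p.prefix, p.suffix];
  return "N:" + fields.map((f) => f || "").join(";");
}

console.log(nLine({ family: "Bauer", given: "Jack" })); // N:Bauer;Jack;;;
console.log(nLine(null));                               // N:;;;;
```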


--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Microdata feedback

2010-01-20 Thread Philip Jägenstedt

On Mon, 18 Jan 2010 13:58:16 +0100, Ian Hickson i...@hixie.ch wrote:


I'd like at some point to introduce some sort of semantic textContent
that handles <br>, <pre>, <bdo>, dir="", <img alt>, <del>, space-collapsing,
and newline elimination, but there hasn't been much enthusiasm
around the idea, and it's not clear what else it would be good for.

I've changed the example, at least, to have it work ok, and added a
comment in the example about it.


OK. Won't hold my breath for semantic textContent, but it sounds like a  
good solution.



On Thu, 19 Nov 2009, Philip Jägenstedt wrote:


In a (slightly edited) Jack Bauer example [1], Chrome, Firefox and
presumably Safari have the <meta> elements moved to <head>. This will
severely break script-based implementations of microdata, which are
likely to be used for the time being until the DOM API is implemented
natively. I can't see any workaround for this, so I suggest that <meta>
simply not be used for microdata, preferably by making it non-conforming
and removing it from the definitions/algorithms.


This is a short-term problem that only affects scripted implementations
that are shipped with the pages, so the workaround is simple: don't use
<meta> and <link>. Any implementations outside of the page can just fix
their parser to be HTML5-compatible.


OK, fair enough.

Thanks for all the other fixes, still reviewing the algorithm change...

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Microdata feedback

2010-01-19 Thread Ian Hickson

On Mon, 18 Jan 2010, Aryeh Gregor wrote:
 On Mon, Jan 18, 2010 at 7:58 AM, Ian Hickson i...@hixie.ch wrote:
  I've made it redirect to the spec.
 
 Could you say that the URL *should* provide human-readable information 
 about the vocabulary?  We all know the problems with having 
 centrally-stored machine-readable data about your specs, but encouraging 
 the URL to provide human-readable info seems helpful.  (If they aren't 
 supposed to be dereferenced, why use HTTP?)

Why indeed. Is there something else we could use instead?


  Graphs are intended to be supported in v2, using a mechanism
 
 You seem to have left this sentence unfinished.

...using a mechanism intended for that purpose. Nothing to see here. :-)


On Mon, 18 Jan 2010, Julian Reschke wrote:
 
 SHOULD return human-readable information is good, if you also add SHOULD 
 NOT automatically dereference.

I've added something akin to that SHOULD NOT, but the spec doesn't have a 
specification conformance class, so there's nothing to apply the SHOULD 
to. So I haven't added it. (I don't generally think specifications being 
conformance classes really makes much sense.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


[whatwg] Microdata feedback

2010-01-18 Thread Ian Hickson
On Thu, 12 Nov 2009, Philip Jägenstedt wrote:

 I've been playing with the microdata DOM APIs again, continuing the 
 JavaScript experimental implementation 
 http://gitorious.org/microdatajs. It's not small or elegant, but at 
 least some spec issues have come up in the process.
 
 What is the http://www.w3.org/1999/xhtml/microdata# URI?

It provides a way to map microdata property names to URLs in an 
unambiguous way.



 http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#associating-names-with-items
 
 Otherwise, if one of the other elements in pending is an ancestor 
 element of candidate, and that element is scope, then remove candidate 
 from pending.
 
 Otherwise, if one of the other elements in pending is an ancestor 
 element of candidate, and that element also has scope as its nearest 
 ancestor element with an itemscope attribute specified, then remove 
 candidate from pending.
 
 The intention of these requirements seems to be to eliminate redundant 
 elements in pending, but a comment on the intention of each in the spec 
 would be helpful as it's quite cryptic right now.

Added some brief explanations.



 http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#microdata-dom-api
 
 itemtype and itemid are both URL attributes and therefore when getting
 itemType and itemId relative URLs should be resolved (even if only absolute
 URLs are valid). Correct?

That was a correct interpretation of the spec, but was only intended to 
be the case for itemid. I've corrected the spec to say that itemType is 
just a regular DOMString with no resolution.
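
A small sketch of the difference, using the standard URL constructor.
The base URL and the two function names are assumptions made for the
example; they are not part of the microdata DOM API.

```javascript
// Assumed base URL of the page carrying the markup (illustrative only).
const base = "http://example.com/people/index.html";

// itemid: treated as a URL attribute, so a relative value is resolved.
function reflectItemId(attrValue) {
  return new URL(attrValue, base).href;
}

// itemtype: per the change described above, returned as a plain string.
function reflectItemType(attrValue) {
  return attrValue;
}

console.log(reflectItemId("jack"));   // http://example.com/people/jack
console.log(reflectItemType("jack")); // jack
```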


 itemprop and itemref are both unordered sets of unique space-separated
 tokens, but in HTMLElement only itemProp is a DOMSettableTokenList while
 itemRef is a DOMString. This doesn't really make sense, so make itemRef a
 DOMSettableTokenList too?

Fixed. That was an oversight.


 From reading the spec it's not obvious (without following cross- 
 references) that itemProp isn't just a plain string. An example using 
 .itemProp.contains(name) or similar would make this more difficult to 
 miss.

Done.
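
To illustrate why a token list matters here, a rough stand-in is
sketched below. The TokenSet class is hypothetical and is not the real
DOMSettableTokenList interface; it only shows how contains() differs
from a substring test on the raw attribute string.

```javascript
// Stand-in for a token-list reflection of itemprop; not the real DOM interface.
class TokenSet {
  constructor(attrValue) {
    // Split on whitespace and drop duplicates.
    this.tokens = new Set(attrValue.split(/\s+/).filter(Boolean));
  }
  contains(token) {
    return this.tokens.has(token);
  }
}

const itemProp = new TokenSet("name  fn name");
console.log(itemProp.contains("name"));       // true
console.log(itemProp.contains("nam"));        // false
console.log("name  fn name".includes("nam")); // true: substring test on the plain string misfires
```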



 http://www.whatwg.org/specs/vocabs/current-work/#vcard
 
 Having clickable cross-references in this spec would help a lot when
 reviewing!

I've put them back in the HTML5 spec, which makes this a moot point.


 Grammar: Let value *be* the result of collecting the first vCard 
 subproperty named value in subitem.

Fixed.


 "Let n1 be the value of the first property named family-name in subitem, or
 the empty string if there is no such property or the property's value is
 itself an item." Why not use "collecting the first vCard subproperty" here?
 Not doing so had me trying to find how the two were different, but I couldn't
 find any differences given that the values are later escaped.

Oops. Fixed.


 There's also the issue of how newlines from textContent values are escaped.
 Applying the vCard extraction algorithm to the spec example gives:
 
 BEGIN:VCARD
 PROFILE:VCARD
 VERSION:3.0
 SOURCE:http://foolip.org/microdatajs/demo/vcard.html
 NAME:vCard demo
 FN:Jack Bauer
 PHOTO;VALUE=URI:http://foolip.org/microdatajs/demo/jack-bauer.jpg
 ORG:Counter-Terrorist Unit;Los Angeles Division
 ADR:;;10201 W. Pico Blvd.;Los Angeles;CA;90064;United States
 GEO:34.052339;-118.410623
 TEL;TYPE=work:+1 (310)\n  597 3781
 URL;VALUE=URI:http://en.wikipedia.org/wiki/Jack_Bauer
 URL;VALUE=URI:http://www.jackbauerfacts.com/
 EMAIL:j.ba...@la.ctu.gov.invalid
 TEL;TYPE=cell:+1 (310) 555\n  3781
 NOTE:If I'm out in the field\, you may be better off\n contacting Chloe O'B
 rian if it's about\n work\, or ask Tony Almeida if\n you're interested in
 the CTU five-a-side football team we're trying\n to get going.
 AGENT;VALUE=VCARD:BEGIN:VCARD\nPROFILE:VCARD\nVERSION:3.0\nSOURCE:http://fo
 olip.org/microdatajs/demo/vcard.html\nNAME:vCard demo\nEMAIL\;VALUE=URI:ma
 ilto:c.obr...@la.ctu.gov.invalid\nfn:Chloe O'Brian\nN:O'Brian\;Chloe\;\;\;
 \nEND:VCARD\n
 AGENT:Tony Almeida
 REV:2008-07-20T21:00:00+0100
 TEL;TYPE=home:01632 960 123
 N:Bauer;Jack;;;
 END:VCARD
 
 TEL and NOTE have line breaks that are just because of how the HTML source is
 formatted. Importing this into Gmail preserves these line breaks, which looks
 quite broken. Unless we expect text fields to contain meaningful formatting,
 perhaps simply collapsing all whitespace into a single space is OK? In the
 best of worlds <br> would be converted to \n, but I'm not sure if it's worth
 the trouble.

We're screwed either way. If we convert newlines to " ", then we lose
formatting from <pre>. If we don't convert newlines, we gain spurious
linebreaks (and spaces). The latter is less destructive, which is why I
picked it, but it's not ideal, I agree.
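
The two options can be compared with a short sketch. Both function
names are made up for the example, and the escaping rules are inferred
from the vCard output quoted above rather than copied from the spec.

```javascript
// Option 1 (the behaviour kept, as described above): preserve newlines,
// escaping each one for vCard as the two characters "\" and "n".
function escapeVCardText(text) {
  return text
    .replace(/\\/g, "\\\\")
    .replace(/,/g, "\\,")
    .replace(/;/g, "\\;")
    .replace(/\r\n|\r|\n/g, "\\n");
}

// Option 2 (the suggested alternative): collapse every whitespace run,
// including newlines, into one space. This loses <pre> formatting.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

const tel = "+1 (310)\n  597 3781";
console.log(escapeVCardText(tel));                     // +1 (310)\n  597 3781
console.log(escapeVCardText(collapseWhitespace(tel))); // +1 (310) 597 3781
```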

I'd like at some point to introduce some sort of semantic textContent
that handles <br>, <pre>, <bdo>, dir="", <img alt>, <del>, space-collapsing,
and newline elimination, but there hasn't been much enthusiasm
around the idea, and it's not clear what else it would be good for.

Re: [whatwg] Microdata feedback

2010-01-18 Thread Jeremy Keith

Hixie wrote:

Finally on vCard, the final part of the extraction algorithm goes to
great trouble to guess what is the family name and what is the given
name. This guess will be broken for transliterated east Asian names
(CJKV that I know of, maybe others too). Just saying. Also, why is it
important to explicitly add N: for organizations?


This is intended to be compatible with Microformats vCard, which has
these weird rules. If you think we should remove them, please at least
first speak to Tantek and see what he thinks.


The fn optimisation pattern isn't intended to catch 100% of cases,
just the situation "Firstname Lastname" or "Firstname Middlename
Lastname". So if you just use fn (formatted name) and don't use n
(name), the name will be extracted/guessed using the optimisation
pattern.


In cases where the pattern doesn't work (e.g. Anne van Kesteren, or  
east Asian names) you can still explicitly specify the family name and  
given name, over-riding the fn optimisation pattern. If you do this,  
you need to explicitly state this is the name (n) as well as the  
formatted name (fn).


Similarly, for organisations, you don't have to explicitly set n  
(name) if you apply both fn (formatted name) and org (organisation  
name) to a string. This time, the optimisation pattern assumes that  
the fn is the name of the organisation.


Technically, the n property is *always* required but if you use either  
of those two optimisation patterns, the n is inferred from fn.


HTH,

Jeremy

--
Jeremy Keith

a d a c t i o

http://adactio.com/




Re: [whatwg] Microdata feedback

2010-01-18 Thread Aryeh Gregor
On Mon, Jan 18, 2010 at 7:58 AM, Ian Hickson i...@hixie.ch wrote:
 I've made it redirect to the spec.

Could you say that the URL *should* provide human-readable information
about the vocabulary?  We all know the problems with having
centrally-stored machine-readable data about your specs, but
encouraging the URL to provide human-readable info seems helpful.  (If
they aren't supposed to be dereferenced, why use HTTP?)

 Graphs are intended to be supported in v2, using a mechanism

You seem to have left this sentence unfinished.


Re: [whatwg] Microdata feedback

2010-01-18 Thread Julian Reschke

Aryeh Gregor wrote:

On Mon, Jan 18, 2010 at 7:58 AM, Ian Hickson i...@hixie.ch wrote:

I've made it redirect to the spec.


Could you say that the URL *should* provide human-readable information
about the vocabulary?  We all know the problems with having
centrally-stored machine-readable data about your specs, but
encouraging the URL to provide human-readable info seems helpful.  (If
they aren't supposed to be dereferenced, why use HTTP?)
...


SHOULD return human-readable information is good, if you also add SHOULD 
NOT automatically dereference.


BR, Julian


Re: [whatwg] Microdata feedback

2009-10-15 Thread Philip Jägenstedt

On Wed, 14 Oct 2009 13:53:46 +0200, Ian Hickson i...@hixie.ch wrote:


On Fri, 21 Aug 2009, Philip Jägenstedt wrote:


Shouldn't namedItem [6] be namedItems? Code like .namedItem().item(0)
would be quite confusing.
[6]  
http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#dom-htmlpropertycollection-nameditem


I don't understand what this is referring to.


I was incorrectly under the impression that .namedItem on other
collections always returned a single element, and was arguing that since
HTMLPropertyCollection.namedItem always returns a PropertyNodeList,
namedItems in the plural would make more sense. Now I see that some other
namedItem methods aren't as simple as I'd thought, so I'm not sure what to
make of it. Is there a reason why HTMLPropertyCollection.namedItem, unlike
some other collections' .namedItem, doesn't return an element if there is
only 1 element in the collection at the time the method is called? Perhaps
these are legacy quirks that we don't want to replicate?
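
For comparison, the two return shapes under discussion might look like
this. Plain objects stand in for elements, and both function names are
invented for the sketch; neither is a real DOM method.

```javascript
// Plain objects stand in for property elements (illustrative data only).
const props = [
  { name: "fn", text: "Jack Bauer" },
  { name: "tel", text: "01632 960 123" },
  { name: "tel", text: "+1 (310) 555 3781" },
];

// Return shape that varies with the number of matches.
function namedItemVarying(name) {
  const hits = props.filter((p) => p.name === name);
  if (hits.length === 0) return null;
  return hits.length === 1 ? hits[0] : hits;
}

// Always a list, matching how HTMLPropertyCollection.namedItem is described.
function namedItemList(name) {
  return props.filter((p) => p.name === name);
}

console.log(Array.isArray(namedItemVarying("fn")));  // false
console.log(Array.isArray(namedItemVarying("tel"))); // true
console.log(namedItemList("fn").length);             // 1
```

With the always-a-list shape, callers never have to branch on how many
properties happened to match at call time.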



On Tue, 25 Aug 2009, Philip Jägenstedt wrote:


There's something like an inverse relationship between simplicity of the
syntax and complexity of the resulting markup, the best balance point
isn't clear (to me at least). Perhaps option 3 is better, never allowing
item+itemprop on the same element.


That would preclude being able to make trees.



  Given that flat items like vcard/vevent are likely to be the most
  common use case I think we should optimize for that. Child items can
  be created by using a predefined item property:
  itemprop=com.example.childtype item. The value of that property
  would then be the first item in tree-order (or all items in the
  subtree, not sure). This way, items would have better copy-paste
  resilience as the whole item element could be made into a top-level
  item simply by moving it, without meddling with the itemprop.

 That sounds kinda confusing...

More confusing than item+itemprop on the same element? In many cases the
property value is the contained text, having it be the contained item
node(s) doesn't seem much stranger.


Based on the studies Google did, I'm not convinced that people will find
the nesting that complicated. IMHO the proposal above is more confusing,
too. I'm not sure this is solving a problem that needs solving.



  If the parent-item (com.example.blog) doesn't know what the
  child-items are, it would simply use itemprop=item.

 I don't understand this at all.

This was an attempt to have anonymous sub-items. Re-thinking this,
perhaps a better solution would be to have each item behave in much the
same way that the document itself does. That is, simply add items in the
subtree without using itemprop and access them with .getItems(itemType)
on the outer item.


How would you do things like agent in the vEvent vocabulary?



Comparing the current model with a DOM tree, it seems odd in that a
property could be an item. It would be like an element attribute being
another element: <outer foo="<inner/>"/>. That kind of thing could just
as well be <outer><foo><inner/></foo></outer>, <outer><inner
type="foo"/></outer> or even <outer><inner/></outer> if the relationship
between the elements is clear just from the fact that they have a
parent-child relationship (usually the case).


Microdata's datamodel is more similar to JSON's than XML's.



It's only in the case where both itemprop and item have a type that an
extra level of nesting will be needed and I expect that to be the
exception. Changing the model to something more DOM-tree-like is
probably going to be easier to understand for many web developers.


I dunno. People didn't seem to have much trouble getting it once we used
itemscope= rather than just item=. People understand the JSON
datamodel pretty well, why would this be different?


After http://blog.whatwg.org/usability-testing-html5, the recent syntax  
changes, the improved DOM API and the passage of time I'm not very worried  
about the things I was worrying about above. If there's any specific point  
that seems valid after another review I'll send separate feedback on it.  
Thanks for all the other fixes!


--
Philip Jägenstedt
Opera Software


[whatwg] Microdata feedback

2009-10-14 Thread Ian Hickson
On Fri, 21 Aug 2009, Philip Jägenstedt wrote:
 
 The spec says that properties can also themselves be groups of 
 name-value pairs, but this isn't exposed in a very convenient way in 
 the DOM API. The 'properties' DOM-property is a HTMLPropertyCollection 
 of all associated elements. Discovering if the item-property value is a 
 plain string or an item seems to require item.hasAttribute('item'), 
 which seems out of place when everything else has been so neatly 
 reflected.

This is now reflected on item.itemScope.


 Also, the 'contents' DOM-property is always the item-property value 
 except in the case where the item-property is another item -- in that 
 case it is something random like .href or .textContent depending on the 
 element type. I think it would be better if the DOM-property were simply 
 called 'value' (the spec does talk about name-value pairs after all) and 
 corresponded more exactly to 'property value' [3]. Elements that have no 
 'property names' [4] should return null and otherwise elements with an 
 'item' attribute should return itself, although I don't think it should 
 be writable in that case. One might also/otherwise consider adding a 
 valueType DOM-property which could be 'string', 'item' or something 
 similar.

Interesting idea. I've renamed 'content' to 'itemValue', and made it 
return null if there's no itemprop=, and the element itself if there's 
an itemscope=.
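
A rough sketch of those rules, with plain objects standing in for DOM
elements. Only the null case and the element-itself case come from the
description above; the per-tag table is abbreviated and assumed.

```javascript
// Plain objects stand in for DOM elements (illustrative only).
function itemValue(el) {
  if (el.itemprop === undefined) return null; // no itemprop attribute
  if (el.itemscope) return el;                // the value is itself an item
  if (el.tag === "a" || el.tag === "link") return el.href;
  if (el.tag === "img") return el.src;
  return el.textContent;
}

const fnProp = { tag: "span", itemprop: "fn", textContent: "Jack Bauer" };
const agent = { tag: "div", itemprop: "agent", itemscope: true };
const plain = { tag: "p", textContent: "no property here" };

console.log(itemValue(fnProp));          // Jack Bauer
console.log(itemValue(agent) === agent); // true
console.log(itemValue(plain));           // null
```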


 One example [5] uses document.items[item].names but document.items isn't
 defined anywhere. I assume this is an oversight and that it is
 equivalent to document.getItems(). Further, names is a member of
 HTMLPropertyCollection, so document.items[item].properties.names is
 probably intended instead of document.items[item].names. Assuming this,
 the example actually produces the output it claims to.

Fixed.


 Shouldn't namedItem [6] be namedItems? Code like .namedItem().item(0) 
 would be quite confusing.
 [6] 
 http://www.whatwg.org/specs/web-apps/current-work/multipage/infrastructure.html#dom-htmlpropertycollection-nameditem

I don't understand what this is referring to.


 Also, RadioNodeList should be PropertyNodeList.

Fixed.


 I think many will wonder why item and itemprop can't be given on a 
 single element for compactness:
 
 <span item="org.example.fruit" itemprop="org.example.name">Apple</span>s and
 <span item="org.example.fruit" itemprop="org.example.name">Orange</span>s
 don't compare well.

Modulo the changes to the syntax (s/item=/itemscope itemtype=/g), this is 
allowed -- but it means the same as this:

   <span itemprop="org.example.name" itemscope itemtype="org.example.fruit">...

...which is to say, it's giving a property whose value is itself an item.
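
In JSON-like terms, the resulting structure might be pictured as below.
The enclosing item and its type org.example.basket are invented for the
example; only the property name and the inner type come from the markup
above.

```javascript
// Illustrative data only: an outer item whose property value is an item.
const basket = {
  type: "org.example.basket",
  properties: {
    "org.example.name": [
      { type: "org.example.fruit", properties: {} }, // the value is an item
    ],
  },
};

// Report whether a property value is a plain string or a nested item.
function valueType(value) {
  return typeof value === "string" ? "string" : "item";
}

console.log(valueType(basket.properties["org.example.name"][0])); // item
console.log(valueType("Apple"));                                  // string
```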


On Sun, 23 Aug 2009, Eduard Pascual wrote:
 On Sat, Aug 22, 2009 at 11:51 PM, Ian Hickson i...@hixie.ch wrote:
 
  Based on some of the feedback on Microdata recently, e.g.:
 
    http://www.jenitennison.com/blog/node/124
 
  ...and a number of e-mails sent to this list and the W3C lists, I am 
  going to try some tweaks to the Microdata syntax. Google has kindly 
  offered to provide usability testing resources so that we can try a 
  variety of different syntaxes and see which one is easiest for authors 
  to understand.
 
  If anyone has any concrete syntax ideas that they would like me to 
  consider, please let me know. There's a (pretty low) limit to how many 
  syntaxes we can perform usability tests on, though, so I won't be able 
  to test every idea.
 
 This would be more than just tweaking the syntax, but I think 
 appropriate to bring forth my CRDF proposal as a suggestion for an 
 alternative to Microdata.

I considered testing this, as well as RDFa, but due to time constraints we 
ended up only being able to test a few changes, so I concentrated 
specifically on microdata variants.


On Tue, 25 Aug 2009, Philip Jägenstedt wrote:
 
 There's something like an inverse relationship between simplicity of the 
 syntax and complexity of the resulting markup, the best balance point 
 isn't clear (to me at least). Perhaps option 3 is better, never allowing 
 item+itemprop on the same element.

That would preclude being able to make trees.


   Given that flat items like vcard/vevent are likely to be the most 
   common use case I think we should optimize for that. Child items can 
   be created by using a predefined item property: 
   itemprop=com.example.childtype item. The value of that property 
   would then be the first item in tree-order (or all items in the 
   subtree, not sure). This way, items would have better copy-paste 
   resilience as the whole item element could be made into a top-level 
   item simply by moving it, without meddling with the itemprop.
  
  That sounds kinda confusing...
 
 More confusing than item+itemprop on the same element? In many cases the 
 property value is the contained text, having it be the contained item 
 node(s) doesn't seem much stranger.

Based on the studies Google did, I'm not convinced that people will