Re: [uf-discuss] re: HTML5 support

Philip Jägenstedt Wed, 21 Jul 2010 02:50:39 -0700

On Tue, 20 Jul 2010 21:55:38 +0200, Angelo Gladding <ang...@gladding.name> wrote:

On Tue, Jul 20, 2010 at 3:25 AM, Philip Jägenstedt <phil...@opera.com> wrote:

On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding <ang...@gladding.name> wrote:

Can an enlightened soul describe in which ways microdata is actually
superior to profiled poshformats?


Microdata should be compared to the class attributes and the various
patterns that microformats use, not any specific vocabulary.


Of course. Let me clarify. A `microformat` is a poshformat that has
undergone a relatively laborious process of research and brainstorming
to capture real world user requirements to make a minimal vocabulary
that can capture ~80% of current usage patterns. Microdata is a set of
rules governing a syntax. Hence my comparison of microdata to
poshformats, which are essentially microformats sans the due
diligence.

Right, designing vocabularies is hard and requires due diligence. That's true no matter what the syntax is.

The main benefit is that parsing becomes well-defined


Ain't that the truth.

and simple.


Or is it? I wonder how different the two sets of supporting algorithms
might look face to face once fully documented and implemented.

The Microformats wiki makes the following comparison to microdata:

1. `itemprop` - is a more specific version of class, for field names.
2. `subject` - allows semantically linking within the page.
Conceptually similar to the include-pattern.
3. `itemref` - allows including properties elsewhere on the page that
are not descendants of itemscope. Takes space-separated ids (for
example itemref="address phone" would include the elements with
id="address" and id="phone"). Conceptually similar to the
include-pattern.
4. `content` - on the meta element can be used to include invisible
data that is not part of the content. As current browsers move meta
inside <head>, make sure to include via `itemref`. Conceptually
similar to the 'value-title' feature of the value-class-pattern.
5. `itemscope` - identifies blocks to be marked as structured data.
Conceptually similar to the mfo brainstorming.
6. `itemtype` - to specify the type for an item (for example:
itemtype="http://microformats.org/profile/hcard").

What wiki page is this from? subject has been replaced by itemid. I can't understand what the similarity with the include-pattern could possibly be, though.

Distilled down:

1. @class
2/3. include-pattern/table-header-pattern
4. value-class-pattern
5. "mfo"
6. rel-profile

Sounds to me like the same sort of desire for absolute normativity
that [non-HTML5] XHTML once attempted to burden the entirety of
humanity with. Ironically, HTML5 has deprecated such a style in favor
of a seemingly more flexible Microformat-esque syntax.

Putting XHTML2 aside, one of the main achievements of HTML5 is having formalized how to parse all the sloppy, broken HTML out there (a.k.a. "tag soup"). While the syntax is flexible for authors, there's no flexibility whatsoever for an implementor in how to parse it. The result will always be the same. In my view, microdata is to microformats what the HTML5 parser is to HTML4. It makes it possible to parse, without ever guessing, all the microdata items on a page. While it's really easy to write a microformat parser in JavaScript, you're not going to see that built into a browser, where each vocabulary needs a new parser. Microdata also hasn't been implemented by any browser yet, but I'm pretty sure it's going to happen if it takes off.

<span itemscope itemtype="http://microformats.org/profile/hcard">

Considering your affiliation with Opera, what might I ask are your
feelings about Operator?

I've heard of it before, it looks like a custom Opera distribution? It has nothing to do with microformats or microdata as far as I can tell.

which really isn't practical with microformats when the
data is hidden in class attributes together with everything else.


As I alluded to above I see this as a complete non-issue yet you are
most certainly not the first to bring it up. What am I missing?

If a browser is going to support some kind of embedded data vocabularies (like events or contacts), the code for parsing it isn't going to be written in JavaScript using the DOM, it's going to be in C++ or C operating on the internal data structures of the browser. To support a specific microformat vocabulary, one would have to look through all the classes on all elements to find the "root" element, then speculatively search its children for the other structures of the microformat. Given that all of the constructs used in microformats are also used for completely different things, most of the data you inspect isn't actually going to be what you're looking for. Since one has to do this for all documents parsed (and not "on demand" like when finding a particular class using document.getElementsByClassName) my guess is that it's going to be slow. What's worse, you'll have to write more of this complicated, slow code for each vocabulary you want to support.

If the data is put in new attributes like itemprop, the code for parsing it will be simpler and you won't have to write it again for every vocabulary you support, you can just reuse your getItems(x) implementation to find all items of type x and go from there.
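To make that concrete, here is a minimal sketch in Python of what a vocabulary-agnostic lookup could look like. It is not the microdata algorithm from the spec and not any browser's code: the names get_items and _ItemCollector are made up for illustration, itemref and itemid are ignored, property values are simplified (content for meta, href/src for a few elements, text otherwise), and reasonably well-formed markup is assumed. The point is only that no vocabulary name appears anywhere in the parsing code.

```python
from html.parser import HTMLParser

# Simplified assumptions, not the full rules from the microdata spec.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
URL_ATTR = {"a": "href", "area": "href", "link": "href", "img": "src"}


class _ItemCollector(HTMLParser):
    """Hypothetical helper: collects itemscope elements and their itemprops."""

    def __init__(self):
        super().__init__()
        self.items = []  # top-level items, in document order
        self.open = []   # stack of currently open elements

    def _current_item(self):
        # Nearest enclosing itemscope, or None when outside any item.
        for entry in reversed(self.open):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._current_item()
        # itemprop only means something inside an item.
        names = (attrs.get("itemprop") or "").split() if owner is not None else []
        item = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if not names:  # not a property of another item, so top-level
                self.items.append(item)
        if item is not None:
            value = item  # a nested item is itself the property value
        elif tag == "meta":
            value = attrs.get("content") or ""
        elif tag in URL_ATTR:
            value = attrs.get(URL_ATTR[tag]) or ""
        else:
            value = None  # filled in from text content at the end tag
        entry = {"tag": tag, "item": item, "owner": owner,
                 "names": names, "value": value, "text": []}
        if tag in VOID_TAGS:
            self._finish(entry)  # void elements never get an end tag
        else:
            self.open.append(entry)

    def handle_data(self, data):
        # Text counts towards every open text-valued property element.
        for entry in self.open:
            if entry["names"] and entry["value"] is None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        # Close up to the matching open element; ignore stray end tags.
        for i in range(len(self.open) - 1, -1, -1):
            if self.open[i]["tag"] == tag:
                for entry in reversed(self.open[i:]):
                    self._finish(entry)
                del self.open[i:]
                return

    def _finish(self, entry):
        value = entry["value"]
        if value is None:
            value = "".join(entry["text"]).strip()
        for name in entry["names"]:
            entry["owner"]["properties"].setdefault(name, []).append(value)


def get_items(html, item_type=None):
    """Return top-level items, optionally only those with the given itemtype."""
    parser = _ItemCollector()
    parser.feed(html)
    parser.close()
    return [i for i in parser.items
            if item_type is None or i["type"] == item_type]


SAMPLE = """
<div class="vcard-ish sidebar">unrelated markup</div>
<div itemscope itemtype="http://microformats.org/profile/hcard">
  <span itemprop="fn">Philip Jägenstedt</span>
  <a itemprop="url" href="http://example.org/">home</a>
  <meta itemprop="org" content="Opera Software">
</div>
<p itemscope itemtype="http://example.org/event"><b itemprop="summary">Talk</b></p>
"""

cards = get_items(SAMPLE, "http://microformats.org/profile/hcard")
print(cards)
```

The same get_items call works for the made-up event type in the sample without any new parsing code, which is the reuse argument above. A class-based equivalent would need to know that "vcard" is a root and "fn" is a field before it could find anything.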

Now, this is all theoretical since no browser has implemented this yet (I tried a bit in my free time, but had too little of it). If you don't care about browsers, then of course it doesn't matter. If microformats work for you then keep using them. I'm just saying that there's a better way forward.

Might a "humans first, machines second" CJKV internationalization of
`n` optimization be to analyze the contents of the `fn`'s @lang and
inner text and use either or both to better determine name order?

The main problem with this is that due to lazy copy-pasting, lang="en" is often used even when the language isn't English. Also, in the case of e.g. Facebook, lang="en" would be correct for the page itself, but people's names aren't in English anyway.


Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743

<html lang=ja>...<div class=vcard>...<a class=fn ... >宮野衆</a>...</div>

宮野 can log in today and, without any cooperation from Facebook, append
a U+200B (zero-width space [1]) to his first name (regardless of the
input taking the form of one or two boxes), and immediately reap the
benefits of such an `n` optimization without negatively affecting UI,
sort order, etc.

[1] http://en.wikipedia.org/wiki/Zero-width_space

I don't speak Japanese, but I think 宮野 is the family name and 衆 is the given name. By not doing anything the 'n' optimization will incorrectly guess that the family name is 宮野衆 and given name unknown. By inserting a zero-width space, it will instead incorrectly guess that 宮野 is the given name and 衆 is the family name. Either way it's incorrect.
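A rough sketch of the guess being discussed, in Python. This is a simplification and not the hCard parsing rules: guess_n is a hypothetical name, only the "exactly two words means given name then family name" case is modelled, and treating U+200B as a word separator is the proposal from this thread, not something I know any parser to do.

```python
import re

# Whitespace plus U+200B (zero-width space). Counting U+200B as a separator
# is the assumption under discussion here, not established parser behaviour.
_SEPARATOR = re.compile(r"[\s\u200b]+")


def guess_n(fn):
    """Guess given and family name from a formatted name, Western order.

    Returns None when fn is not exactly two words, i.e. no split is made.
    """
    words = [w for w in _SEPARATOR.split(fn) if w]
    if len(words) != 2:
        return None
    given, family = words
    return {"given-name": given, "family-name": family}


print(guess_n("宮野衆"))        # one word: no split at all
print(guess_n("宮野\u200b衆"))  # two words: split in Western order
```

With the zero-width space inserted, the sketch assigns 宮野 to given-name and 衆 to family-name, which is the reversed result described above. The splitting works fine; it is the fixed word order that gets it wrong.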

The only way to get it right is to ask the user for the full name,
given name and family name separately, something I haven't ever seen.


If you haven't seen it, then it isn't even a single way to get it
right -- another byproduct of Microformats philosophy I believe.
However, if optimizations can yield 80%+ positive results when viewed
in aggregate I personally give a little bit of magic a big thumbs up.

I guess we're not going by the population of the earth then, since China, Japan, Vietnam and South Korea account for 23.36% of it. (http://en.wikipedia.org/wiki/List_of_countries_by_population)

The most practical solution is to not guess at all, and I don't know
of any negative effects of this.


I just see a tiny hint of dehumanization. ;)

Seriously though, what are the negative effects? I'm betting that the number of people who make good use of having the given name and family name separately in their address book isn't large enough to justify screwing it up for the population of East Asia.


--
Philip Jägenstedt
Core Developer
Opera Software

_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss
