On Tue, 20 Jul 2010 21:55:38 +0200, Angelo Gladding <ang...@gladding.name> wrote:

On Tue, Jul 20, 2010 at 3:25 AM, Philip Jägenstedt <phil...@opera.com> wrote:
On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding <ang...@gladding.name>
wrote:

Can an enlightened soul describe in which ways microdata is actually
superior to profiled poshformats?

Microdata should be compared to the class attributes and the various
patterns that microformats use, not any specific vocabulary.

Of course. Let me clarify. A `microformat` is a poshformat that has
undergone a relatively laborious process of research and brainstorming
to capture real world user requirements to make a minimal vocabulary
that can capture ~80% of current usage patterns. Microdata is a set of
rules governing a syntax. Hence my comparison of microdata to
poshformats, which are essentially microformats sans the due
diligence.

Right, designing vocabularies is hard and requires due diligence. That's true no matter what the syntax is.

The main benefit is that parsing becomes well-defined

Ain't that the truth.

and simple.

Or is it? I wonder how different the two sets of supporting algorithms
might look face to face once fully documented and implemented.

The Microformats wiki makes the following comparison to microdata:

1. `itemprop` - is a more specific version of class, for field names.
2. `subject` - allows semantically linking within the page.
Conceptually similar to the include-pattern.
3. `itemref` - allows including properties elsewhere on the page that
are not descendants of itemscope. Takes space-separated ids (for
example itemref="address phone" would include the elements with
id="address" and id="phone"). Conceptually similar to the
include-pattern.
4. `content` - on the meta element can be used to include invisible
data that is not part of the content. As current browsers move meta
inside <head>, make sure to include via `itemref`. Conceptually
similar to the 'value-title' feature of the value-class-pattern.
5. `itemscope` - identifies blocks to be marked as structured data.
Conceptually similar to the mfo brainstorming.
6. `itemtype` - to specify the type for an item (for example:
itemtype="http://microformats.org/profile/hcard").

What wiki page is this from? subject has been replaced by itemid. I can't understand what the similarity with the include-pattern could possibly be, though.

Distilled down:

1. @class
2/3. include-pattern/table-header-pattern
4. value-class-pattern
5. "mfo"
6. rel-profile

Sounds to me like the same sort of desire for absolute normativity
that [non-HTML5] XHTML once attempted to burden the entirety of
humanity with. Ironically, HTML5 has deprecated such a style in favor
of a seemingly more flexible Microformat-esque syntax.

Putting XHTML2 aside, one of the main achievements of HTML5 is having formalized how to parse all the sloppy, broken HTML out there (a.k.a. "tag soup"). While the syntax is flexible for authors, there's no flexibility whatsoever for an implementor in how to parse it. The result will always be the same. In my view, microdata is to microformats what the HTML5 parser is to HTML4. It makes it possible to parse, without ever guessing, all the microdata items on a page. While it's really easy to write a microformat parser in JavaScript, you're not going to see that built into a browser, where each vocabulary needs a new parser. Microdata also hasn't been implemented by any browser yet, but I'm pretty sure it's going to happen if it takes off.

<span itemscope itemtype="http://microformats.org/profile/hcard">

Considering your affiliation with Opera, what might I ask are your
feelings about Operator?

I've heard of it before, it looks like a custom Opera distribution? It has nothing to do with microformats or microdata as far as I can tell.

which really isn't practical with microformats when the
data is hidden in class attributes together with everything else.

As I alluded to above I see this as a complete non-issue yet you are
most certainly not the first to bring it up. What am I missing?

If a browser is going to support some kind of embedded data vocabularies (like events or contacts), the code for parsing it isn't going to be written in JavaScript using the DOM, it's going to be in C++ or C operating on the internal data structures of the browser. To support a specific microformat vocabulary, one would have to look through all the classes on all elements to find the "root" element, then speculatively search its children for the other structures of the microformat. Given that all of the constructs used in microformats are also used for completely different things, most of the data you inspect isn't actually going to be what you're looking for. Since one has to do this for all documents parsed (and not "on demand" like when finding a particular class using document.getElementsByClassName), my guess is that it's going to be slow. What's worse, you'll have to write more of this complicated, slow code for each vocabulary you want to support.
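
As a rough illustration of the above, here is a sketch in Python of what a scanner for a single vocabulary (hCard, and only its `fn` property) might look like. `HCardScan` and its `inspected` counter are hypothetical names, and real hCard parsing has many more rules than this.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class HCardScan(HTMLParser):
    """Hypothetical hCard-only scanner: knows the 'vcard' root and 'fn'."""

    def __init__(self):
        super().__init__()
        self.cards = []      # one dict per class="vcard" root found
        self.stack = []      # one frame per open non-void element
        self.inspected = 0   # class tokens looked at, relevant or not

    def _current(self):
        # Nearest enclosing element that started a card, if any.
        for frame in reversed(self.stack):
            if frame["card"] is not None:
                return frame["card"]
        return None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.inspected += len(classes)   # every token has to be checked
        card = None
        if "vcard" in classes:           # vocabulary-specific root name
            card = {"fn": None}
            self.cards.append(card)
        current = card if card is not None else self._current()
        if tag in VOID:
            return
        self.stack.append({"tag": tag, "card": card, "text": [],
                           "fn_for": current if "fn" in classes else None})

    def handle_data(self, data):
        for frame in self.stack:
            if frame["fn_for"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or all(f["tag"] != tag for f in self.stack):
            return
        while self.stack:
            frame = self.stack.pop()
            target = frame["fn_for"]
            if target is not None and target["fn"] is None:
                target["fn"] = "".join(frame["text"]).strip()
            if frame["tag"] == tag:
                break
```

Note that class="fn" outside any vcard root has to be inspected and then thrown away, and a second vocabulary would need its own root name and its own property rules.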

If the data is put in new attributes like itemprop, the code for parsing it will be simpler and you won't have to write it again for every vocabulary you support; you can just reuse your getItems(x) implementation to find all items of type x and go from there.
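
To make that concrete, here is a minimal sketch in Python (standing in for the C++ a browser would use) of a vocabulary-independent extractor. The names `MicrodataSketch` and `get_items` are made up for illustration; `get_items` only loosely mirrors the getItems(x) idea, and the sketch ignores itemref, itemid and URL-valued properties such as href, so it is not the full microdata algorithm.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class MicrodataSketch(HTMLParser):
    """Simplified microdata collector (assumption: no itemref/itemid)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items, in document order
        self.stack = []   # one frame per open non-void element

    def _owner(self):
        # Nearest enclosing element that started an item, if any.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def _add(self, owner, names, value):
        for name in names:
            owner["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        owner = self._owner()
        # itemprop only means something inside an item.
        names = (a.get("itemprop") or "").split() if owner is not None else []
        item = None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if names:
                self._add(owner, names, item)   # nested item is the value
            else:
                self.items.append(item)         # top-level item
        if tag in VOID:
            if names and item is None:
                # Void elements such as meta carry the value in content.
                self._add(owner, names, a.get("content") or "")
            return
        self.stack.append({"tag": tag, "item": item, "owner": owner,
                           "names": names if item is None else [],
                           "text": []})

    def handle_data(self, data):
        for frame in self.stack:
            if frame["names"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or all(f["tag"] != tag for f in self.stack):
            return
        while self.stack:
            frame = self.stack.pop()
            if frame["names"]:
                value = "".join(frame["text"]).strip()
                self._add(frame["owner"], frame["names"], value)
            if frame["tag"] == tag:
                break

def get_items(html, item_type=None):
    """Return top-level items, optionally filtered by itemtype."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return [i for i in parser.items
            if item_type is None or i["type"] == item_type]
```

Nothing in this code mentions hCard or any other vocabulary; the type URL is just a filter value passed in by the caller.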

Now, this is all theoretical since no browser has implemented this yet (I tried a bit in my free time, but had too little of it). If you don't care about browsers, then of course it doesn't matter. If microformats work for you then keep using them. I'm just saying that there's a better way forward.

Might a "humans first, machines second" CJKV internationalization of
`n` optimization be to analyze the contents of the `fn`'s @lang and
inner text and use either or both to better determine name order?

The main problem with this is that due to lazy copy-pasting, lang="en" is often used even when the language isn't English. Also, in the case of e.g. Facebook, lang="en" would be correct for the page itself, but people's names aren't in English anyway.

Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743

<html lang=ja>...<div class=vcard>...<a class=fn ... >宮野衆</a>...</div>

宮野 can log in today and, without any cooperation from Facebook, append
a U+200B (zero-width space [1]) to his first name (regardless of the
input taking the form of one or two boxes), and immediately reap the
benefits of such an `n` optimization without negatively affecting UI,
sort order, etc.

[1] http://en.wikipedia.org/wiki/Zero-width_space

I don't speak Japanese, but I think 宮野 is the family name and 衆 is the given name. By not doing anything the 'n' optimization will incorrectly guess that the family name is 宮野衆 and given name unknown. By inserting a zero-width space, it will instead incorrectly guess that 宮野 is the given name and 衆 is the family name. Either way it's incorrect.
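
To illustrate, here is a simplified sketch in Python of the guess as described in this message. `guess_n` is a hypothetical helper, not the normative hCard rule, and treating U+200B as a word break is an assumption taken from the proposal above.

```python
import re

def guess_n(fn):
    """Hypothetical, simplified 'n' guess from a formatted name."""
    # Split on whitespace and, by assumption, on U+200B (zero-width space).
    words = [w for w in re.split(r"[\s\u200b]+", fn) if w]
    if len(words) == 1:
        # One word: taken as the family name, given name unknown.
        return {"given-name": None, "family-name": words[0]}
    if len(words) == 2:
        # Two words: Western order assumed, given name first.
        return {"given-name": words[0], "family-name": words[1]}
    return None  # anything else: make no guess

print(guess_n("宮野衆"))         # whole string becomes the family name
print(guess_n("宮野\u200b衆"))   # 宮野 becomes the given name
```

Both results disagree with the reading in which 宮野 is the family name and 衆 the given name.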

The only way to get it right is to ask the user for the full name,
given name and family name separately, something I haven't ever seen.

If you haven't seen it, then it isn't even a single way to get it
right -- another byproduct of Microformats philosophy I believe.
However, if optimizations can yield 80%+ positive results when viewed
in aggregate I personally give a little bit of magic a big thumbs up.

I guess we're not going by the population of the earth then, since China, Japan, Vietnam and South Korea account for 23.36% of it. (http://en.wikipedia.org/wiki/List_of_countries_by_population)

The most practical solution is to not guess at all, and I don't know
of any negative effects of this.

I just see a tiny hint of dehumanization. ;)

Seriously though, what are the negative effects? I'm betting that the number of people who make good use of having the given name and family name separately in their address book isn't large enough to justify screwing it up for the population of East Asia.

--
Philip Jägenstedt
Core Developer
Opera Software

_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss
