On Tue, 20 Jul 2010 21:55:38 +0200, Angelo Gladding <ang...@gladding.name>
wrote:
On Tue, Jul 20, 2010 at 3:25 AM, Philip Jägenstedt <phil...@opera.com>
wrote:
On Tue, 20 Jul 2010 06:05:06 +0200, Angelo Gladding
<ang...@gladding.name>
wrote:
Can an enlightened soul describe in which ways microdata is actually
superior to profiled poshformats?
Microdata should be compared to the class attributes and the various
patterns that microformats use, not any specific vocabulary.
Of course. Let me clarify. A `microformat` is a poshformat that has
undergone a relatively laborious process of research and brainstorming
to capture real world user requirements to make a minimal vocabulary
that can capture ~80% of current usage patterns. Microdata is a set of
rules governing a syntax. Hence my comparison of microdata to
poshformats, which are essentially microformats sans the due
diligence.
Right, designing vocabularies is hard and requires due diligence. That's
true no matter what the syntax is.
The main benefit is that parsing becomes well-defined
Ain't that the truth.
and simple.
Or is it? I wonder how different the two sets of supporting algorithms
might look face to face once fully documented and implemented.
The Microformats wiki makes the following comparison to microdata:
1. `itemprop` - is a more specific version of class, for field names.
2. `subject` - allows semantically linking within the page.
Conceptually similar to the include-pattern.
3. `itemref` - allows including properties elsewhere on the page that
are not descendants of itemscope. Takes space-separated ids (for
example itemref="address phone" would include the elements with
id="address" and id="phone"). Conceptually similar to the
include-pattern.
4. `content` - on the meta element can be used to include invisible
data that is not part of the content. As current browsers move meta
inside <head>, make sure to include via `itemref`. Conceptually
similar to the 'value-title' feature of the value-class-pattern.
5. `itemscope` - identifies blocks to be marked as structured data.
Conceptually similar to the mfo brainstorming.
6. `itemtype` - to specify the type for an item (for example:
itemtype="http://microformats.org/profile/hcard").
What wiki page is this from? subject has been replaced by itemid. I can't
understand what the similarity with the include-pattern could possibly be,
though.
Distilled down:
1. @class
2/3. include-pattern/table-header-pattern
4. value-class-pattern
5. "mfo"
6. rel-profile
Sounds to me like the same sort of desire for absolute normativity
that [non-HTML5] XHTML once attempted to burden the entirety of
humanity with. Ironically, HTML5 has deprecated such a style in favor
of a seemingly more flexible Microformat-esque syntax.
Putting XHTML2 aside, one of the main achievements of HTML5 is having
formalized how to parse all the sloppy, broken HTML out there (a.k.a. "tag
soup"). While the syntax is flexible to authors, there's no flexibility
whatsoever for an implementor in how to parse it. The result will always be
the same. In my view, microdata is to microformats what the HTML5 parser
is to HTML4. It makes it possible to parse, without ever guessing, all the
microdata items on a page. While it's really easy to write a microformat
parser in JavaScript, you're not going to see that built into a browser,
where each vocabulary needs a new parser. Microdata also hasn't been
implemented by any browser yet, but I'm pretty sure it's going to happen
if it takes off.
<span itemscope itemtype="http://microformats.org/profile/hcard">
Considering your affiliation with Opera, what might I ask are your
feelings about Operator?
I've heard of it before, it looks like a custom Opera distribution? It has
nothing to do with microformats or microdata as far as I can tell.
which really isn't practical with microformats when the
data is hidden in class attributes together with everything else.
As I alluded to above I see this as a complete non-issue yet you are
most certainly not the first to bring it up. What am I missing?
If a browser is going to support some kind of embedded data vocabularies
(like events or contacts), the code for parsing it isn't going to be
written in JavaScript using the DOM, it's going to be in C++ or C
operating on the internal datastructures of the browser. To support a
specific microformat vocabulary, one would have to look through all the
classes on all elements to find the "root" element, then speculatively
search its children for the other structures of the microformat. All of
the constructs used in microformats are also used for completely
different things, so most of the data you inspect isn't actually going to
be what you're looking for. Since one has to do this for all documents
parsed (and not "on demand" like when finding a particular class using
document.getElementsByClassName) my guess is that it's going to be slow.
What's worse, you'll have to write more of this complicated, slow code
for each vocabulary you want to support.
If the data is put in new attributes like itemprop, the code for parsing
it will be simpler and you won't have to write it again for every
vocabulary support, you can just reuse your getItems(x) implementation to
find all items of type x and go from there.
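As a rough illustration of the reuse argument, here is a minimal Python
sketch of a vocabulary-agnostic extractor. It is not any browser's
implementation and not the full microdata algorithm: the names
MicrodataSketch and get_items are made up for this example, itemref and
itemid are ignored, and every property value is taken from the element's
text (or from content on meta). The point is only that the same code
serves any itemtype, while class attributes are never inspected.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Hypothetical, simplified collector of microdata items."""

    def __init__(self):
        super().__init__()
        self.items = []        # top-level items found so far
        self._stack = []       # (tag, item or None, capture or None) per open element
        self._open_items = []  # items whose itemscope element is still open
        self._captures = []    # text-valued properties still being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        names = (a.get("itemprop") or "").split()
        parent = self._open_items[-1] if self._open_items else None
        if tag in VOID:
            # Assumption: only <meta content="..."> is handled among void elements.
            if tag == "meta" and names and parent is not None:
                for name in names:
                    parent["properties"].setdefault(name, []).append(a.get("content") or "")
            return
        item = capture = None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if names and parent is not None:
                # Nested item: it becomes the value of the parent's property.
                for name in names:
                    parent["properties"].setdefault(name, []).append(item)
            else:
                self.items.append(item)
            self._open_items.append(item)
        elif names and parent is not None:
            capture = {"item": parent, "names": names, "parts": []}
            self._captures.append(capture)
        self._stack.append((tag, item, capture))

    def handle_data(self, data):
        # Text belongs to every property element that is currently open.
        for capture in self._captures:
            capture["parts"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not any(t == tag for t, _, _ in self._stack):
            return
        while self._stack:
            t, item, capture = self._stack.pop()
            if item is not None:
                self._open_items.pop()
            if capture is not None:
                self._captures.pop()
                value = "".join(capture["parts"]).strip()
                for name in capture["names"]:
                    capture["item"]["properties"].setdefault(name, []).append(value)
            if t == tag:
                break


def get_items(html, item_type=None):
    """Return top-level items, optionally filtered by itemtype."""
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return [i for i in parser.items if item_type is None or i["type"] == item_type]
```

Calling get_items(page, "http://microformats.org/profile/hcard") would
return only the items of that type, and supporting another vocabulary
means passing a different type rather than writing another parser.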
Now, this is all theoretical since no browser has implemented this yet (I
tried a bit in my free time, but had too little of it). If you don't care about
browsers, then of course it doesn't matter. If microformats work for you
then keep using them. I'm just saying that there's a better way forward.
Might a "humans first, machines second" CJKV internationalization of
`n` optimization be to analyze the contents of the `fn`'s @lang and
inner text and use either or both to better determine name order?
The main problem with this is that due to lazy copy-pasting, lang="en" is
often used even when the language isn't English. Also, in the case of e.g.
Facebook, lang="en" would be correct for the page itself, but people's
names aren't in English anyway.
Check out http://ja-jp.facebook.com/people/gong-ye-zhong/100000456401743
<html lang=ja>...<div class=vcard>...<a class=fn ... >宮野衆</a>...</div>
宮野 can log in today and, without any cooperation from Facebook, append
a U+200B (zero-width space [1]) to his first name (regardless of the
input taking the form of one or two boxes), and immediately reap the
benefits of such an `n` optimization without negatively affecting UI,
sort order, etc.
[1] http://en.wikipedia.org/wiki/Zero-width_space
I don't speak Japanese, but I think 宮野 is the family name and 衆 is the
given name. By not doing anything the 'n' optimization will incorrectly
guess that the family name is 宮野衆 and given name unknown. By inserting
a zero-width space, it will instead incorrectly guess that 宮野 is the
given name and 衆 is the family name. Either way it's incorrect.
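To make the two failure cases concrete, here is a small Python sketch of
the guessing as it is described in this thread. It is a simplification
and not the normative hCard rule: guess_n is a made-up name, and treating
U+200B as a word separator is an assumption taken from the proposal
above (Python's own str.split() does not treat it as whitespace, hence
the explicit regular expression).

```python
import re


def guess_n(fn):
    """Guess (given_name, family_name) from a formatted name.

    Simplified reading of the 'n' optimization as discussed in this thread:
    - exactly two words: "Given Family", or "Family, Given" with a comma
    - a single word: the whole string is taken as the family name,
      with the given name unknown
    - anything else: no guess at all
    """
    # Assumption: zero-width space (U+200B) counts as a separator.
    words = [w for w in re.split(r"[\s\u200b]+", fn) if w]
    if len(words) == 2:
        first, second = words
        if first.endswith(","):
            return (second, first[:-1])
        return (first, second)
    if len(words) == 1:
        return (None, words[0])
    return (None, None)


# Without a separator the whole name is taken as the family name.
print(guess_n("宮野衆"))
# With U+200B inserted, the first part is taken as the given name.
print(guess_n("宮野\u200b衆"))
```

If 宮野 really is the family name, neither result is right: the first
call returns no given name at all, and the second returns 宮野 as the
given name.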
The only way to get it right is to ask the user for the full name, given
name and family name separately, something I haven't ever seen.
If you haven't seen it, then it isn't even a single way to get it right --
another byproduct of Microformats philosophy I believe. However, if
optimizations can yield 80%+ positive results when viewed in aggregate I
personally give a little bit of magic a big thumbs up.
I guess we're not going by the population of the earth then, since China,
Japan, Vietnam and South Korea account for 23.36% of it.
(http://en.wikipedia.org/wiki/List_of_countries_by_population)
The most practical solution is to not guess at all, and I don't know
of any negative effects of this.
I just see a tiny hint of dehumanization. ;)
Seriously though, what are the negative effects? I'm betting that the
number of people who make good use of having the given name and family
name separately in their address book isn't large enough to justify
screwing it up for the population of East Asia.
--
Philip Jägenstedt
Core Developer
Opera Software
_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss