On 1/24/07, Brian Suda <[EMAIL PROTECTED]> wrote:
> On 1/24/07, David Janes <[EMAIL PROTECTED]> wrote:
> > Do you (Tantek + all) agree with the following "architecture", or at
> > least think it's worth pursuing further:
> >
> > (a) hCards without additional markup; "url" is used to look up a URL
> > (b) at the URL we can either find:
> > (b.i) the authoritative hCard; OR
> > (b.ii) a pointer to an authoritative URL with the authoritative hCard
> > (c) it's easy to find the authoritative hCard on the authoritative URL
> >
> > I'm sure we have the technology to do (b.ii); I just don't know if
> > anyone has done it. Anyone?
> In a similar vein, an hCard spider could find hCards that have a URL
> on a page. It could then follow that URL to the person's page and
> inspect it for hCards. If none are found, it could simply follow all
> rel-me links. Since rel-me is published by the author of the page, [is
> it a safe assumption?] the subsequently requested pages are also
> controlled by the author, so hCards could be looked for on those pages
> as well. The problem arises when multiple hCards are encountered on a
> page: which is the authoritative hCard? This is not a problem with the
> spider, but with the mechanism for saying "THIS hCard is the one you
> want" (you suggested an anchor link, #vcard). Using some heuristics,
> though, it might be possible to match the URL of the ORIGINAL hCard
> that started this spidering against any hCards found in the rel-me
> crawl. If the URLs match, then you could (with some degree of
> certainty) collapse the values into a more robust hCard.
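A minimal sketch of the spider described above, assuming Python 3's standard-library html.parser and a caller-supplied fetch function (hypothetical; a test can pass a dict's get method). It reads only the fn and url hCard properties, treats two URLs as equal after dropping the fragment, host case and a trailing slash (an assumed rule, not part of any spec), and follows rel-me links only from pages where no matching hCard turned up. HCardScanner, scan, normalize and spider are illustrative names, not an existing library.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit

# Elements that never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HCardScanner(HTMLParser):
    """Collects hCards (class="vcard") and rel="me" links from one page.
    Simplified: only the "fn" and "url" properties are read."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.cards = []        # dicts with "id", "fn", "url"
        self.rel_me = []       # absolute hrefs of rel="me" links
        self._stack = []       # (tag, opened_card, fn_card) per open element
        self._open_cards = []  # hCards currently open, innermost last
        self._fn_targets = []  # hCards whose "fn" text is being collected

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        rels = (a.get("rel") or "").lower().split()
        href = a.get("href")
        if tag in ("a", "link") and href and "me" in rels:
            self.rel_me.append(urljoin(self.base_url, href))
        if tag in VOID:
            return
        opened = "vcard" in classes
        if opened:
            new_card = {"id": a.get("id"), "fn": "", "url": None}
            self.cards.append(new_card)
            self._open_cards.append(new_card)
        card = self._open_cards[-1] if self._open_cards else None
        fn_card = None
        if card is not None:
            if "url" in classes and href and card["url"] is None:
                card["url"] = urljoin(self.base_url, href)
            if "fn" in classes:
                fn_card = card
                self._fn_targets.append(card)
        self._stack.append((tag, opened, fn_card))

    def handle_endtag(self, tag):
        # Unwind to the matching open tag; stray end tags (e.g. an unmatched
        # </li> from a copied snippet) are ignored.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                while len(self._stack) > i:
                    _, opened, fn_card = self._stack.pop()
                    if fn_card is not None:
                        self._fn_targets.pop()
                    if opened:
                        self._open_cards.pop()
                return

    def handle_data(self, data):
        if self._fn_targets:
            self._fn_targets[-1]["fn"] += data


def scan(url, html):
    scanner = HCardScanner(url)
    scanner.feed(html)
    scanner.close()
    for card in scanner.cards:
        card["fn"] = " ".join(card["fn"].split())
    return scanner


def normalize(url):
    # Assumed matching rule: ignore the fragment, host case, trailing slash.
    p = urlsplit(urldefrag(url)[0])
    query = "?" + p.query if p.query else ""
    return f"{p.scheme.lower()}://{p.netloc.lower()}{p.path.rstrip('/')}{query}"


def spider(start_card, fetch, max_pages=10):
    """Follow the hCard's url, then rel-me links from pages with no matching
    hCard; merge every hCard whose "url" matches the starting one."""
    target = normalize(start_card["url"])
    merged = dict(start_card)
    queue, seen = [start_card["url"]], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if normalize(url) in seen:
            continue
        seen.add(normalize(url))
        html = fetch(url)  # hypothetical fetcher: URL -> HTML text or None
        if html is None:
            continue
        page = scan(url, html)
        matches = [c for c in page.cards
                   if c["url"] and normalize(c["url"]) == target]
        for card in matches:
            # Assumption: values from the person's own pages win.
            merged.update({k: v for k, v in card.items() if v})
        if not matches:
            queue.extend(page.rel_me)
    return merged
```

The max_pages cap (an arbitrary 10 here) is one way to keep the crawl from becoming open-ended.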
Note that the '#vcard' is from Ryan's website, which I was using as my
working example. I love the rel-me bit; I'm a little less happy with
the "crawl many pages and see if we find something" part (if I
understand you correctly), and I wonder if it can be improved.
Let me just write out the problem again, based on a real-world example:
(a) Start Source Page (e.g. http://microformats.org/):
<address class="author vcard">
<a class="url fn" href="http://theryanking.com">Ryan</a>
</address>
(b) URL Page (http://theryanking.com):
... something happens ...
Note that Ryan already has a pointer on this page to his contact page:
<a href="http://theryanking.com/blog/contact/" title="contact">contact</a></li>
(c) Authoritative URL Page (http://theryanking.com/blog/contact/#vcard):
<div class="vcard" id="vcard">
... authoritative hCard ...
</div>
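Step (c) is the easy part once the URL carries a fragment: id values are meant to be unique within an HTML document, so #vcard names at most one element, and a consumer only has to confirm that element is an hCard root. A small sketch using Python's standard html.parser; FragmentFinder and authoritative_hcard_root are hypothetical names, and the check looks only at the root element's attributes, not at the hCard's properties.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class FragmentFinder(HTMLParser):
    """Records (tag, attrs) of the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self.found is None and a.get("id") == self.target_id:
            self.found = (tag, a)


def authoritative_hcard_root(url, html):
    """Return (tag, attrs) for the element named by url's fragment, but only
    if it is an hCard root (its class contains "vcard"); otherwise None."""
    _, fragment = urldefrag(url)
    if not fragment:
        return None
    finder = FragmentFinder(fragment)
    finder.feed(html)
    finder.close()
    if finder.found and "vcard" in (finder.found[1].get("class") or "").split():
        return finder.found
    return None
```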
So the issue is with the "something happens" bit. Here are a few suggestions:
(I) Brian's solution (I think)
- look for "rel-me"
- check each page, matching on FN and/or URL
So we would change Ryan's page:
<a href="http://theryanking.com/blog/contact/" title="contact"
rel="me">contact</a></li>
The issue here is that FN on the source page may be the shorthand
"Ryan", but on the canonical page it may be the full name, so an exact
FN match would fail.
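One way to live with the FN mismatch is to trust URL equality first and fall back to a loose FN comparison only when a URL is missing. The sketch below assumes that rule plus a made-up "word prefix" test for names (the shorter FN's words must begin the longer one's); neither rule comes from the hCard spec, and url_key, fn_compatible and same_person are hypothetical names.

```python
from urllib.parse import urldefrag, urlsplit


def url_key(url):
    """Assumed comparison key: ignore the fragment, host case, trailing slash."""
    parts = urlsplit(urldefrag(url)[0])
    return (parts.scheme.lower(), parts.netloc.lower(),
            parts.path.rstrip("/"), parts.query)


def fn_compatible(fn_a, fn_b):
    """Hypothetical heuristic: the shorter name's words must be a prefix of
    the longer name's words, case-insensitively ("Ryan" fits "Ryan ...")."""
    a, b = fn_a.lower().split(), fn_b.lower().split()
    if len(a) > len(b):
        a, b = b, a
    return bool(a) and b[:len(a)] == a


def same_person(card_a, card_b):
    """Prefer a URL match; use the weaker FN test only if a URL is missing."""
    if card_a.get("url") and card_b.get("url"):
        return url_key(card_a["url"]) == url_key(card_b["url"])
    return fn_compatible(card_a.get("fn", ""), card_b.get("fn", ""))
```

Preferring the URL keeps "Ryan" from being merged with some unrelated Ryan; the FN fallback is deliberately weak.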
(II) Explicitly mark the authoritative link with a unique rel value
<a href="http://theryanking.com/blog/contact/" rel="me
[hcard-authorative]">contact</a>
[hcard-authorative] is a placeholder, and obviously requires inventing
something new.
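Option (II) reduces link selection to a token check, because rel holds a space-separated list of values. This sketch assumes the placeholder is spelled hcard-authorative exactly as in the markup above (no such rel value exists; it is only a stand-in) and uses the hypothetical names RelLinkCollector and authoritative_link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Stand-in for the "[hcard-authorative]" placeholder; purely hypothetical.
AUTHORITATIVE_TOKEN = "hcard-authorative"


class RelLinkCollector(HTMLParser):
    """Collects (absolute href, set of rel tokens) for every <a>/<link>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("a", "link") and a.get("href"):
            # rel is a whitespace-separated token list; compare lowercased.
            rels = set((a.get("rel") or "").lower().split())
            self.links.append((urljoin(self.base_url, a["href"]), rels))


def authoritative_link(base_url, html):
    """Return the first link whose rel has both "me" and the placeholder."""
    collector = RelLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    for href, rels in collector.links:
        if {"me", AUTHORITATIVE_TOKEN} <= rels:
            return href
    return None
```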
(III) Modified version of Brian's solution: require an explicit ID for the hCard
<a href="http://theryanking.com/blog/contact/#vcard" title="contact"
rel="me">contact</a></li>
That is, the spider will only look at rel-me URIs that have a
fragment. The benefit is _explicitness_ and less open-ended work for
the spider. Could requiring a fragment introduce brittleness?
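Option (III) can be sketched as a filter over rel-me links: split each href with urllib.parse.urldefrag and keep only those with a non-empty fragment, yielding the page to fetch and the id to look for on it. RelMeCollector and candidate_hcard_links are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class RelMeCollector(HTMLParser):
    """Collects absolute hrefs of <a>/<link> elements whose rel includes "me"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if tag in ("a", "link") and a.get("href") and "me" in rels:
            self.hrefs.append(urljoin(self.base_url, a["href"]))


def candidate_hcard_links(base_url, html):
    """Only rel-me links carrying a fragment are kept; each result is
    (page URL to fetch, id of the hCard on that page)."""
    collector = RelMeCollector(base_url)
    collector.feed(html)
    collector.close()
    result = []
    for href in collector.hrefs:
        page, fragment = urldefrag(href)
        if fragment:
            result.append((page, fragment))
    return result
```

A rel-me link written without #vcard is silently skipped, which is the brittleness the question above is about.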
Regards, etc...
--
David Janes
Founder, BlogMatrix
http://www.blogmatrix.com
http://blogmatrix.blogmatrix.com
_______________________________________________
microformats-discuss mailing list
[email protected]
http://microformats.org/mailman/listinfo/microformats-discuss