I've only just recently figured out what XMDPs are and what they're capable of. I notice that your hcard/hreview/hcalendar aggregator is structured so that you can pick an attribute and search for the pieces of data that contain it. Have you tried detecting and parsing XMDPs, not for validation, but to discover new microformats that aren't documented at microformats.org? Don't get me wrong, I understand the importance of strong standardization. But say someone creates a niche format, with an XMDP, for their own site and related sites in their community: do you think your aggregator could use that XMDP to create new searchable attributes in your search engine?

Sorry this is a bit of a tangent, but the idea of it kind of fascinated me.
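
To make the idea a bit more concrete, here is a rough sketch of how an aggregator might pull candidate attribute names out of an XMDP profile. It assumes the usual XMDP layout (an outer dl with class "profile", whose dt ids name HTML attributes such as class or rel, each followed by a nested dl whose dt ids name the values). The class and function names and the "recipe" sample profile are all made up for illustration; nothing here reflects how Scott's aggregator actually works.

```python
from html.parser import HTMLParser


class XMDPTermCollector(HTMLParser):
    """Collects the terms defined in an XMDP profile (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.dl_depth = 0          # 0 = outside any <dl class="profile">
        self.current_attr = None   # HTML attribute being defined, e.g. "class"
        self.terms = {}            # e.g. {"class": ["recipe", "ingredient"]}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "dl":
            if self.dl_depth > 0:
                self.dl_depth += 1
            elif "profile" in (attrs.get("class") or "").split():
                self.dl_depth = 1
        elif tag == "dt" and self.dl_depth and attrs.get("id"):
            if self.dl_depth == 1:
                # Top-level <dt>: names an HTML attribute (class, rel, ...).
                self.current_attr = attrs["id"]
                self.terms.setdefault(self.current_attr, [])
            elif self.dl_depth == 2 and self.current_attr is not None:
                # Nested <dt>: names one value defined for that attribute.
                self.terms[self.current_attr].append(attrs["id"])

    def handle_endtag(self, tag):
        if tag == "dl" and self.dl_depth > 0:
            self.dl_depth -= 1
            if self.dl_depth == 0:
                self.current_attr = None


def extract_xmdp_terms(html):
    """Return {attribute: [value, ...]} for the profiles found in html."""
    collector = XMDPTermCollector()
    collector.feed(html)
    return collector.terms


# Made-up profile for a niche "recipe" format, for illustration only.
SAMPLE_PROFILE = """
<dl class="profile">
  <dt id="class">class</dt>
  <dd><dl>
    <dt id="recipe">recipe</dt><dd>A single recipe.</dd>
    <dt id="ingredient">ingredient</dt><dd>One ingredient.</dd>
  </dl></dd>
  <dt id="rel">rel</dt>
  <dd><dl>
    <dt id="source">source</dt><dd>Where the recipe came from.</dd>
  </dl></dd>
</dl>
"""

print(extract_xmdp_terms(SAMPLE_PROFILE))
```

Each value collected under "class" could then become a new searchable attribute, in the same way the known hcard/hreview/hcalendar properties are searchable today. Whether that would be trustworthy without review is a separate question.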

On Mar 24, 2006, at 5:40 PM, Scott Reynen wrote:

On Mar 24, 2006, at 4:20 PM, Ryan King wrote:

Hmm, this sounds to me like a theoretical argument. I'd like to hear what experience people have had here. Has anyone here worked on crawling to index microformats? If so, what challenges did you face?

Yes.  The two I know of are reevoo, which aggregates hreviews:

http://www.reevoo.com/

and my own effort, which aggregates hcards, hcalendars, and hreviews:

http://randomchaos.com/microformats/base/

My main challenges have been a lack of space to store the data (which has nothing to do with microformats) and the lack of a parser that can read invalid X(HT)ML (which is only an issue because I haven't installed Tidy on my server). If microformat site maps existed, I would use them as starting points to know where to look, but I wouldn't trust them as an accurate listing of what's on a domain, because I know I would likely forget to update my own if I had one. So I'd still be reading the same number of documents, just in a different order.

Peace,
Scott
_______________________________________________
microformats-discuss mailing list
[email protected]
http://microformats.org/mailman/listinfo/microformats-discuss

