Hello,

This might be of interest to members of this group, as it deals with extracting data from semantic HTML. Prior to this year's Mashed Museum event at the University of Leicester, Dan Zambonini put together a prototype which aggregates data by spidering online museum catalogues:
http://hoardit.pbwiki.com/
It's a pretty fantastic demo of how information can be extracted from well-structured HTML, even before you think of putting microformats etc. on top.

In particular, it does a pretty good job of figuring out when an object was made:
http://feeds.boxuk.com/museums/object_100yrs.php
The date parser is based on some code Dan & I knocked together at Mashed Museum 2007, which looks at strings like 'late Victorian', 'early 20th Century', '4th January 1853' and so on, and converts them to machine-readable ISO dates.
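
To give a flavour of that kind of conversion, here is a minimal Python sketch. It is not the code Dan and I wrote; the names (parse_date, PERIODS) are made up for illustration. It assumes 'Victorian' means 1837-1901, that the Nth century runs from year (N-1)*100+1 to N*100, and that 'early', 'mid' and 'late' each mean roughly a third of the period. It returns a (start, end) pair of ISO dates, or None for anything it doesn't recognise.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Named periods as (first year, last year). These boundaries are an assumption.
PERIODS = {"victorian": (1837, 1901)}


def _third(start, end, qualifier):
    """Narrow an inclusive year range to its early, mid or late third."""
    step = (end - start + 1) // 3
    if qualifier == "early":
        return start, start + step - 1
    if qualifier == "mid":
        return start + step, end - step
    if qualifier == "late":
        return end - step + 1, end
    return start, end  # no qualifier: keep the whole range


def parse_date(text):
    """Return (start, end) as ISO date strings, or None if not understood."""
    s = text.strip().lower()

    # Exact day, e.g. '4th January 1853'
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]+)\s+(\d{4})", s)
    if m and m.group(2) in MONTHS:
        try:
            d = date(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))
        except ValueError:  # e.g. '31st February 1853'
            return None
        return d.isoformat(), d.isoformat()

    # Optional 'early' / 'mid' / 'late', then a century or a named period
    m = re.fullmatch(r"(?:(early|mid|late)[\s-]+)?(.+)", s)
    if not m:
        return None
    qualifier, rest = m.group(1), m.group(2)

    c = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)\s+century", rest)
    if c and int(c.group(1)) >= 1:
        n = int(c.group(1))
        years = ((n - 1) * 100 + 1, n * 100)
    elif rest in PERIODS:
        years = PERIODS[rest]
    else:
        return None

    first, last = _third(years[0], years[1], qualifier)
    return date(first, 1, 1).isoformat(), date(last, 12, 31).isoformat()
```

Returning a range rather than a single date keeps the vagueness of the original string visible: under the assumptions above, 'late Victorian' comes out as 1881-01-01 to 1901-12-31, while '4th January 1853' gives the same date for both ends.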

Our original idea, which we never got round to actually implementing, was that this would be useful as a web service - you give it a string, it gives you a machine-parsable representation of that string. The recent discussion here about dates has made me wonder if such a web service would be useful for microformats parsers. What do others think?

Cheers
Jim

Jim O'Donnell
[EMAIL PROTECTED]
http://eatyourgreens.org.uk
http://flickr.com/photos/eatyourgreens



_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss
