[Note: posted to two lists, the Simile post having been quoted on Taxacom]
Someone wrote:

>>> I wish that wikipedia had a fully exportable database
>>> http://en.wikipedia.org/wiki/Lists_of_films
>>>
>>> For example, being able to export all data of this movie as RDF,
>>> maybe a templating issue at least for the box on the right.
>>> http://en.wikipedia.org/wiki/2046_%28film%29
>>
>> Should be an easy job for a SIMILE-like screen scraper.
>>
>> If you start scraping down from the Wikipedia film list, you should get
>> a fair amount of data.

>Some further ideas along these lines. What about scraping information
>about geographic places like countries and cities from Wikipedia and
>linking the data to geonames?

>The Wikipedia articles about countries and cities all follow relatively
>similar structures (for instance http://en.wikipedia.org/wiki/Berlin) so
>it should be easy to scrape them.

That sounds like another case where a microformat would be useful:
either "geo" (for coordinates), "adr" (for postal addresses), or hCard
(for venues, organisations and people).

<http://microformats.org/wiki/geo>
<http://microformats.org/wiki/adr>
<http://microformats.org/wiki/hcard>

'Geo' is already being used on another MediaWiki site, 'Wikitravel';
see, for instance:

<http://wikitravel.org/en/Birmingham_%28England%29>

where it is used on several entries, including that for the RSPB
Sandwell Valley nature reserve.

>I once read about some pretty sophisticated screen-scraping frameworks
>that fill relational databases with data from websites but forgot the
>exact links. Does anybody know?

There are several tools for scraping places, and other details (events,
reviews, Atom feeds) from pages marked with the relevant microformats.

I would again remind Taxacom subscribers (and anyone else interested) of
the "Species" microformat proposal:

<http://microformats.org/wiki/species>

-- 
Andy Mabbett
Say "NO!" to compulsory ID Cards: <http://www.no2id.net/>
Free Our Data: <http://www.freeourdata.org.uk>

_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
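
P.S. To illustrate why 'geo' markup makes scraping easier, here is a minimal sketch in Python (standard library only) that pulls coordinates out of elements marked class="geo" with nested class="latitude" and class="longitude" children. The class name GeoParser, the sample markup and the sample coordinates are all made up for illustration; it is not one of the existing tools mentioned above, and it does not handle the abbreviated "lat;long" form of the microformat or non-numeric values.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; ignored when tracking nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class GeoParser(HTMLParser):
    """Collect (latitude, longitude) pairs from elements marked class="geo"."""

    def __init__(self):
        super().__init__()
        self.points = []     # completed (lat, lon) pairs
        self._depth = 0      # nesting depth inside the current geo element
        self._field = None   # "latitude" or "longitude" while awaiting text
        self._current = {}   # values gathered for the current geo element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
            for name in ("latitude", "longitude"):
                if name in classes:
                    self._field = name
        elif "geo" in classes:
            self._depth = 1
            self._current = {}

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self._current[self._field] = float(text)
            self._field = None

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0 and len(self._current) == 2:
            self.points.append(
                (self._current["latitude"], self._current["longitude"])
            )


# Hypothetical markup of the kind a city article might carry.
sample = (
    '<p>Berlin: <span class="geo">'
    '<span class="latitude">52.52</span>, '
    '<span class="longitude">13.40</span>'
    '</span></p>'
)

parser = GeoParser()
parser.feed(sample)
print(parser.points)
```

The point of the sketch is that the scraper needs no knowledge of the page layout, only of the class names, so the same code would work on any site that adopts the markup.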
