[Note: posted to two lists, the Simile post having been quoted on
Taxacom]


Someone wrote:

>>> I wish that wikipedia had a fully exportable database
>>> http://en.wikipedia.org/wiki/Lists_of_films
>>>
>>> For example, being able to export all data of this movie as RDF,
>>> maybe a templating issue at least for the box on the right.
>>> http://en.wikipedia.org/wiki/2046_%28film%29
>>
>> Should be an easy job for a SIMILE like screen scraper.
>>
>> If you start scraping down from the Wikipedia film list, you should get
>> a fair amount of data.

>Some further ideas along these lines. What about scraping information
>about geographic places like countries and cities from Wikipedia and
>linking the data to geonames?

>The Wikipedia articles about countries and cities all follow relatively
>similar structures (for instance http://en.wikipedia.org/wiki/Berlin), so
>it should be easy to scrape them.

That sounds like another case where a microformat would be useful: either
"geo" (for coordinates), "adr" (for postal addresses), or hCard (for
venues, organisations and people).

        <http://microformats.org/wiki/geo>

        <http://microformats.org/wiki/adr>

        <http://microformats.org/wiki/hcard>

'Geo' is already being used on another MediaWiki site, 'Wikitravel'; see,
for instance:

        <http://wikitravel.org/en/Birmingham_%28England%29>

where it is used on several entries, including that for the RSPB
Sandwell Valley nature reserve.


>I once read about some pretty sophisticated screen-scraping frameworks
>that fill relational databases with data from websites but forgot the
>exact links. Does anybody know?

There are several tools for scraping places and other details (events,
reviews, Atom feeds) from pages marked up with the relevant microformats.
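
As a rough illustration of what such a tool does, here is a minimal
sketch in Python (standard library only) that pulls coordinates out of
"geo" markup. It assumes the explicit form of the microformat, where an
element with class "geo" contains child elements with classes "latitude"
and "longitude"; it does not handle the abbreviated forms. The names
GeoParser and extract_geo, and the sample markup, are made up for this
example and do not come from any existing tool.

```python
from html.parser import HTMLParser


class GeoParser(HTMLParser):
    """Collect (latitude, longitude) pairs from class="geo" elements."""

    def __init__(self):
        super().__init__()
        self.points = []      # completed (lat, lon) tuples
        self._stack = []      # (tag, role) for every element still open
        self._current = None  # fields of the geo element being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        role = None
        if "geo" in classes and self._current is None:
            role = "geo"
            self._current = {}
        elif self._current is not None:
            for name in ("latitude", "longitude"):
                if name in classes:
                    role = name
                    self._current[name] = ""
        self._stack.append((tag, role))

    def handle_data(self, data):
        # Text belongs to the nearest enclosing latitude/longitude element.
        for _, role in reversed(self._stack):
            if role in ("latitude", "longitude"):
                self._current[role] += data
                return
            if role == "geo":
                return

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self._stack):
            return  # stray end tag with no matching open element
        # Pop back to the matching open tag; this also discards
        # unclosed elements such as <br>.
        while True:
            open_tag, role = self._stack.pop()
            if role == "geo":
                self._finish()
            if open_tag == tag:
                break

    def _finish(self):
        # Keep the point only if both fields are present and numeric.
        try:
            lat = float(self._current["latitude"])
            lon = float(self._current["longitude"])
        except (KeyError, ValueError):
            pass
        else:
            self.points.append((lat, lon))
        self._current = None


def extract_geo(html):
    """Return a list of (lat, lon) tuples found in an HTML string."""
    parser = GeoParser()
    parser.feed(html)
    parser.close()
    return parser.points


# Hypothetical markup, for illustration only.
sample = (
    '<p class="geo">Berlin: '
    '<span class="latitude">52.5167</span>, '
    '<span class="longitude">13.4000</span></p>'
)
print(extract_geo(sample))
```

The point is that the scraper only needs to know the class names, not
the layout of any particular site, so the same code would work on any
page that uses the markup.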

I would again remind Taxacom subscribers (and anyone else interested) of
the "Species" microformat proposal:

        <http://microformats.org/wiki/species>

-- 
Andy Mabbett
                Say "NO!" to compulsory ID Cards:  <http://www.no2id.net/>

                Free Our Data:  <http://www.freeourdata.org.uk>
_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general
