Over the last few days I've been trying to build a set of shapes for
the world's administrative divisions. At the very least I'd like to go
one subdivision below country level. The immediate purpose is to use
the shapes to sort the coordinates of DBpedia topics into countries and
administrative areas, which would give me some ability to geotarget.
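
For what it's worth, the segmentation step could look something like
the sketch below. It is only an illustration: it assumes each shape is
a single simple polygon given as (lon, lat) pairs, it uses a plain
ray-casting test rather than a real GIS library, and the names
point_in_polygon, locate and SHAPES (plus the rectangles in it) are
made up for the example.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point towards +longitude.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate(lon, lat, shapes):
    """Return the name of the first shape containing the point, or None."""
    for name, polygon in shapes.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Made-up rectangles standing in for real boundaries (assumption).
SHAPES = {
    "GER": [(6.0, 47.0), (15.0, 47.0), (15.0, 55.0), (6.0, 55.0)],
    "GHA": [(-3.5, 4.5), (1.2, 4.5), (1.2, 11.2), (-3.5, 11.2)],
}
```

Real boundaries have holes, multiple parts and a lot more vertices, so
in practice a spatial index or a geometry library would probably be
needed on top of this.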
While doing that, I ran into one of those amusing anomalies in
how Wikipedia/DBpedia is built.
I found some shapes that appear to be named after three-letter IOC
codes for countries,
http://en.wikipedia.org/wiki/List_of_IOC_country_codes
so Germany is "GER" instead of the ISO two-letter code "DE". No
problem, but this is the kind of thing that means there's no rest for
the wicked.
Anyway, I've got country names, so I'm probably just going to
string match 95% of the names and do the weirdos by hand, but it got me
thinking that "IOC country code" is a property of a country. The codes
appear in that list, but the obnoxious thing is that the entries don't
link to the countries; instead they link to pages like
http://en.wikipedia.org/wiki/Ghana_at_the_Olympics
The IOC code is infoboxed, which is good, but there's no reliable
link back to the country.
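
The "string match most of them, fix the rest by hand" plan mentioned
above could be sketched roughly as follows. This assumes the shapes
carry a name field and that a list of canonical country names is
available; normalize, match_names and the sample names are
hypothetical, and only exact matches after light normalization are
accepted.

```python
def normalize(name):
    """Lowercase, turn underscores into spaces, collapse whitespace."""
    return " ".join(name.lower().replace("_", " ").split())


def match_names(shape_names, country_names):
    """Split shape names into automatic matches and leftovers.

    Returns (matched, leftovers): matched maps each shape name to a
    canonical country name, leftovers lists the names that need a
    manual look.
    """
    by_norm = {normalize(c): c for c in country_names}
    matched, leftovers = {}, []
    for name in shape_names:
        key = normalize(name)
        if key in by_norm:
            matched[name] = by_norm[key]
        else:
            leftovers.append(name)
    return matched, leftovers


# Sample data invented for the example.
countries = ["Germany", "Ghana", "New Zealand", "Ivory Coast"]
shapes = ["germany", "Ghana", "New_Zealand", "Cote dIvoire"]
matched, leftovers = match_names(shapes, countries)
```

Here the first three names match automatically and "Cote dIvoire"
ends up in leftovers, which is the pile to sort out by hand.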
Most of these pages have a wikilink that points to the actual
country, but often they have links to other countries too. For
instance, the Ghana page also links to
http://en.wikipedia.org/wiki/New_Zealand
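
One possible way to pick the right wikilink, sketched below, is to
lean on the page title: take whatever comes before "_at_the_" and look
for a wikilink whose title is exactly that. This is a guess at a
heuristic, not something DBpedia does; guess_country_link and
page_title are hypothetical names, and it would miss any page where
the team name differs from the title of the country article.

```python
from urllib.parse import unquote

MARKER = "_at_the_"


def page_title(url):
    """Return the part of a Wikipedia URL after /wiki/, percent-decoded."""
    return unquote(url.rsplit("/wiki/", 1)[-1])


def guess_country_link(page_url, wikilinks):
    """Pick the wikilink whose title equals the prefix of the page title.

    For ".../Ghana_at_the_Olympics" the prefix is "Ghana". Returns None
    when the title has no "_at_the_" part or no wikilink matches.
    """
    title = page_title(page_url)
    if MARKER not in title:
        return None
    country = title.split(MARKER, 1)[0]
    for link in wikilinks:
        if page_title(link) == country:
            return link
    return None


# Links invented for the example, apart from the URLs quoted above.
links = [
    "http://en.wikipedia.org/wiki/New_Zealand",
    "http://en.wikipedia.org/wiki/Ghana",
    "http://en.wikipedia.org/wiki/Ghana_at_the_1996_Summer_Olympics",
]
guess = guess_country_link(
    "http://en.wikipedia.org/wiki/Ghana_at_the_Olympics", links
)
```

With these inputs the New Zealand link is skipped and the Ghana link
is chosen; anything that comes back as None goes on the manual list.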
Now, it turns out that the infobox on the Ghana page points to some
pages that have really rich information, such as
http://en.wikipedia.org/wiki/Ghana_at_the_1996_Summer_Olympics
Unfortunately, at the moment, DBpedia only knows these as
wikilinks. Anyway, it's just a good example of what you've got to deal
with when you're extracting data from DBpedia.
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion