In the last few days I've been trying to create a set of shapes for the
administrative divisions of the world;  at the very least I'd like to go
one subdivision below the country level.  The immediate purpose is to use
the shapes to sort the coordinates of dbpedia topics into countries and
administrative areas,  to give myself some ability to geotarget.
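
To make the "sort coordinates into shapes" step concrete, here is a minimal sketch of a ray-casting point-in-polygon test. The names (point_in_ring, locate, SHAPES) and the rectangular toy shapes are made up for illustration; real boundary data would have many more vertices, holes and multiple rings per country, which this sketch ignores.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a ring of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate(lon, lat, shapes):
    """Return the key of the first shape containing the point, or None."""
    for code, ring in shapes.items():
        if point_in_ring(lon, lat, ring):
            return code
    return None


# Hypothetical toy shapes: crude bounding rectangles, NOT real borders.
SHAPES = {
    "GER": [(6.0, 47.0), (15.0, 47.0), (15.0, 55.0), (6.0, 55.0)],
    "GHA": [(-3.5, 4.5), (1.2, 4.5), (1.2, 11.2), (-3.5, 11.2)],
}

if __name__ == "__main__":
    print(locate(13.4, 52.5, SHAPES))    # a point roughly at Berlin
    print(locate(174.8, -41.3, SHAPES))  # a point roughly at Wellington
```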

    Anyway,  doing that,  I ran into one of those amusing anomalies in 
how wikipedia/dbpedia is built.

    I found some shapes that appear to be named after three-letter IOC 
codes for countries,

http://en.wikipedia.org/wiki/List_of_IOC_country_codes

    so Germany is "GER" instead of the two-letter ISO 3166-1 code "DE".
No problem,  but this is the kind of thing that means there's no rest
for the wicked.

    Anyway,  I've got country names,  so I'm probably just going to
string match 95% of the names and do the weirdos by hand,  but it got me
thinking that "IOC country code" is a property of a country.  The codes
are in that list,  but the obnoxious thing is that the entries don't
link to the countries;  instead they link to pages like

http://en.wikipedia.org/wiki/Ghana_at_the_Olympics
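
The "string match most of them, do the weirdos by hand" plan could look roughly like the sketch below. The sample tables, the MANUAL override map and the 0.9 cutoff are assumptions for illustration only; difflib is from the Python standard library.

```python
import difflib

# Hypothetical sample data, typed in by hand for this sketch.
ISO_BY_NAME = {
    "Germany": "DE",
    "Ghana": "GH",
    "New Zealand": "NZ",
    "United Kingdom": "GB",
}
IOC_NAMES = {
    "GER": "Germany",
    "GHA": "Ghana",
    "NZL": "New Zealand",
    "GBR": "Great Britain",
}
# The "weirdos": cases resolved by hand rather than by name matching.
MANUAL = {"GBR": "GB"}


def normalize(name):
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(name.lower().split())


def match_ioc_to_iso(ioc_names, iso_by_name, manual, cutoff=0.9):
    """Map IOC codes to ISO codes via country names.

    Returns (matched, unmatched): a dict of IOC -> ISO and a list of IOC
    codes that still need to be handled by hand.
    """
    lookup = {normalize(n): code for n, code in iso_by_name.items()}
    matched, unmatched = {}, []
    for ioc, name in ioc_names.items():
        if ioc in manual:
            matched[ioc] = manual[ioc]
            continue
        close = difflib.get_close_matches(
            normalize(name), list(lookup), n=1, cutoff=cutoff
        )
        if close:
            matched[ioc] = lookup[close[0]]
        else:
            unmatched.append(ioc)
    return matched, unmatched


if __name__ == "__main__":
    print(match_ioc_to_iso(IOC_NAMES, ISO_BY_NAME, MANUAL))
```

A high cutoff keeps the matcher from guessing;  anything it can't place ends up in the unmatched list for manual review instead of being silently mis-assigned.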

    The IOC code is infoboxed,  which is good,  but there's no reliable 
link back to the country.

    Most of these pages have a wikilink that points to the actual
country,  but often they link to other countries too;  for instance,
the Ghana page also links to

http://en.wikipedia.org/wiki/New_Zealand
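
One way to pick the right country out of several wikilinks is to lean on the page title.  The sketch below assumes the title follows the pattern "<Country>_at_the_Olympics" and that the country article carries the same name;  that assumption will not hold everywhere,  so anything returning None would still need a manual look.  The function names and the sample link list are hypothetical.

```python
SUFFIX = "_at_the_Olympics"


def country_title(olympics_title):
    """Strip the assumed '_at_the_Olympics' suffix; None if it is absent."""
    if olympics_title.endswith(SUFFIX):
        return olympics_title[: -len(SUFFIX)]
    return None


def pick_country_link(olympics_title, wikilinks):
    """Return the wikilink whose title matches the page's own prefix, if any."""
    expected = country_title(olympics_title)
    if expected is not None and expected in wikilinks:
        return expected
    return None


if __name__ == "__main__":
    # Hypothetical wikilinks as they might be extracted for the Ghana page.
    links = ["New_Zealand", "Ghana", "Ghana_at_the_1996_Summer_Olympics"]
    print(pick_country_link("Ghana_at_the_Olympics", links))
```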

    Now,  it turns out that the infobox on the Ghana page points to some
pages that have really rich information,  such as

http://en.wikipedia.org/wiki/Ghana_at_the_1996_Summer_Olympics

    Unfortunately,  at the moment,  dbpedia only knows these as 
wikilinks.  Anyway,  it's just a good example of what you've got to deal 
with when you're extracting data from dbpedia.





_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
