I was working on a Freebase <-> DBpedia mapping that doesn't destroy DBpedia, so I had the idea of using the Wikipedia page IDs from Freebase to look up DBpedia resources. The 'key' to that on the DBpedia side is in page_ids_en.nt.bz2.

In there I noticed a really curious phenomenon: there's not a 1-1 correspondence between Wikipedia page IDs and Wikipedia pages. For instance:

[p...@haruhi apps]$ bzgrep 'wiki/SS>' ~/dbpedia_3.5.1/page_ids_en.nt.bz2
<http://en.wikipedia.org/wiki/SS> <http://dbpedia.org/property/pageId> "27041"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://en.wikipedia.org/wiki/SS> <http://dbpedia.org/property/pageId> "198274"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://en.wikipedia.org/wiki/SS> <http://dbpedia.org/property/pageId> "14524464"^^<http://www.w3.org/2001/XMLSchema#integer> .

This strikes me as wrong, but I can imagine that something like this might happen if there was a page called 'SS' that got renamed, then somebody created a new one, then that got renamed, and so forth.

Right now, if I look at DBpedia, I see

http://dbpedia.org/page/SS

In Wikipedia, however, this redirects to

http://en.wikipedia.org/wiki/Schutzstaffel

Looking closely at the DBpedia page for "SS", I think there's some confusion with this rather nicer fellow:

http://en.wikipedia.org/wiki/ß <http://en.wikipedia.org/wiki/%C3%9F>

and it turns out that DBpedia has much better facts for this entry:

http://dbpedia.org/page/Schutzstaffel

Anyway, I can believe that this has got something to do with the root cause of the general degradation of key integrity that I've seen in DBpedia 3.5.
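
A rough way to see how widespread this is would be to scan the whole page ID dump for pages that carry more than one pageId. The following is only a minimal sketch in Python, assuming every line follows the same triple layout as the bzgrep output above; the function names (ids_by_page, ambiguous_pages, scan_dump) are made up for illustration and are not part of any DBpedia tooling.

```python
import bz2
import re
from collections import defaultdict

# Assumption: each pageId triple looks exactly like the lines in the
# bzgrep output above (subject URI, pageId property, integer literal).
TRIPLE = re.compile(
    r'<(?P<page>[^>]+)>\s+'
    r'<http://dbpedia\.org/property/pageId>\s+'
    r'"(?P<id>\d+)"\^\^<http://www\.w3\.org/2001/XMLSchema#integer>\s*\.'
)


def ids_by_page(lines):
    """Collect the set of page IDs seen for each page URI."""
    pages = defaultdict(set)
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match:  # lines that do not fit the assumed layout are skipped
            pages[match.group("page")].add(int(match.group("id")))
    return pages


def ambiguous_pages(lines):
    """Return only the pages that map to more than one page ID."""
    return {
        page: sorted(ids)
        for page, ids in ids_by_page(lines).items()
        if len(ids) > 1
    }


def scan_dump(path):
    """Run the check over a bzip2-compressed N-Triples dump."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return ambiguous_pages(dump)
```

Calling scan_dump on the local copy of page_ids_en.nt.bz2 would return a dictionary from page URI to its list of page IDs, restricted to the pages that have more than one. It holds every subject URI in memory, which should be acceptable for a one-off check but is not tuned for anything larger.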
