Hi, a couple of weeks ago I received an email from some users of the German DBpedia who were reporting that they weren't getting any results when querying the endpoint for URIs that contained German umlauts (or any other UTF-8 characters). I reported the issue to the Jena mailing list and they fixed it, but in the process we also discovered a bug in Virtuoso.
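To illustrate what is at stake, here is a minimal Python sketch. It is not part of DBpedia, Jena or Virtuoso. It shows how an IRI containing an umlaut maps to a percent-encoded URI in the style of RFC 3987, and one common way such IRIs get garbled: their UTF-8 bytes are decoded as Latin-1. Whether this is the exact mechanism behind the Virtuoso bug is an assumption, and the helper names (`iri_to_uri`, `simulate_mojibake`, `repair_mojibake`) are hypothetical.

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding the UTF-8 bytes of non-ASCII chars.

    Simplification (assumption): all RFC 3986 reserved characters and existing
    '%' escapes are left untouched, which is enough for DBpedia resource IRIs.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%", encoding="utf-8")


def simulate_mojibake(text: str) -> str:
    """Encode as UTF-8 but decode as Latin-1, a typical source of garbled IRIs."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Undo simulate_mojibake by reversing the wrong decoding step."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    iri = "http://de.dbpedia.org/resource/Königin-Luise-Stiftung"
    print(iri_to_uri(iri))         # ö becomes %C3%B6
    print(simulate_mojibake(iri))  # ö becomes two unrelated Latin-1 characters
```

The percent-encoded form is a different but equivalent way to write the same resource, whereas the mojibake form is a different IRI altogether, which is why a client looking up the correct IRI finds nothing.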
There is a problem with the IRI encoding in the DBpedia Internationalization VAD. Namely, when querying the SPARQL endpoint, the IRIs in the RDF/XML output are garbled. The issue can be found in both the Greek and the German endpoints. For example, in the first XML lines of http://de.dbpedia.org/data/Berlin-Dahlem.rdf you will notice mis-encoded versions of IRIs such as http://de.dbpedia.org/resource/Königin-Luise-Stiftung or http://de.dbpedia.org/resource/Gernot_Michael_Müller, with the umlauts garbled. You will notice similar issues if you look at this resource from the Greek DBpedia: http://el.dbpedia.org/data/Αλέξανδρος_ο_Μέγας.rdf . The problem is that when querying the Internationalization endpoints, not only with Jena but with any other SPARQL client, users will get garbled IRIs whenever those IRIs contain UTF-8 characters.

Kind Regards,
Alexandru Todor

_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
