Hi,

A couple of weeks ago I received a mail from some users of the German 
DBpedia who were reporting that they weren't getting any results when 
querying the endpoint for URIs that contained German umlauts (or any 
other non-ASCII characters). I reported the issue to the Jena mailing 
list and they fixed it, but in the process we also discovered a bug 
with Virtuoso.

There is a problem with the IRI encoding in the DBpedia 
Internationalization VAD. Namely, when querying the SPARQL endpoint, the 
encoding of the IRIs in RDF/XML is garbled. The issue can be found in 
both the Greek and the German endpoints.

For example, in the first XML lines of 
http://de.dbpedia.org/data/Berlin-Dahlem.rdf you will notice things like 
http://de.dbpedia.org/resource/Königin-Luise-Stiftung instead of 
http://de.dbpedia.org/resource/Königin-Luise-Stiftung or 
http://de.dbpedia.org/resource/Gernot_Michael_Müller instead of 
http://de.dbpedia.org/resource/Gernot_Michael_Müller. You will notice 
similar issues if you look at this resource from the Greek DBpedia: 
http://el.dbpedia.org/data/Αλέξανδρος_ο_Μέγας.rdf .

The problem is that when querying the Internationalization endpoints, 
not only with Jena but with any other SPARQL client, the user is going 
to get garbled IRIs if they contain non-ASCII characters.


Kind Regards,
Alexandru Todor

