Hi,

This seems to be an issue with Virtuoso's RDF/XML serializer. Both the Greek and German endpoints serve garbled data in their XML files, and the problem only arises when RDF/XML is chosen as the output format. Thanks for the tip, I'll report the issue to the Virtuoso devs.

There's still the problem with QueryExecutionFactory.sparqlService returning no results.

Kind Regards,
Alexandru

On 09/30/2011 05:33 PM, Andy Seaborne wrote:
On 30/09/11 16:17, Alexandru Todor wrote:
Hi,

I maintain the German language DBpedia endpoint, and have gotten some
mails from users complaining that they don't get any results from the
endpoint when they query for resources like:

http://de.dbpedia.org/resource/München

This message and your message are ISO-8859-1

ü = 0xFC in ISO-8859-1, which is the same value as its Unicode code point (U+00FC), and 0xC3 0xBC in UTF-8.
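
The byte values above can be checked with a few lines of Java. This is only an illustrative sketch: the class name UmlautBytes and the helper hex are made up for this example and are not part of Jena or of the code under discussion.

```java
import java.nio.charset.StandardCharsets;

class UmlautBytes {
    // Render a byte array as space-separated hex, e.g. "0xC3 0xBC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00FC (u with diaeresis), written as an escape so the
        // encoding of the source file itself does not matter.
        String u = "\u00FC";
        System.out.println(hex(u.getBytes(StandardCharsets.ISO_8859_1))); // 0xFC
        System.out.println(hex(u.getBytes(StandardCharsets.UTF_8)));      // 0xC3 0xBC
    }
}
```

The same character is one byte in ISO-8859-1 and two bytes in UTF-8, so any component that guesses the wrong charset when turning bytes back into text will produce different characters.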

I tried http://de.dbpedia.org/resource/München in my browser and got redirected to:

http://de.dbpedia.org/data/M%C3%BCnchen.xml

which returns:

RDF/XML in UTF-8 but it contains e.g. line 3:

rdf:resource="http://de.dbpedia.org/resource/München"

in Firefox.  That looks corrupt to me.

This is the code they sent me:

// Jena ARQ classes, package com.hp.hpl.jena.query
String queryString =
    "SELECT ?o WHERE { " +
    "<http://de.dbpedia.org/resource/München> " +
    "<http://purl.org/dc/terms/subject> ?o }";
Query query = QueryFactory.create(queryString);
QueryExecution qexec =
    QueryExecutionFactory.sparqlService("http://de.dbpedia.org/sparql", query);
try {
    ResultSet results = qexec.execSelect();
    while (results.hasNext()) {
        QuerySolution s = results.nextSolution();
        System.out.println(s.toString());
    }
} finally {
    // Always release the HTTP connection, even if the query fails.
    qexec.close();
}

I tried the code and it works for any IRI that contains no UTF8 chars
(so only for URIs), but when you have UTF8 chars it returns no result.
I've tried a couple of variations and it returns no result but also
doesn't throw any kind of exception, it's just as if the data wasn't there.

Then I proceeded to try an alternative method and used QueryEngineHTTP
to execute the query and it worked. However, QueryEngineHTTP messes up
the UTF8 encoding, so for example in the returned results you get
München instead of München. My guess is that QueryEngineHTTP decodes
the SPARQL results as ISO-8859-1 instead of UTF8, so re-encoding the
strings as ISO-8859-1 and decoding them as UTF8 fixed this.
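
A minimal sketch of that kind of workaround, assuming the garbling really is UTF-8 bytes that were decoded as ISO-8859-1. The class name MojibakeRepair and the method repair are hypothetical and are not taken from the code the users sent.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    // Recover the original bytes by encoding the garbled string as
    // ISO-8859-1, then decode those bytes as the UTF-8 they really are.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "M" + U+00C3 + U+00BC + "nchen" is what the UTF-8 form of
        // "M" + U+00FC + "nchen" looks like after a wrong ISO-8859-1 decode.
        String garbled = "M\u00C3\u00BCnchen";
        String fixed = repair(garbled);
        System.out.println(fixed.equals("M\u00FCnchen")); // true
    }
}
```

This only round-trips cleanly when every character of the garbled string fits in ISO-8859-1. It hides the symptom on the client side and does not address where the wrong charset is chosen.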

the code seems to do:

URLEncoder.encode(s, "UTF-8")

but it's still working in strings. Something lower level (Sun networking) does the string to bytes.
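
To illustrate the point about strings, here is a small standalone sketch of what URLEncoder.encode returns for the resource name. The class name PercentEncodeDemo and the wrapper encode are made up for this example; only java.net.URLEncoder is a real API here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PercentEncodeDemo {
    // Thin wrapper that turns the checked exception into an unchecked one.
    static String encode(String s, String charsetName) {
        try {
            return URLEncoder.encode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String name = "M\u00FCnchen"; // U+00FC written as an escape
        // The charset argument decides which bytes get percent-encoded.
        System.out.println(encode(name, "UTF-8"));      // M%C3%BCnchen
        System.out.println(encode(name, "ISO-8859-1")); // M%FCnchen
    }
}
```

The result is still a java.lang.String, just one that contains only ASCII characters. The conversion to bytes on the wire happens later, in the networking layer.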

    Andy


Kind Regards,
Alexandru Todor

Research Associate
AG Corporate Semantic Web
Freie Universität Berlin






