Hi Dimitris,
They haven't fixed it yet: I'm using the latest Open Source version,
compiled from source, and the problem is still there. It is a simple
encoding issue that can be worked around in Java with a simple hack:
output = new String(input.getBytes("ISO-8859-1"), "UTF8"); . I don't
have much knowledge about the different character sets, but I think the
UTF-8 bytes of the URIs are being treated as ISO-8859-1 instead of UTF-8.
This doesn't happen with the literal values, since their UTF-8 characters
are escaped as ASCII.
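The one-line hack above can be sketched as a small self-contained Java class. It assumes, as suggested above, that the garbled IRIs are UTF-8 bytes that were misread as ISO-8859-1. The class name MojibakeFix and the garble helper (which only simulates the suspected fault so the fix can be checked) are hypothetical. It uses StandardCharsets instead of charset-name strings, which avoids the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeFix {

    // Reverses the suspected fault: UTF-8 bytes that were decoded as ISO-8859-1.
    // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF, so
    // re-encoding recovers the original UTF-8 bytes, which are then decoded properly.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    // Hypothetical helper that reproduces the suspected fault for testing:
    // encode as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Unicode escape for 'ö' so the source compiles the same under any file encoding.
        String iri = "http://de.dbpedia.org/resource/K\u00f6nigin-Luise-Stiftung";
        String broken = garble(iri);   // 'ö' becomes the two characters "Ã¶"
        String fixed = repair(broken);
        System.out.println(broken.equals(iri)); // false
        System.out.println(fixed.equals(iri));  // true
    }
}
```

The workaround should only be applied to strings that are known to be garbled: run on already-correct text, it corrupts it (a lone 'ö' becomes the replacement character U+FFFD, and Greek letters, which have no ISO-8859-1 encoding, become '?').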
About the validity: the problem I've noticed is with the property names.
Certain special characters, such as brackets, have to be filtered out in
order for the output to be valid XML. I've had this issue with the German
DBpedia, and as you can see, right now all resources produce valid XML and
can be queried through SPARQL clients. If, however, there is a deeper
problem with IRIs in RDF/XML that I'm unaware of, we should discuss it and
push for another default serialization format for SPARQL.
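Why property names are the sticking point: in RDF/XML every predicate is written as an XML element name, i.e. a namespace plus a local name that must be a valid XML NCName. A predicate IRI with no NCName suffix (for example, one ending in a closing bracket) therefore has no RDF/XML form, which is presumably the kind of limitation meant by "IRIs cannot be 100% serialized in RDF/XML". The sketch below checks this by finding the longest NCName suffix, using the NameStartChar/NameChar ranges of XML 1.0 (Fifth Edition). The class name and both property IRIs in main are hypothetical examples, not taken from the DBpedia data.

```java
public class RdfXmlPredicateCheck {

    // NameStartChar from XML 1.0 (Fifth Edition), minus ':' (not allowed in an NCName).
    static boolean isNameStartChar(int c) {
        return (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar = NameStartChar plus '-', '.', digits, U+00B7 and two combining ranges.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.' || (c >= '0' && c <= '9')
            || c == 0xB7 || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // Returns the char index where the longest NCName suffix of the IRI begins,
    // or -1 if there is none (the predicate cannot become an XML element name).
    static int localNameStart(String iri) {
        int[] cps = iri.codePoints().toArray();
        int i = cps.length;
        // Walk back over the trailing run of NameChars.
        while (i > 0 && isNameChar(cps[i - 1])) {
            i--;
        }
        // A local name must begin with a NameStartChar, so skip digits, '-', '.' etc.
        while (i < cps.length && !isNameStartChar(cps[i])) {
            i++;
        }
        if (i == cps.length) {
            return -1;
        }
        // Convert the code-point position back to a UTF-16 char index.
        return iri.offsetByCodePoints(0, i);
    }

    public static void main(String[] args) {
        String[] predicates = {
            "http://de.dbpedia.org/property/h\u00f6he",         // hypothetical: splits fine
            "http://de.dbpedia.org/property/einwohner(2010)",  // hypothetical: trailing ')'
        };
        for (String p : predicates) {
            int start = localNameStart(p);
            System.out.println(start < 0
                ? "not serializable in RDF/XML: " + p
                : "namespace=" + p.substring(0, start) + " local=" + p.substring(start));
        }
    }
}
```

Note that brackets only break serialization when no valid NCName remains at the end of the IRI; filtering them out of property names, as described above, is one way to guarantee that a valid local name exists.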
Regards,
Alexandru
On 10/18/2011 09:29 AM, Dimitris Kontokostas wrote:
> Hi Alexandru,
>
> This is a known issue and we reported it to Virtuoso ~9 months ago.
> Unfortunately, we use Debian packages for our installation, which
> are usually a little behind the latest releases, so we can't say
> whether it has been fixed.
>
> But IRIs cannot be 100% serialized in RDF/XML.
> So even if Virtuoso fixes the encoding, the RDF might still be invalid.
>
> Regards,
> Dimitris
>
> On Mon, Oct 17, 2011 at 6:42 PM, Alexandru Todor <[email protected]>
> wrote:
>> Hi,
>>
>> I received an email a couple of weeks ago from some users of the German
>> DBpedia who were reporting that they weren't getting any results when
>> querying the endpoint for URIs that contained German umlauts (or any
>> other UTF-8 characters). I reported the issue to the Jena mailing list
>> and they fixed it, but in the process we also discovered a bug in
>> Virtuoso.
>>
>> There is a problem with the IRI encoding in the DBpedia
>> Internationalization VAD: when querying the SPARQL endpoint, the
>> IRIs in the RDF/XML output are garbled. The issue can be found on
>> both the Greek and German endpoints.
>>
>> For example, in the first XML lines of
>> http://de.dbpedia.org/data/Berlin-Dahlem.rdf you will notice things like
>> http://de.dbpedia.org/resource/KÃ¶nigin-Luise-Stiftung instead of
>> http://de.dbpedia.org/resource/Königin-Luise-Stiftung, or
>> http://de.dbpedia.org/resource/Gernot_Michael_MÃ¼ller instead of
>> http://de.dbpedia.org/resource/Gernot_Michael_Müller. You will notice
>> similar issues if you look at this resource from the Greek DBpedia:
>> http://el.dbpedia.org/data/Αλέξανδρος_ο_Μέγας.rdf .
>>
>> The problem is that when querying the Internationalization endpoints,
>> not only with Jena but with any other SPARQL client, users are going
>> to get garbled IRIs whenever they contain UTF-8 characters.
>>
>>
>> Kind Regards,
>> Alexandru Todor
>>
>>
>> ------------------------------------------------------------------------------
>> All the data continuously generated in your IT infrastructure contains a
>> definitive record of customers, application performance, security
>> threats, fraudulent activity and more. Splunk takes this data and makes
>> sense of it. Business sense. IT sense. Common sense.
>> http://p.sf.net/sfu/splunk-d2d-oct
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
>