Marvin, Kingsley,

On 20 Aug 2008, at 00:16, Georgi Kobilarov wrote:
> yes, it's a bug in our dataset.

Actually, no. It's a bug in Virtuoso's SPARQL+XML result format  
serializer.

Ampersands are allowed in URIs, so the Yago URIs are perfectly fine  
according to all the specs. (We *might* still want to %-encode the  
ampersand in those URIs, but just for consistency with our other URIs,  
not because the specs require it. That's a separate question.)
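
To make the point about URIs concrete, here is a minimal Java sketch. It assumes only the JDK's java.net.URI; the class name AmpersandInUri and the helper encodeAmpersand are hypothetical names made up for illustration, not part of Jena or the DBpedia framework. It shows that the Yago URI parses as a valid URI with the raw ampersand, and what the optional %-encoded form would look like.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class AmpersandInUri {
    // Hypothetical helper: percent-encode only the ampersand and leave
    // every other character of the URI untouched.
    static String encodeAmpersand(String uri) {
        return uri.replace("&", "%26");
    }

    public static void main(String[] args) throws URISyntaxException {
        String yago = "http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople";

        // new URI(...) throws URISyntaxException for a syntactically invalid URI;
        // it accepts this one, because "&" is a legal character in a URI path.
        URI parsed = new URI(yago);
        System.out.println(parsed.getPath());

        // The optional, consistency-only variant with the ampersand %-encoded.
        System.out.println(encodeAmpersand(yago));
    }
}
```

Either form is a legal URI; which one the dataset should use is the separate consistency question mentioned above.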

The problem is: When a "&" character is included in content inside an  
XML file, it has to be written as "&amp;". Virtuoso does not do this,  
hence the breakage.
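
As an illustration of the escaping a serializer has to do, here is a small Java sketch. It is not Virtuoso's code; the class XmlEscape and the helper escapeXmlText are hypothetical names, and the sketch only covers element text (attribute values would also need quote escaping).

```java
public class XmlEscape {
    // Hypothetical helper: escape the characters that must not appear
    // raw in XML element text.
    static String escapeXmlText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String uri = "http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople";
        // Prints the URI wrapped in a <uri> element with the ampersand escaped.
        System.out.println("<uri>" + escapeXmlText(uri) + "</uri>");
    }
}
```

An XML parser reading that output hands the original URI, with its plain "&", back to the client, so nothing in the dataset has to change.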

(This is a silly bug. The need to encode reserved characters (& and ")  
is just about the first thing a developer learns about XML. I hope  
OpenLink fixes this soon. Kingsley?)

Richard



> In particular in the Yago dataset, which
> has been contributed externally and wasn't created with the DBpedia
> framework (but hey, we've got many similar bugs in datasets created by
> our framework ;))
>
> Yago URIs have not been url-encoded. So as a workaround, you can
> url_encode all URIs starting with http://dbpedia.org/class/yago/ in the
> yago_en.nt file before loading it into your Jena model. That should do
> it.
>
> And we'll fix that bug for the future.
>
> Best,
> Georgi
>
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:dbpedia-[EMAIL PROTECTED] On Behalf Of Marvin Lugair
>> Sent: Wednesday, August 20, 2008 12:57 AM
>> To: [email protected]
>> Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI breaking Jena code
>>
>>
>> Hi,
>>
>> The following SPARQL query:
>> select distinct ?Concept where {[] a ?Concept}
>>
>> is the default query at the DBpedia endpoint http://dbpedia.org/sparql
>> It returns several URIs, including the following one (notice the
>> ampersand):
>>
>> http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
>>
>> So DBpedia is returning URIs containing an ampersand. This is causing
>> an exception in the Jena parser.
>>
>> How do I fix this? None of Jena's methods will work; I can't transform
>> the resultset into a model or even print it with the resultformatter.
>> If I iterate over it, I can print the results one by one until I get to
>> the malformed URI. How do I check in my code for malformed URIs?
>>
>>
>> Any ideas?
>> Thanks!
>> Marv
>> -------------
>>
>> The code below works until I get a URI with an ampersand.
>> The exception is coming from results.nextSolution(). Other Jena
>> methods to convert the retrieved resultset to a model directly or
>> format it produce the same exception (I assume they have a similar
>> iterator inside).
>>
>>
>> QueryExecution qexec =
>>     QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql",
>>         "select distinct ?Concept where {[] a ?Concept}");
>>
>> try {
>>     ResultSet results = qexec.execSelect();
>>     // Iterate over the solutions; the exception is thrown while reading them
>>     while (results.hasNext()) {
>>         QuerySolution soln = results.nextSolution();
>>         String x = soln.get("Concept").toString();
>>         System.out.print(x + "\n");
>>     }
>> } finally {
>>     System.out.println("closing!");
>>     qexec.close();
>> }
>>
>>
>> This will result in the following error:
>>
>>
>> [com.ctc.wstx.exc.WstxLazyException]
>> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<'
>> (code 60); expected a semi-colon after the reference for entity
>> 'MelindaGatesFoundationPeople'
>> at [row,col {unknown-source}]: [2609,96]
>> at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
>> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
>> at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
>> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
>> at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
>> at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
>> at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
>>
>>
>>
>> I also posted this on the Jena group, but some seem to suggest it is a
>> DBpedia issue:
>> http://tech.groups.yahoo.com/group/jena-dev/message/36210
>>
>>
>>
>>
>>
>>
> -----------------------------------------------------------------------
>> --
>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>> challenge
>> Build the coolest Linux based applications with Moblin SDK & win  
>> great
>> prizes
>> Grand prize is a trip for two to an Open Source event anywhere in the
>> world
>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>

