Richard Cyganiak wrote:
> Marvin, Kingsley,
>
> On 20 Aug 2008, at 00:16, Georgi Kobilarov wrote:
>> yes, it's a bug in our dataset.
>
> Actually, no. It's a bug in Virtuoso's SPARQL+XML result format 
> serializer.
>
> Ampersands are allowed in URIs, so the Yago URIs are perfectly fine 
> according to all the specs. (We *might* still want to %-encode the 
> ampersand in those URIs, but just for consistency with our other URIs, 
> not because the specs require it. That's a separate question.)
>
> The problem is: When a "&" character is included in content inside an
> XML file, it has to be written as "&amp;". Virtuoso does not do this,
> hence the breakage.
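
An illustrative sketch of the point above, not part of the original thread. It assumes the JDK's built-in StAX parser, not the Woodstox parser that appears in the stack trace further down. The class and method names (AmpersandInXml, escapeXmlText, rootText) are hypothetical. It feeds the same URI to the parser twice, once raw and once with the ampersand escaped:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class AmpersandInXml {
    // Escape the characters that are markup-significant in XML text content.
    // "&" must be replaced first so the other replacements are not re-escaped.
    static String escapeXmlText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Return the text of the root element, or null if the XML is not well-formed.
    static String rootText(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            reader.nextTag();               // advance to the root start tag
            return reader.getElementText(); // fails on a bare "&" in the content
        } catch (XMLStreamException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String uri = "http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople";
        // Raw ampersand: the parser rejects the document.
        System.out.println(rootText("<uri>" + uri + "</uri>"));
        // Escaped ampersand: the parser hands back the original URI.
        System.out.println(rootText("<uri>" + escapeXmlText(uri) + "</uri>"));
    }
}
```

The URI itself is unchanged in both cases; only its serialization inside the XML differs.
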
>
> (This is a silly bug. The need to encode reserved characters (& and ") 
> is just about the first thing a developer learns about XML. I hope 
> OpenLink fixes this soon. Kingsley?)
>
> Richard
>
>
>
>> In particular in the Yago dataset, which
>> has been contributed externally and wasn't created with the DBpedia
>> framework (but hey, we've got many similar bugs in datasets created by
>> our framework ;))
>>
>> Yago URIs have not been url-encoded. So as a workaround, you can
>> url_encode all URIs starting with http://dbpedia.org/class/yago/ in the
>> yago_en.nt file before loading it into your Jena model. That should do
>> it.
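
An illustrative sketch of the workaround described above, not part of the original thread. It makes a narrowing assumption: it only rewrites "&" as "%26" in URIs under http://dbpedia.org/class/yago/, not a full URL-encoding of every character. The class and method names (YagoUriWorkaround, encodeAmpersands) and the subject URI in main are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YagoUriWorkaround {
    // Matches an N-Triples URI term <...> that starts with the Yago class prefix.
    private static final Pattern YAGO_URI =
            Pattern.compile("<(http://dbpedia\\.org/class/yago/[^>]*)>");

    // Percent-encode ampersands inside Yago class URIs on one N-Triples line.
    // Other URIs and literals on the line are left untouched.
    static String encodeAmpersands(String ntLine) {
        Matcher m = YAGO_URI.matcher(ntLine);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fixed = "<" + m.group(1).replace("&", "%26") + ">";
            // quoteReplacement guards against "$" or "\" in the URI text.
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical triple; only the object URI comes from the thread.
        String line = "<http://dbpedia.org/resource/X> "
                + "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
                + "<http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople> .";
        System.out.println(encodeAmpersands(line));
    }
}
```

Applied line by line to yago_en.nt before loading, this changes the URIs in the local copy, so they no longer match the ones the endpoint returns.
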
>>
>> And we'll fix that bug for the future.
>>
>> Best,
>> Georgi
>>
>> -- 
>> Georgi Kobilarov
>> Freie Universität Berlin
>> www.georgikobilarov.com
>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:dbpedia-[EMAIL PROTECTED] On Behalf Of Marvin Lugair
>>> Sent: Wednesday, August 20, 2008 12:57 AM
>>> To: [email protected]
>>> Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI
>>> breaking Jena code
>>>
>>>
>>> Hi,
>>>
>>> The following SPARQL query:
>>> select distinct ?Concept where {[] a ?Concept}
>>>
>>> is the default query at the DBpedia endpoint http://dbpedia.org/sparql
>>> It returns several URIs, including the following one (notice the
>>> ampersand):
>>>
>>> http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
>>>
>>> So DBpedia is returning URIs containing an ampersand. This is causing
>>> an exception in the Jena parser.
>>>
>>> How do I fix this? None of Jena's methods will work: I can't transform
>>> the result set into a model or even print it with the result formatter.
>>> If I iterate over it, I can print the results one by one until I get to
>>> the malformed URI. How do I check in my code for malformed URIs?
>>>
>>>
>>> Any ideas?
>>> Thanks!
>>> Marv
>>> -------------
>>>
>>> The code below works until I get a URI with an ampersand.
>>> The exception is coming from results.nextSolution(). Other Jena
>>> methods that convert the retrieved result set to a model directly or
>>> format it produce the same exception (I assume they have a similar
>>> iterator inside).
>>>
>>>
>>> QueryExecution qexec =
>>>     QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql",
>>>         "select distinct ?Concept where {[] a ?Concept}");
>>>
>>> try {
>>>     ResultSet results = qexec.execSelect();
>>>     while (results.hasNext()) {
>>>         // The parser exception is thrown during this iteration.
>>>         QuerySolution soln = results.nextSolution();
>>>         String x = soln.get("Concept").toString();
>>>         System.out.print(x + "\n");
>>>     }
>>> } finally {
>>>     System.out.println("closing!");
>>>     qexec.close();
>>> }
>>>
>>>
>>> This will result in the following error:
>>>
>>>
>>> [com.ctc.wstx.exc.WstxLazyException]
>>> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<'
>>> (code 60); expected a semi-colon after the reference for entity
>>> 'MelindaGatesFoundationPeople'
>>> at [row,col {unknown-source}]: [2609,96]
>>>     at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
>>>     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
>>>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
>>>     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
>>>     at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
>>>     at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
>>>     at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
>>>
>>>
>>>
>>> I also posted this on the Jena group, but some seem to suggest it is a
>>> DBpedia issue: http://tech.groups.yahoo.com/group/jena-dev/message/36210
>>>
>>>
>>>
>>>
>>>
>>>
>> -----------------------------------------------------------------------
>>> -- 
>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>>> challenge
>>> Build the coolest Linux based applications with Moblin SDK & win great
>>> prizes
>>> Grand prize is a trip for two to an Open Source event anywhere in the
>>> world
>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>> _______________________________________________
>>> Dbpedia-discussion mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
>
All,

Fixed.

Please verify.


-- 


Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com




