Marvin, Kingsley,

On 20 Aug 2008, at 00:16, Georgi Kobilarov wrote:
> yes, it's a bug in our dataset.

Actually, no. It's a bug in Virtuoso's SPARQL+XML result format serializer.

Ampersands are allowed in URIs, so the Yago URIs are perfectly fine according to all the specs. (We *might* still want to %-encode the ampersand in those URIs, but just for consistency with our other URIs, not because the specs require it. That's a separate question.)

The problem is: When a "&" character is included in content inside an XML file, it has to be written as "&amp;". Virtuoso does not do this, hence the breakage.

(This is a silly bug. The need to encode reserved characters (& and ") is just about the first thing a developer learns about XML. I hope OpenLink fixes this soon. Kingsley?)

Richard

> In particular in the Yago dataset, which has been contributed
> externally and wasn't created with the DBpedia framework (but hey,
> we've got many similar bugs in datasets created by our framework ;))
>
> Yago URIs have not been url-encoded. So as a workaround, you can
> url_encode all URIs starting with http://dbpedia.org/class/yago/ in
> the yago_en.nt file before loading it into your Jena model. That
> should do it.
>
> And we'll fix that bug for the future.
>
> Best,
> Georgi
>
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:dbpedia-[EMAIL PROTECTED] On Behalf Of Marvin Lugair
>> Sent: Wednesday, August 20, 2008 12:57 AM
>> To: [email protected]
>> Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI breaking Jena code
>>
>> Hi,
>>
>> The following sparql query:
>> select distinct ?Concept where {[] a ?Concept}
>>
>> Is the default query at the dbpedia endpoint http://dbpedia.org/sparql
>> It returns several URIs including the following one (notice the and
>> sign):
>>
>> http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
>>
>> So DBpedia is returning URIs containing an ampersand. This is causing
>> an exception in the Jena parser.
>>
>> How do I fix this? None of Jena's methods will work, I can't transform
>> the resultset into a model or even print it with the resultformatter.
>> If I iterate over it, I can print the results one by one till I get
>> to the malformed URI. How do I check in my code for malformed URIs?
>>
>> Any ideas?
>> Thanks!
>> Marv
>> -------------
>>
>> The code below works till I get a URI with an ampersand.
>> The exception is coming from results.nextSolution(). Other Jena
>> methods to convert the retrieved resultset to a model directly or
>> format it produce the same exception (I assume they have a similar
>> iterator inside)
>>
>> QueryExecution qexec =
>>     QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql",
>>         "select distinct ?Concept where {[] a ?Concept}");
>>
>> try {
>>     ResultSet results = qexec.execSelect();
>>     for ( ; results.hasNext() ; )
>>     {
>>         QuerySolution soln = results.nextSolution() ;
>>         String x = soln.get("Concept").toString();
>>         System.out.print(x +"\n");
>>     }
>> }
>>
>> finally {
>>     System.out.println("closing!");
>>     qexec.close() ;
>> }
>>
>> This will result in the following error:
>>
>> [com.ctc.wstx.exc.WstxLazyException]
>> com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<' (code 60); expected a semi-colon after the reference for entity 'MelindaGatesFoundationPeople'
>>  at [row,col {unknown-source}]: [2609,96]
>>     at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
>>     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
>>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
>>     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
>>     at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
>>     at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
>>     at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
>>
>> I also posted this on the Jena group but some seem to suggest it is a
>> dbpedia issue: http://tech.groups.yahoo.com/group/jena-dev/message/36210
>>
>> -------------------------------------------------------------------------
>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>> Grand prize is a trip for two to an Open Source event anywhere in the world
>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
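
To make the two options in this thread concrete, here is a minimal Java sketch. It is not Virtuoso's or Jena's code; the class name AmpersandEscaping and both helper methods are hypothetical names made up for illustration. The first helper shows the escaping a serializer is expected to apply to character data before writing it into XML. The second shows a narrow version of the workaround quoted above: it assumes that the only problematic character is a raw "&" and rewrites it as "%26", which is less than a full URL-encoding pass.

```java
public class AmpersandEscaping {

    // Hypothetical helper: escape the characters that may not appear
    // raw in XML element content, so "&" is written as "&amp;".
    static String escapeXmlText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical workaround helper: percent-encode raw ampersands in a
    // URI string. Assumption: nothing else in the URI needs encoding.
    static String percentEncodeAmpersand(String uri) {
        return uri.replace("&", "%26");
    }

    public static void main(String[] args) {
        String uri = "http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople";
        // What a well-formed SPARQL XML result would contain for this URI.
        System.out.println("<uri>" + escapeXmlText(uri) + "</uri>");
        // What the URI would look like after the workaround.
        System.out.println(percentEncodeAmpersand(uri));
    }
}
```

Note that the workaround changes the URI itself, so the rewritten Yago URIs would no longer be string-identical to the ones published in the dataset; the escaping in the first helper leaves the URI unchanged once an XML parser has decoded it.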
