Georgi,

Thanks for the reply! The problem is that loading DBpedia into an RDF store takes close to 40 hours (and some RDF stores break outright), so I am using the DBpedia Virtuoso server for now. Can you think of another solution?

Thanks again,
Marv
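For the remote-endpoint case, one possible angle is sketched here. It is only an assumption-laden sketch, not something tested against the live server. The parser error in the quoted trace complains about a missing semi-colon after an entity reference, which suggests the XML result document carries a bare `&`. If the raw response text can be fetched separately (how it is fetched is left open), the bare ampersands could be escaped before the XML reaches the parser. The class and method names below are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper: repairs XML text in which '&' was left unescaped.
class AmpersandRepair {

    // Matches '&' that does NOT start a predefined or numeric XML entity reference.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    /** Returns the XML text with every bare '&' rewritten as "&amp;". */
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        // A fragment shaped like one binding in a SPARQL XML result (illustrative only).
        String broken =
            "<uri>http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople</uri>";
        System.out.println(escapeBareAmpersands(broken));
    }
}
```

The repaired string could then be handed to an XML result-set reader, for example something like Jena's `ResultSetFactory.fromXML`, if the ARQ version in use offers it. References that are already well formed (`&amp;`, `&#38;`, and so on) are left untouched, so running the repair twice should not double-escape anything.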
--- On Tue, 8/19/08, Georgi Kobilarov <[EMAIL PROTECTED]> wrote:

> From: Georgi Kobilarov <[EMAIL PROTECTED]>
> Subject: RE: [Dbpedia-discussion] Ampersand in dbpedia returned URI breaking Jena code
> To: [EMAIL PROTECTED], [email protected]
> Date: Tuesday, August 19, 2008, 5:16 PM
>
> Marvin,
>
> yes, it's a bug in our dataset. In particular in the Yago dataset, which
> has been contributed externally and wasn't created with the DBpedia
> framework (but hey, we've got many similar bugs in datasets created by
> our framework ;))
>
> Yago URIs have not been url-encoded. So as a workaround, you can
> url_encode all URIs starting with http://dbpedia.org/class/yago/ in the
> yago_en.nt file before loading it into your Jena model. That should do
> it.
>
> And we'll fix that bug for the future.
>
> Best,
> Georgi
>
> --
> Georgi Kobilarov
> Freie Universität Berlin
> www.georgikobilarov.com
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:dbpedia-[EMAIL PROTECTED] On Behalf Of Marvin Lugair
> > Sent: Wednesday, August 20, 2008 12:57 AM
> > To: [email protected]
> > Subject: [Dbpedia-discussion] Ampersand in dbpedia returned URI breaking Jena code
> >
> > Hi,
> >
> > The following SPARQL query:
> >
> > select distinct ?Concept where {[] a ?Concept}
> >
> > is the default query at the DBpedia endpoint http://dbpedia.org/sparql
> > It returns several URIs, including the following one (notice the
> > ampersand):
> >
> > http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople
> >
> > So DBpedia is returning URIs containing an ampersand. This is causing
> > an exception in the Jena parser.
> >
> > How do I fix this? None of Jena's methods will work: I can't transform
> > the result set into a model or even print it with the result formatter.
> > If I iterate over it, I can print the results one by one until I get to
> > the malformed URI. How do I check in my code for malformed URIs?
> >
> > Any ideas?
> >
> > Thanks!
> > Marv
> > -------------
> >
> > The code below works until I get a URI with an ampersand.
> > The exception is coming from results.nextSolution(). Other Jena
> > methods that convert the retrieved result set to a model directly or
> > format it produce the same exception (I assume they have a similar
> > iterator inside).
> >
> > import com.hp.hpl.jena.query.QueryExecution;
> > import com.hp.hpl.jena.query.QueryExecutionFactory;
> > import com.hp.hpl.jena.query.QuerySolution;
> > import com.hp.hpl.jena.query.ResultSet;
> >
> > // Run the endpoint's default query against the remote SPARQL service.
> > QueryExecution qexec =
> >     QueryExecutionFactory.sparqlService("http://DBpedia.org/sparql",
> >         "select distinct ?Concept where {[] a ?Concept}");
> >
> > try {
> >     ResultSet results = qexec.execSelect();
> >     // Print each binding of ?Concept; this fails at the URI with the ampersand.
> >     while (results.hasNext()) {
> >         QuerySolution soln = results.nextSolution();
> >         String x = soln.get("Concept").toString();
> >         System.out.println(x);
> >     }
> > }
> > finally {
> >     System.out.println("closing!");
> >     qexec.close();
> > }
> >
> > This will result in the following error:
> >
> > [com.ctc.wstx.exc.WstxLazyException]
> > com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '<'
> > (code 60); expected a semi-colon after the reference for entity
> > 'MelindaGatesFoundationPeople'
> >  at [row,col {unknown-source}]: [2609,96]
> >     at com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
> >     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:671)
> >     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3505)
> >     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:804)
> >     at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:674)
> >     at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:472)
> >     at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:213)
> >
> > I also posted this on the Jena group, but some there seem to suggest it is a
> > DBpedia issue: http://tech.groups.yahoo.com/group/jena-dev/message/36210
> >
> > -------------------------------------------------------------------------
> > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> > Build the coolest Linux based applications with Moblin SDK & win great prizes
> > Grand prize is a trip for two to an Open Source event anywhere in the world
> > http://moblin-contest.org/redirect.php?banner_id=100&url=/
> > _______________________________________________
> > Dbpedia-discussion mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
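Georgi's workaround (url-encoding every URI that starts with http://dbpedia.org/class/yago/ in yago_en.nt before loading) could be sketched roughly as below. This is a minimal sketch under assumptions: the class and method names are invented, the sample triple uses a made-up subject, and `java.net.URLEncoder` performs form-style encoding, which may not match exactly what DBpedia intends by "url-encoded". Names that are already percent-encoded would be encoded a second time, so it is only meant for the unencoded file.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessing helper for lines of an N-Triples file.
class YagoUriEncoder {

    private static final String YAGO_PREFIX = "http://dbpedia.org/class/yago/";

    // An N-Triples URI reference: '<', the Yago prefix, a local name, '>'.
    private static final Pattern YAGO_URI =
        Pattern.compile("<" + Pattern.quote(YAGO_PREFIX) + "([^>]*)>");

    /** Percent-encodes the local name of every Yago class URI in one line. */
    static String encodeYagoUris(String ntLine) {
        Matcher m = YAGO_URI.matcher(ntLine);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String encoded = "<" + YAGO_PREFIX + encode(m.group(1)) + ">";
            // quoteReplacement guards against '$' or '\' in the encoded text.
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String encode(String localName) {
        try {
            return URLEncoder.encode(localName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Made-up triple; only the object URI comes from the thread.
        String line = "<http://example.org/s> "
            + "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
            + "<http://dbpedia.org/class/yago/Bill&MelindaGatesFoundationPeople> .";
        System.out.println(encodeYagoUris(line));
    }
}
```

Applied line by line while copying yago_en.nt to a new file, this leaves all non-Yago URIs as they are and turns the `&` in the example into `%26`.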
