Hi,
While running the latest version of the extraction framework on the
German data dump, I got some property names that contain round brackets
"(" and ")", for example:
"http://de.dbpedia.org/property/Austragungsort(e)". The problem is that
Pubby crashes whenever a page containing such a property is requested.
The crash occurs in Jena, which executes a remote SPARQL query on
Virtuoso and receives invalid XML as a response. I don't even know
where the bug should be fixed. The URL RFC [1], section 2.2, states
that round brackets can be used without escaping, and the URI RFC [2],
section 2.4.3, does not list them as disallowed either, so the
extracted URIs should be valid. However, I don't know whether the RDF
spec allows property names to contain round brackets.
Is the extracted data invalid, or is this a problem with the RDF spec?
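One way to narrow this down: RDF/XML writes every property as an XML
element name, so the property URI has to be split into a namespace and
a local part that is a valid XML name (an NCName). The following is a
minimal sketch in Python of such a check, not what Jena or Virtuoso
actually do. The names split_qname, _START and _REST are made up for
this illustration; the character ranges are the Name productions from
the XML 1.0 spec with ":" left out.

```python
import re

# XML 1.0 (5th ed.) NameStartChar / NameChar ranges, with ":" left out
# because the local part of a QName must be an NCName.
_START = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_REST = _START + "\\-.0-9\u00B7\u0300-\u036F\u203F\u2040"

# Matches the longest suffix of a string that is a valid NCName.
_NCNAME_SUFFIX = re.compile("[" + _START + "][" + _REST + "]*\\Z")


def split_qname(uri):
    """Return (namespace, local_name), or None if no valid split exists."""
    match = _NCNAME_SUFFIX.search(uri)
    if match is None or match.start() == 0:
        return None
    return uri[:match.start()], match.group(0)


print(split_qname("http://de.dbpedia.org/property/Austragungsort"))
# ('http://de.dbpedia.org/property/', 'Austragungsort')
print(split_qname("http://de.dbpedia.org/property/Austragungsort(e)"))
# None
```

If this check is right, the second URI has no usable split because it
ends in ")", which is not a legal XML name character. That would mean
the URI itself can be fine while still being impossible to write as an
RDF/XML property element.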
Here is an example of an invalid RDF/XML file with the offending
property URI:
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description
rdf:about="http://de.dbpedia.org/resource/Mercedes-Benz_Championship_(European_Tour)">
<n0pred:Austragungsort(e)
xmlns:n0pred="http://de.dbpedia.org/property/"
rdf:resource="http://de.dbpedia.org/resource/Berlin"/>
</rdf:Description>
</rdf:RDF>
The error looks like this:
com.hp.hpl.jena.shared.JenaException: org.xml.sax.SAXParseException:
Element type "n0pred:Austragungsort" must be followed by either
attribute specifications, ">" or "/>".
com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler.fatalError(RDFDefaultErrorHandler.java:45)
com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:35)
com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:225)
com.hp.hpl.jena.rdf.arp.impl.XMLHandler.fatalError(XMLHandler.java:255)
org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
com.hp.hpl.jena.rdf.arp.impl.RDFXMLParser.parse(RDFXMLParser.java:142)
com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:158)
com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:145)
com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:215)
com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:197)
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execModel(QueryEngineHTTP.java:161)
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execDescribe(QueryEngineHTTP.java:154)
com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execDescribe(QueryEngineHTTP.java:152)
de.fuberlin.wiwiss.pubby.RemoteSPARQLDataSource.execDescribeQuery(RemoteSPARQLDataSource.java:74)
de.fuberlin.wiwiss.pubby.RemoteSPARQLDataSource.getResourceDescription(RemoteSPARQLDataSource.java:52)
de.fuberlin.wiwiss.pubby.servlets.BaseServlet.getResourceDescription(BaseServlet.java:62)
de.fuberlin.wiwiss.pubby.servlets.PageURLServlet.doGet(PageURLServlet.java:38)
de.fuberlin.wiwiss.pubby.servlets.BaseURLServlet.doGet(BaseURLServlet.java:33)
de.fuberlin.wiwiss.pubby.servlets.BaseServlet.doGet(BaseServlet.java:89)
javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
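The failure does not look specific to Jena or Xerces. As a rough
cross-check, the sketch below feeds the same document to the XML parser
in Python's standard library, which should reject it as well. The
helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The RDF/XML document from this mail, as bytes.
DOC = b"""<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description
rdf:about="http://de.dbpedia.org/resource/Mercedes-Benz_Championship_(European_Tour)"><n0pred:Austragungsort(e)
xmlns:n0pred="http://de.dbpedia.org/property/"
rdf:resource="http://de.dbpedia.org/resource/Berlin"/></rdf:Description>
</rdf:RDF>
"""


def is_well_formed(document):
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(DOC))
# False
print(is_well_formed(DOC.replace(b"Austragungsort(e)", b"Austragungsort")))
# True
```

The second call only shows that the brackets in the element name are
what breaks the document. The brackets inside the rdf:about attribute
value are harmless. Dropping "(e)" is not meant as a fix, since it
changes the property URI.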
Kind Regards,
Alexandru Todor
[1] http://www.ietf.org/rfc/rfc1738.txt
[2] http://www.ietf.org/rfc/rfc2396.txt
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion