Hi,

While running the latest version of the extraction framework on the
German data dump, I got some property names that contain round brackets
"(" and ")", for example:
"http://de.dbpedia.org/property/Austragungsort(e)". Pubby crashes
whenever a page containing such a property is requested. The crash
happens in Jena, which executes a remote SPARQL query on Virtuoso and
receives invalid XML as a response. I don't even know where the bug
should be fixed. The URL RFC [1], section 2.2, states that round
brackets can be used without escaping, and the URI RFC [2], section
2.4.3, does not list them as disallowed either, so the extracted URIs
should be valid. However, I don't know whether the RDF spec allows
property names to contain round brackets.

Is the extracted data invalid, or is this a problem with the RDF spec?

Here is an example of an invalid RDF/XML file with the offending 
property URI:

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description
rdf:about="http://de.dbpedia.org/resource/Mercedes-Benz_Championship_(European_Tour)"><n0pred:Austragungsort(e)
xmlns:n0pred="http://de.dbpedia.org/property/"
rdf:resource="http://de.dbpedia.org/resource/Berlin"/></rdf:Description>
</rdf:RDF>
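
To reproduce the failure without Pubby or Jena, here is a minimal sketch that feeds a copy of this document to Python's standard-library XML parser (expat, not the Xerces parser that Jena uses, so the error wording differs). The names RDF_XML and parse_error are only illustrative and not part of any of the tools involved.

```python
import xml.etree.ElementTree as ET

# Copy of the example document; the property element name contains "(e)".
RDF_XML = b"""<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="http://de.dbpedia.org/resource/Mercedes-Benz_Championship_(European_Tour)"><n0pred:Austragungsort(e)
 xmlns:n0pred="http://de.dbpedia.org/property/"
 rdf:resource="http://de.dbpedia.org/resource/Berlin"/></rdf:Description>
</rdf:RDF>
"""


def parse_error(document):
    """Return the parser's error for `document`, or None if it is well-formed."""
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return err
    return None


err = parse_error(RDF_XML)
# Print the parser's message and the (line, column) it reports.
print(err, err.position if err is not None else None)
```

The parser should give up inside the start tag of the property element, which suggests the document is rejected as plain XML before any RDF processing takes place.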

The error looks like this:

com.hp.hpl.jena.shared.JenaException: org.xml.sax.SAXParseException:
Element type "n0pred:Austragungsort" must be followed by either
attribute specifications, ">" or "/>".
     com.hp.hpl.jena.rdf.model.impl.RDFDefaultErrorHandler.fatalError(RDFDefaultErrorHandler.java:45)
     com.hp.hpl.jena.rdf.arp.impl.ARPSaxErrorHandler.fatalError(ARPSaxErrorHandler.java:35)
     com.hp.hpl.jena.rdf.arp.impl.XMLHandler.warning(XMLHandler.java:225)
     com.hp.hpl.jena.rdf.arp.impl.XMLHandler.fatalError(XMLHandler.java:255)
     org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
     org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
     org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
     org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
     org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
     org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
     org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
     org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
     org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
     org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
     org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
     com.hp.hpl.jena.rdf.arp.impl.RDFXMLParser.parse(RDFXMLParser.java:142)
     com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:158)
     com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:145)
     com.hp.hpl.jena.rdf.arp.JenaReader.read(JenaReader.java:215)
     com.hp.hpl.jena.rdf.model.impl.ModelCom.read(ModelCom.java:197)
     com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execModel(QueryEngineHTTP.java:161)
     com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execDescribe(QueryEngineHTTP.java:154)
     com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execDescribe(QueryEngineHTTP.java:152)
     de.fuberlin.wiwiss.pubby.RemoteSPARQLDataSource.execDescribeQuery(RemoteSPARQLDataSource.java:74)
     de.fuberlin.wiwiss.pubby.RemoteSPARQLDataSource.getResourceDescription(RemoteSPARQLDataSource.java:52)
     de.fuberlin.wiwiss.pubby.servlets.BaseServlet.getResourceDescription(BaseServlet.java:62)
     de.fuberlin.wiwiss.pubby.servlets.PageURLServlet.doGet(PageURLServlet.java:38)
     de.fuberlin.wiwiss.pubby.servlets.BaseURLServlet.doGet(BaseURLServlet.java:33)
     de.fuberlin.wiwiss.pubby.servlets.BaseServlet.doGet(BaseServlet.java:89)
     javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
     javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
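
The element name in the error message is cut off at "n0pred:Austragungsort", which suggests the parser stops reading the name at the "(". As a rough way to find affected properties in a dump, here is a sketch that checks whether the local part of a property URI could serve as an XML element name. It rests on loud assumptions: the URI is split at the last "/" or "#" (real RDF/XML writers may split differently), and the pattern is an ASCII-only simplification of the XML name rules, so it would wrongly reject names with umlauts. All function and variable names are hypothetical.

```python
import re

# ASCII-only approximation of an XML name without a colon (assumption:
# the real XML Name production also admits many non-ASCII characters).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def split_property_uri(uri):
    """Split a URI into (namespace, local name) at the last '/' or '#'."""
    cut = max(uri.rfind("/"), uri.rfind("#")) + 1
    return uri[:cut], uri[cut:]


def usable_as_element_name(uri):
    """True if the local name matches the simplified name pattern."""
    _, local = split_property_uri(uri)
    return bool(NCNAME_ASCII.match(local))


for prop in (
    "http://de.dbpedia.org/property/Austragungsort(e)",
    "http://de.dbpedia.org/property/Austragungsort",
):
    print(prop, usable_as_element_name(prop))
```

Under these assumptions the first URI is flagged and the second is not, so the URI can be valid per the RFCs and still have no direct spelling as an element name.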

Kind Regards,
Alexandru Todor


[1] http://www.ietf.org/rfc/rfc1738.txt
[2] http://www.ietf.org/rfc/rfc2396.txt

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
