Hi everybody

I wanted to create my own local repository for the English version of
DBpedia 3.7, so I downloaded all the files from
http://downloads.dbpedia.org/3.7/en/

(I am using the latest Sesame and OWLIM to do this.)

It seems that the mappingbased_properties_en.nt file has many problems. The
main one is that non-URI strings appear between "<" and ">", which makes the
parser throw an exception, because things such as
<?>
are not valid URIs.

Other examples are the many URLs that do not start with http:// but are
still placed between "<" and ">", such as in
<http://dbpedia.org/resource/Lee_County,_Florida> <http://xmlns.com/foaf/0.1/homepage> <www.lee-county.com/> .
(line 164 177)

I wonder what the best way to solve this problem is. I was optimistic at
the beginning and started doing 'replace all' to correct some of the URIs,
but it turns out that the problem cannot easily be reduced to a handful of
patterns. For example, there are 'links' to web pages that do not even
start with 'www' but are clearly URLs (e.g.
<ayat-algormezi.blogspot.com>). One common exception is this:

Not a valid (absolute) URI: ayat-algormezi.blogspot.com [line 83864]
org.openrdf.rio.RDFParseException: Not a valid (absolute) URI: ayat-algormezi.blogspot.com [line 83864]
        at org.openrdf.rio.helpers.RDFParserBase.reportFatalError(RDFParserBase.java:566)
        at org.openrdf.rio.ntriples.NTriplesParser.reportFatalError(NTriplesParser.java:547)
        at org.openrdf.rio.helpers.RDFParserBase.createURI(RDFParserBase.java:295)
        at org.openrdf.rio.ntriples.NTriplesParser.createURI(NTriplesParser.java:478)
        at org.openrdf.rio.ntriples.NTriplesParser.parseObject(NTriplesParser.java:326)
        at org.openrdf.rio.ntriples.NTriplesParser.parseTriple(NTriplesParser.java:246)
        at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:170)
        at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:112)
        at org.openrdf.repository.base.RepositoryConnectionBase.addInputStreamOrReader(RepositoryConnectionBase.java:406)
        at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:297)
        at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:228)
        at org.researchsem.RepositoryLoader.loadFiles(RepositoryLoader.java:190)
        at org.researchsem.RepositoryLoader.loadDir(RepositoryLoader.java:126)
        at org.researchsem.RepositoryUtils.init(RepositoryUtils.java:199)
        at org.researchsem.InitRepository.main(InitRepository.java:38)
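
Since 'replace all' does not scale, one fallback would be to pre-filter the
dump before loading it, instead of trying to repair every URI. The following
is only a minimal sketch under several assumptions: the class name
NTriplesUriFilter and its methods are hypothetical (they are not part of
Sesame or OWLIM), it uses java.net.URI as a stand-in for the parser's own
"absolute URI" check (the two may disagree on some inputs), and it assumes
one triple per line. Lines whose "<...>" tokens are all absolute URIs go to
one file; everything else goes to a reject file for later inspection.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Sesame or OWLIM): splits an N-Triples
// file into loadable lines and lines containing a non-absolute URI.
final class NTriplesUriFilter {

    // A quoted literal, including escaped quotes. Literals are stripped first
    // so that "<" or ">" inside a literal is not mistaken for a URI reference.
    private static final Pattern LITERAL =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    // Anything written between "<" and ">".
    private static final Pattern URI_REF = Pattern.compile("<([^>]*)>");

    // Assumption: java.net.URI approximates the parser's validity check.
    public static boolean isAbsoluteUri(String value) {
        try {
            return new URI(value).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // True if every <...> token outside literals is an absolute URI.
    public static boolean hasOnlyAbsoluteUris(String line) {
        String withoutLiterals = LITERAL.matcher(line).replaceAll("");
        Matcher m = URI_REF.matcher(withoutLiterals);
        while (m.find()) {
            if (!isAbsoluteUri(m.group(1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: NTriplesUriFilter <input.nt> <clean.nt> <rejected.nt>");
            return;
        }
        long kept = 0;
        long rejected = 0;
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             BufferedWriter good = Files.newBufferedWriter(Paths.get(args[1]), StandardCharsets.UTF_8);
             BufferedWriter bad = Files.newBufferedWriter(Paths.get(args[2]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                boolean ok = hasOnlyAbsoluteUris(line);
                BufferedWriter target = ok ? good : bad;
                target.write(line);
                target.newLine();
                if (ok) {
                    kept++;
                } else {
                    rejected++;
                }
            }
        }
        System.out.println("kept " + kept + " lines, rejected " + rejected);
    }
}
```

After compiling with javac, this could be run as
"java NTriplesUriFilter mappingbased_properties_en.nt clean.nt rejected.nt",
and only clean.nt would be loaded into the repository. The obvious cost is
that the rejected triples (homepages, mostly, judging by the examples above)
are lost unless they are repaired separately.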


Any thoughts or help would be much appreciated.
-- 
Best,
Danica Damljanovic, PhD
Research Associate
GATE team http://gate.ac.uk
Natural Language Processing Group
University of Sheffield
http://www.dcs.shef.ac.uk/~danica
_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion