05.12.2011 16:58, Danica Damljanovic:
> Hi everybody
> 
> I wanted to create my own local repository for the English version of
> DBpedia 3.7 and downloaded all files
> from http://downloads.dbpedia.org/3.7/en/
> 
> (I am using the latest sesame and owlim to do this)
> 
> It seems that mappingbased_properties_en.nt file has many problems, and
> the main one is that non-URI strings are between "<" and ">" making the
> parser throw an exception, as things such as
> <?>
> are not valid URIs.
> 
> Other examples are many URLs that do not start with http:// but still
> are between "<" and ">" such as in
> <http://dbpedia.org/resource/Lee_County,_Florida>
> <http://xmlns.com/foaf/0.1/homepage> <www.lee-county.com/> . (line 164 177)
> 
> I wonder what is the best way to solve this problem. I was optimistic at
> the beginning and started doing 'replace all' to correct some of the
> URIs but then it turns out that the problem cannot really be reduced to
> a bunch of patterns easily. For example, there are 'links' to webpages
> which do not even start with 'www' but they are clearly a URL (e.g.
> <ayat-algormezi.blogspot.com>). One
> common exception is this:
I had a similar problem with the Polish DBpedia. The problem is in the
source data: some of the links in Wikipedia are invalid. I don't have a
replace-it-all solution, but you should file a bug against the DBpedia
extractor asking it to check that extracted URIs are valid and to omit
them from the output if they are not.

As a shortcut for your problem, I would run the input data through a
script in Python or Ruby and keep only those entries that have valid
URIs.
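
A minimal sketch of such a filter in Python. It assumes one triple per
line (as in the .nt dumps) and uses a rough heuristic for "valid": every
<...> token must start with a scheme such as http: and contain no
characters that N-Triples forbids inside angle brackets. This is not a
full URI validation, and the function names are my own.

```python
import re

# Quoted literals are blanked out first, so that "<" or ">" inside a
# string value is not mistaken for a URI.
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# Anything between "<" and ">" is treated as a candidate URI.
URI_TOKEN = re.compile(r'<([^>]*)>')
# Heuristic (an assumption, not a full RFC check): a scheme, a colon,
# then no whitespace, control characters or < > " { } | ^ ` \
VALID_URI = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*:[^\x00-\x20<>"{}|^`\\]*$')


def has_only_valid_uris(line):
    """Return True if every <...> token on the line passes the heuristic."""
    without_literals = LITERAL.sub('""', line)
    return all(VALID_URI.match(uri) for uri in URI_TOKEN.findall(without_literals))


def filter_lines(lines):
    """Yield only the lines whose URIs all look valid."""
    for line in lines:
        if has_only_valid_uris(line):
            yield line


def filter_file(src_path, dst_path):
    """Copy valid lines from src_path to dst_path; return (kept, dropped)."""
    kept = dropped = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if has_only_valid_uris(line):
                dst.write(line)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

You would call it as filter_file("mappingbased_properties_en.nt",
"mappingbased_properties_en.clean.nt") (the output name is just an
example) and then load the cleaned file. The returned counts show how
many triples were dropped.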

Cheers,
Aleksander

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
