05.12.2011 16:58, Danica Damljanovic:
> Hi everybody
>
> I wanted to create my own local repository for English version of
> dbpedia 3.7 and downloaded all files
> from http://downloads.dbpedia.org/3.7/en/
>
> (I am using the latest sesame and owlim to do this)
>
> It seems that mappingbased_properties_en.nt file has many problems, and
> the main one is that non-URI strings are between "<" and ">" making the
> parser throw an exception as things such as
> <?>
> are not a valid URI.
>
> Other examples are many URLs that do not start with http:// but still
> are between "<" and ">" such as in
> <http://dbpedia.org/resource/Lee_County,_Florida>
> <http://xmlns.com/foaf/0.1/homepage> <www.lee-county.com/> . (line 164 177)
>
> I wonder what is the best way to solve this problem. I was optimistic at
> the beginning and started doing 'replace all' to correct some of the
> URIs but then it turns out that the problem can not really be reduced to
> a bunch of patterns easily. for example, there are 'links' to webpages
> which do not even start with 'www' but they are clearly a URL (e.g.
> <ayat-algormezi.blogspot.com>). One
> common exception is this:

I had a similar problem with the Polish DBpedia. The problem is in the source data, i.e. some of the links are invalid in Wikipedia itself. I don't have a replace-it-all solution, but you should file a bug against the DBpedia extractor asking it to check whether each extracted URI is valid and to leave it out of the output if it is not.
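To make the validity check concrete, here is a minimal Python sketch. It is only an illustration under some assumptions: the function names (`is_absolute_iri`, `keep_line`, `filter_ntriples`) are made up for this example, the check is a loose "has a scheme and no forbidden characters" test rather than a full RFC 3986 validation, and literals and blank nodes are passed through unchecked.

```python
import re

# Loose test for an absolute IRI: a scheme such as "http:" followed by
# characters that may not be whitespace, angle brackets, or double quotes.
# This is an approximation, not a full RFC 3986 validator.
ABSOLUTE_IRI = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*:[^\s<>"]*$')


def is_absolute_iri(term):
    """True if term looks like <scheme:...>, e.g. <http://example.org/x>."""
    return (
        len(term) > 2
        and term.startswith('<')
        and term.endswith('>')
        and ABSOLUTE_IRI.match(term[1:-1]) is not None
    )


def keep_line(line):
    """Decide whether one N-Triples line should be kept."""
    stripped = line.strip()
    # Blank lines and comments carry no triple; pass them through.
    if not stripped or stripped.startswith('#'):
        return True
    if not stripped.endswith('.'):
        return False
    # Drop the terminating "." and split into subject, predicate, object.
    # The object keeps any inner whitespace (it may be a quoted literal).
    parts = stripped[:-1].rstrip().split(None, 2)
    if len(parts) != 3:
        return False
    subject, predicate, obj = parts
    subject_ok = is_absolute_iri(subject) or subject.startswith('_:')
    predicate_ok = is_absolute_iri(predicate)
    if obj.startswith('<'):
        object_ok = is_absolute_iri(obj)
    else:
        # Literals and blank nodes are not validated in this sketch.
        object_ok = obj.startswith('"') or obj.startswith('_:')
    return subject_ok and predicate_ok and object_ok


def filter_ntriples(lines):
    """Yield only the lines whose IRI terms pass the loose check."""
    for line in lines:
        if keep_line(line):
            yield line
```

Applied to the dump, one would open mappingbased_properties_en.nt, feed the file object to `filter_ntriples`, and write the yielded lines to a new file. Triples such as the `<?>` and `<www.lee-county.com/>` examples are dropped because their object has no scheme. Dropping is a design choice here; repairing such values (for instance by prepending http://) would need more care.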
Regarding a short-cut solution to your problem: I would pass the invalid input data through a script in Python or Ruby and keep only those entries that have valid URIs.

Cheers,
Aleksander

_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
