Could this list of incorrect triples feed a data curation process in Wikipedia
itself?
On 5 Dec 2011, at 17:49, Danica Damljanovic <[email protected]> wrote:
> Thanks, Aleksander
>
> I think it would be really great if the extraction framework had an additional
> step that removes invalid triples. There will always be some errors of this
> kind in Wikipedia, and the only way I see to solve it in DBpedia is to check
> each triple and publish only the valid entries.
>
> Cheers
> Danica
>
> On 5 December 2011 16:16, Aleksander Pohl <[email protected]> wrote:
> 05.12.2011 16:58, Danica Damljanovic:
> > Hi everybody
> >
> > I wanted to create my own local repository for the English version of
> > DBpedia 3.7 and downloaded all the files
> > from http://downloads.dbpedia.org/3.7/en/
> >
> > (I am using the latest Sesame and OWLIM to do this.)
> >
> > It seems that the mappingbased_properties_en.nt file has many problems.
> > The main one is that non-URI strings appear between "<" and ">", which makes
> > the parser throw an exception, because things such as
> > <?>
> > are not valid URIs.
> >
> > Other examples are the many URLs that do not start with http:// but are
> > still placed between "<" and ">", such as in
> > <http://dbpedia.org/resource/Lee_County,_Florida>
> > <http://xmlns.com/foaf/0.1/homepage> <www.lee-county.com/> . (line 164 177)
> >
> > I wonder what the best way to solve this problem is. I was optimistic at
> > the beginning and started doing 'replace all' to correct some of the
> > URIs, but it turns out that the problem cannot easily be reduced to
> > a handful of patterns. For example, there are 'links' to web pages
> > which do not even start with 'www' but are clearly URLs (e.g.
> > <ayat-algormezi.blogspot.com>). One
> > common exception is this:
> I had a similar problem with the Polish DBpedia. The problem is in the
> source data, i.e. some of the links are invalid in Wikipedia. I don't
> have a replace-it-all solution, but you should file a bug against the
> DBpedia extractor, asking it to check whether the extracted URIs are valid
> and to leave them out of the output if they are not.
>
> Regarding a short-cut solution to your problem: I would pass the
> invalid input data through a script in Python or Ruby and keep only those
> entries which have valid URIs.
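
A minimal sketch of the kind of filter script suggested above, in Python. It
assumes one triple per line (as in an .nt file) and does only a syntactic
check: every term in angle brackets must start with a scheme such as http:
and must not contain characters that N-Triples forbids there. The names
is_valid_iri, is_valid_triple and filter_stream are made up for this
illustration and are not part of DBpedia, Sesame or OWLIM.

```python
import re

# Characters that N-Triples does not allow unescaped between "<" and ">".
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')
# An absolute IRI has to start with a scheme such as "http:" or "mailto:".
_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*:')
# Subject IRI, predicate IRI, then the object up to the closing " .".
_TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def is_valid_iri(text):
    """Syntactic check only: a scheme is present, no forbidden characters."""
    return bool(_SCHEME.match(text)) and not _FORBIDDEN.search(text)


def is_valid_triple(line):
    """Return True if every IRI term on the line passes is_valid_iri."""
    match = _TRIPLE.match(line.strip())
    if match is None:
        return False
    subject, predicate, obj = match.groups()
    if not (is_valid_iri(subject) and is_valid_iri(predicate)):
        return False
    if obj.startswith('<'):
        # The object is an IRI term, e.g. a foaf:homepage value.
        return obj.endswith('>') and is_valid_iri(obj[1:-1])
    # Literals and blank nodes are not inspected in this sketch.
    return True


def filter_stream(src, good, bad):
    """Copy valid lines from src to good and rejected lines to bad.

    Works line by line, so a large dump never has to fit in memory.
    Blank lines and '#' comment lines are copied to good unchanged.
    """
    for line in src:
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            good.write(line)
        elif is_valid_triple(line):
            good.write(line)
        else:
            bad.write(line)
```

Calling filter_stream(sys.stdin, sys.stdout, sys.stderr) would write the valid
triples to standard output and the rejected ones to standard error, so the
rejected lines are kept as a list rather than thrown away. The check is
deliberately loose: a string like www.example.com:80/ would still pass because
it contains a colon, so real data may need a stricter rule.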
>
> Cheers,
> Aleksander
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
>
> --
> Best,
> Danica Damljanovic, PhD
> Research Associate
> GATE team http://gate.ac.uk
> Natural Language Processing Group
> University of Sheffield
> http://www.dcs.shef.ac.uk/~danica