Hi Igor,

Extracting image links isn't easy, partly because of copyright concerns. We
try to only extract links to images that are available under a free
license. Additional problems stem from the fact that some images are kept
on a Wikipedia language version like en.wikipedia.org, while others are
kept on commons.wikimedia.org. In the cases you cite, the image extractor
probably erroneously generated a link to an image kept on commons instead
of en.wikipedia.org:

wrong:
http://upload.wikimedia.org/wikipedia/commons/c/c0/Jamelia_-_Thank_You_%28Version_1%29.jpg
correct:
http://upload.wikimedia.org/wikipedia/en/6/67/Jamelia_-_Thank_You_%28Version_1%29.jpg

wrong:
http://upload.wikimedia.org/wikipedia/commons/b/bc/Defense_of_Tuyen_Quan_bf_1923.jpg
correct:
http://upload.wikimedia.org/wikipedia/en/b/bc/Defense_of_Tuyen_Quan_bf_1923.jpg

In the third case, the revision of the Wikipedia page that we extracted the
image link from (http://en.wikipedia.org/wiki/YoloArts?oldid=275092907) simply
contained a link to an image that didn't exist. I think the extractor
should have checked that, but it didn't.
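
A check of that kind could look roughly like the sketch below. It is not taken from the DBpedia extractor; the function names (upload_url, url_exists, resolve_image) are made up for illustration. It assumes MediaWiki's hashed upload layout, where the two directory levels come from the MD5 of the unencoded file name, and it assumes that a file on the language wiki should be tried before one on commons.

```python
import hashlib
import urllib.request
from urllib.parse import quote

UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia"


def upload_url(repo, file_name):
    """Build the upload URL for file_name in repo ('en', 'commons', ...)."""
    name = file_name.replace(" ", "_")
    # Assumption: the directories are the first one and two hex digits of
    # the MD5 of the unencoded file name (spaces written as underscores).
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{UPLOAD_BASE}/{repo}/{digest[0]}/{digest[:2]}/{quote(name)}"


def url_exists(url, timeout=10):
    """Return True if a HEAD request for url answers with status 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (404, 403, ...), DNS failures and timeouts.
        return False


def resolve_image(file_name, repos=("en", "commons"), exists=url_exists):
    """Return the first candidate URL that exists, or None if none does."""
    for repo in repos:
        url = upload_url(repo, file_name)
        if exists(url):
            return url
    return None
```

Calling resolve_image("Ycac.png") would return None when the file exists in neither place, so no link would be emitted. The exists parameter is only there so the lookup can be tested without network access; a real check would probably also need to send a User-Agent header and rate-limit its requests.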

The source code of the extractor can be found here:

http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/default/core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractor.scala

It's complex, but if you could help us fix these bugs, that would be
great! I'm afraid we won't have the time before the next DBpedia release.

As for the properties with almost identical names:

http://wiki.dbpedia.org/Downloads37#ontologyinfoboxproperties

... there are three different raw Wikipedia infobox properties for the
birth date of a person. In the /ontology/ namespace, they are all *mapped
onto one relation* http://dbpedia.org/ontology/birthDate. It is a strong
point of DBpedia to unify these relations.

http://wiki.dbpedia.org/Downloads37#rawinfoboxproperties

*...this data is in the less clean /property/ namespace. The Ontology
Infobox Properties (/ontology/ namespace) should always be preferred over
this data.*
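
If you want to handle this on the client side, something like the following sketch might help. It is only an illustration: the helper names near_duplicates and preferred are invented here, and it treats two raw properties as near-duplicates only when their local names differ by case, as in the birthPlace/birthplace example.

```python
from collections import defaultdict

ONTOLOGY_NS = "http://dbpedia.org/ontology/"
PROPERTY_NS = "http://dbpedia.org/property/"


def near_duplicates(property_uris):
    """Group /property/ URIs whose local names differ only by case."""
    groups = defaultdict(set)
    for uri in property_uris:
        if uri.startswith(PROPERTY_NS):
            local_name = uri[len(PROPERTY_NS):]
            groups[local_name.lower()].add(uri)
    # Keep only the groups that really contain more than one spelling.
    return {key: sorted(uris) for key, uris in groups.items() if len(uris) > 1}


def preferred(property_uris):
    """Return the /ontology/ URIs if there are any, else all URIs."""
    mapped = [uri for uri in property_uris if uri.startswith(ONTOLOGY_NS)]
    return mapped or list(property_uris)
```

near_duplicates only reports candidates; whether two spellings really mean the same thing still has to be decided by a mapping, which is what the /ontology/ namespace provides.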
Regards,
Christopher


On Wed, Apr 11, 2012 at 20:36, Igor Popov <[email protected]> wrote:

> Hi all,
>
> I've been working on a small side project using DBpedia, and found that
> there are too many broken image links in DBpedia, e.g.
>
> http://dbpedia.org/resource/Thank_You_%28Jamelia_album%29
> http://en.wikipedia.org/wiki/Thank_You_%28Jamelia_album%29
> http://upload.wikimedia.org/wikipedia/commons/c/c0/Jamelia_-_Thank_You_%28Version_1%29.jpg
> http://dbpedia.org/resource/Siege_of_Tuyen_Quang
> http://en.wikipedia.org/wiki/Siege_of_Tuyen_Quang
> http://upload.wikimedia.org/wikipedia/commons/b/bc/Defense_of_Tuyen_Quan_bf_1923.jpg
> http://dbpedia.org/resource/Yolo_County_Arts_Council
> http://en.wikipedia.org/wiki/Yolo_County_Arts_Council
> http://upload.wikimedia.org/wikipedia/commons/a/a3/Ycac.png
>
> and many more. And there are some that are incorrect (I assume that this
> is due to errors in the extraction process) e.g.
> http://dbpedia.org/page/Grand_Duchy_of_Tuscany  - foaf:depiction ->
> http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg
>
> Perhaps checking for broken links now and then wouldn't be such a bad
> thing. Just an idea.
>
> There also seem to be a lot of duplicate properties, e.g.
> http://dbpedia.org/property/birthPlace and
> http://dbpedia.org/property/birthplace. Visor (
> http://visor.psi.enakting.org/), a tool I created, is very good at
> surfacing such errors.
>
> Sorry if I'm pointing out things that are already known problems but I'm
> new to this list so I thought I'd share my experience from a hacker/user
> perspective.
>
> Best,
> --Igor
>
> ------------------------------------------------------------------------------
> Better than sec? Nothing is better than sec when it comes to
> monitoring Big Data applications. Try Boundary one-second
> resolution app monitoring today. Free.
> http://p.sf.net/sfu/Boundary-dev2dev
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
