[Dbpedia-discussion] Bug in image data (CSV file)

Bruno Barberi Gnecco Thu, 16 Apr 2009 15:17:56 -0700

        It seems that the image extractor may have a bug. I think it's
cutting the URLs when it finds characters other than [A-Za-z0-9_/].
Errors are easy to find; just run:


grep /depiction image_en.csv | grep -v 200 | head

(this finds some of the errors, but not all). Here are the first few:

%40Mail   http://xmlns.com/foaf/0.1/depiction   http://upload.wikimedia.org/wikipedia/en/thumb/5/5b/   r
Adam_Meredith   http://xmlns.com/foaf/0.1/depiction   http://upload.wikimedia.org/wikipedia/commons/thumb/2/26/Plum   r
Airline_Highway   http://xmlns.com/foaf/0.1/depiction   http://upload.wikimedia.org/wikipedia/en/thumb/0/04/61   r
Bir-Hakeim_%28Paris_M%C3%A9tro%29   http://xmlns.com/foaf/0.1/depiction   http://upload.wikimedia.org/wikipedia/commons/thumb/d/da/Ligne6   r


        If you access Adam_Meredith, for example, the image URL is:

http://upload.wikimedia.org/wikipedia/commons/2/26/Plum%4072.jpg
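
        For anyone who wants a broader check than the grep above, here is a
rough sketch in Python. It assumes each row is whitespace-separated as
subject, predicate, object, type (as in the rows quoted above), and that an
intact image URL contains a common image file extension. The function names
and the extension list are my own and only illustrative; a URL cut off after
its first extension would not be flagged.

```python
import re

# Assumption: an intact image URL contains one of these extensions, either at
# the end or followed by "/" (as in thumbnail URLs).
IMAGE_EXT = re.compile(r"\.(jpe?g|png|gif|svg|tiff?)(/|$)", re.IGNORECASE)


def looks_truncated(url):
    """Return True if the URL contains no recognizable image extension."""
    return IMAGE_EXT.search(url) is None


def find_truncated(lines):
    """Yield (subject, url) for depiction rows whose URL looks cut off.

    Assumes whitespace-separated fields: subject, predicate, object, type.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[1].endswith("/depiction"):
            continue
        if looks_truncated(fields[2]):
            yield fields[0], fields[2]
```

To run it on the real file, something like
`for subject, url in find_truncated(open("image_en.csv", encoding="utf-8")): print(subject, url)`
should list the suspicious rows.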

        The bug happens on both the img and depiction fields. The NT file
does not have this problem. Since I'm processing the CSV files instead of
the NT ones, once you find the bug, please post whether it affects only the
images file or other files as well.
