It seems that the image extractor may have a bug. I think it's truncating the URLs at the first character other than [A-Za-z0-9_/]. Errors are easy to find; just run:
grep /depiction image_en.csv | grep -v 200 | head

(This finds some of the errors, but not all.) Here are the first few:

%40Mail http://xmlns.com/foaf/0.1/depiction http://upload.wikimedia.org/wikipedia/en/thumb/5/5b/ r
Adam_Meredith http://xmlns.com/foaf/0.1/depiction http://upload.wikimedia.org/wikipedia/commons/thumb/2/26/Plum r
Airline_Highway http://xmlns.com/foaf/0.1/depiction http://upload.wikimedia.org/wikipedia/en/thumb/0/04/61 r
Bir-Hakeim_%28Paris_M%C3%A9tro%29 http://xmlns.com/foaf/0.1/depiction http://upload.wikimedia.org/wikipedia/commons/thumb/d/da/Ligne6 r

If you access Adam_Meredith, for example, the image URL is:

http://upload.wikimedia.org/wikipedia/commons/2/26/Plum%4072.jpg

The bug happens on both img and depiction fields. The nt file does not have this problem.

Since I'm processing the CSV files instead of the NT ones, please post, once you find the bug, whether it affects only the images file or other files as well...
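In case it helps narrow things down, here is a rough Python sketch of a slightly broader check than the grep above. It rests on assumptions of mine, not on anything confirmed about the extractor: that a complete image URL ends in a file name with an extension, and that the rows are whitespace-separated as in the sample output (the real delimiter in image_en.csv may differ). The names looks_truncated and truncated_rows are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: a complete image URL ends in "name.ext" (e.g. ".jpg", ".svg").
_EXTENSION = re.compile(r"\.[A-Za-z0-9]{2,5}$")


def looks_truncated(url: str) -> bool:
    """Heuristic: True if the last path segment has no file extension."""
    last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
    return _EXTENSION.search(last_segment) is None


def truncated_rows(lines):
    """Yield (subject, url) for depiction rows whose URL looks cut off.

    Assumes whitespace-separated fields: subject, predicate, object, type.
    """
    for line in lines:
        fields = line.split()
        if (
            len(fields) >= 3
            and fields[1].endswith("/depiction")
            and looks_truncated(fields[2])
        ):
            yield fields[0], fields[2]
```

To try it on the file, something like `with open("image_en.csv") as f: print(list(truncated_rows(f))[:10])` should do. Being a heuristic, it can miss URLs that were cut right after something that happens to look like an extension.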
