Hi there,

I did a checking pass over DBPedia 3.0 (obtained from http://www.yr-bcn.es/pmika/datasets/public/dbpedia-v3.nt.tar.gz).
The complete log of the checking is at:

  http://jena.hpl.hp.com/~afs/DBPedia-check-3.0.log.gz

Line numbers in the log refer to DBPedia 3.0 with all the files concatenated in alphabetical filename order. To make sure we're all on the same page, I've put that file at:

  http://jena.hpl.hp.com/~afs/DBPedia-3.0.nt.gz

The log contains 114343 warnings, but they fall into a few specific classes:

== Broken URIs

Some examples:

Corrupt URI:
  <http://beeldbank.nationaalarchief.nl%2Findex.php.....

Illegal hostname:
  <http://www.gorachaelgo.;;slktrcom>

Very broken (not sure what happened here):
  <http://>

Missing port:
  <http://www.curamsoftware.com:>

Use of []:
  <https://zope.cs.ait.ac.th/gmseenet[Greater>
  <http://mpuhost02.ait.ac.th/gmsarn-conference-2007/['''Second>
  <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Leinster_Murray_[pseud]_1896-1975>

Scrambled:
  <http://www.http://www.arthurgrosset.com/sabirds/photos/schvir9783.jpg.com/sabirds/photos/schvir9783.jpg>

This form is quite common:
  <http://.wikipedia.org/wiki/Image:Latoyatellme1.jpg>

Multiple fragments:
  <http://www.fifa.com/tournaments/archive/tournament=104/edition=191057/overview.html>

== Literals

Use of commas in xsd:integer and xsd:decimal, e.g.:
  "3,209"^^<http://www.w3.org/2001/XMLSchema#integer>
  "5,856.2"^^<http://www.w3.org/2001/XMLSchema#decimal>

Use of a dot in integers:
  "1.00"^^<http://www.w3.org/2001/XMLSchema#integer>

Strange dates:
  "0017-11-70"^^<http://www.w3.org/2001/XMLSchema#date>

== URI warnings: not in preferred forms

More minor:
  "lowercase is preferred in host names", e.g. http://arXiv.org/
  "Uppercase is preferred for % encoding"

-------------

You're probably aware of some or all of these. I've submitted a couple of broad tracker items so you can reject or track them as required.
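For anyone wanting to pre-screen URIs before publishing, the broken-URI classes listed above can be approximated with a few string checks. This is a rough Python sketch, not the checker that produced the log: `uri_problems` and its label strings are hypothetical, it only handles http(s)-style URIs, ignores IPv6 literal hosts, and applies plain DNS-label rules to hostnames.

```python
import re
from urllib.parse import urlsplit

# A DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
# A %-escape whose hex digits include a lowercase letter, e.g. "%2f".
LOWER_PCT = re.compile(r"%(?:[a-f][0-9A-Fa-f]|[0-9A-Fa-f][a-f])")


def uri_problems(uri: str) -> list[str]:
    """Return rough problem labels for an http(s) URI.

    Hypothetical helper for illustration; not the checker that produced the log.
    """
    problems = []
    if uri.count("://") > 1:
        problems.append("scrambled: more than one scheme separator")
    if "[" in uri or "]" in uri:
        problems.append("use of []")
    if uri.count("#") > 1:
        problems.append("multiple fragments")
    try:
        parts = urlsplit(uri)
    except ValueError as exc:
        return problems + [f"unparseable: {exc}"]
    # Drop any userinfo, then split the host from the port (IPv6 hosts not handled).
    hostport = parts.netloc.rpartition("@")[2]
    host, colon, port = hostport.partition(":")
    if not host:
        problems.append("missing host")
    elif not all(LABEL.fullmatch(label) for label in host.split(".")):
        problems.append("illegal hostname")
    elif host != host.lower():
        problems.append("warning: lowercase is preferred in host names")
    if colon and not port:
        problems.append("missing port")
    elif port and not port.isdigit():
        problems.append("non-numeric port")
    if LOWER_PCT.search(uri):
        problems.append("warning: uppercase is preferred for % encoding")
    return problems
```

For example, `uri_problems("http://www.curamsoftware.com:")` returns `["missing port"]`, `uri_problems("http://.wikipedia.org/wiki/Image:Latoyatellme1.jpg")` returns `["illegal hostname"]` because of the empty first label, and `uri_problems("http://arXiv.org/")` returns only the lowercase-host warning.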
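The literal problems can be caught by matching each lexical form against the pattern for its XSD datatype. Another sketch under stated assumptions: `valid_lexical_form` is a hypothetical helper, the patterns are simplified from XML Schema Part 2, and the day-of-month check only covers years 1 to 9999 (via Python's `calendar.monthrange`); other years pass on syntax alone.

```python
import calendar
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical patterns simplified from XML Schema Part 2 (sketch, not exhaustive).
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")
DATE = re.compile(
    r"(-?)([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
    r"(?:Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def valid_lexical_form(lexical: str, datatype: str) -> bool:
    """Check a literal's lexical form for a few XSD datatypes (hypothetical helper)."""
    if datatype == XSD + "integer":
        return INTEGER.fullmatch(lexical) is not None
    if datatype == XSD + "decimal":
        return DECIMAL.fullmatch(lexical) is not None
    if datatype == XSD + "date":
        m = DATE.fullmatch(lexical)
        if m is None:
            return False
        sign, year = m.group(1), int(m.group(2))
        month, day = int(m.group(3)), int(m.group(4))
        # Check the day against the real month length where Python's calendar can.
        if not sign and 1 <= year <= 9999:
            return day <= calendar.monthrange(year, month)[1]
        return True
    return True  # datatypes outside this sketch are not checked
```

With this, `"3,209"` and `"1.00"` are rejected as xsd:integer (though `"1.00"` is a valid xsd:decimal), `"5,856.2"` is rejected as xsd:decimal, and `"0017-11-70"` fails on the day field.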
Hope this helps,
Andy

--------------------------------------------
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
