Hi there,

I did a checking pass over the DBPedia 3.0 data
(obtained from http://www.yr-bcn.es/pmika/datasets/public/dbpedia-v3.nt.tar.gz).

The complete checking log is at:
http://jena.hpl.hp.com/~afs/DBPedia-check-3.0.log.gz

Line numbers in the log refer to DBPedia 3.0 with all the files
concatenated in alphabetical filename order.  To make sure we're all on the same
page, I've put that file at:
http://jena.hpl.hp.com/~afs/DBPedia-3.0.nt.gz
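
If you want to rebuild the concatenated file yourself and map a log line number back to the file it came from, a minimal Python sketch could look like this. It assumes the tarball unpacks into one directory of `*.nt` files, that every file ends with a newline, and that plain code-point sorting of the names matches the alphabetical order used for the log; `concatenate_sorted` and `locate` are hypothetical helper names.

```python
import bisect
from pathlib import Path


def concatenate_sorted(src_dir, out_file):
    """Concatenate the *.nt files in src_dir in filename order, adding nothing.

    Returns a list of (first_global_line, file_name) pairs, one per file.
    Assumes every input file ends with a newline.
    """
    files = sorted(Path(src_dir).glob("*.nt"), key=lambda p: p.name)
    line_index = []
    next_line = 1
    with open(out_file, "wb") as out:
        for path in files:
            data = path.read_bytes()
            out.write(data)
            line_index.append((next_line, path.name))
            next_line += data.count(b"\n")
    return line_index


def locate(line_index, global_line):
    """Map a 1-based line number in the concatenated file to (file, local line)."""
    starts = [start for start, _ in line_index]
    # bisect_right skips past empty files that share the same start line
    i = bisect.bisect_right(starts, global_line) - 1
    start, name = line_index[i]
    return name, global_line - start + 1
```

For example, `locate(concatenate_sorted("dbpedia-3.0", "all.nt"), n)` would name the source file and local line for line `n` of the log ("dbpedia-3.0" and "all.nt" are made-up paths).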

The log contains 114343 warnings, but the warnings fall into a few specific
classes:

== Broken URIs

Some examples:

Corrupt URI (the "/" after the host name has been %-encoded as %2F, so the
path is read as part of the host):
<http://beeldbank.nationaalarchief.nl%2Findex.php.....

Illegal hostname:
<http://www.gorachaelgo.;;slktrcom>

Very broken (not sure what happened here):
<http://>

Missing port:
<http://www.curamsoftware.com:>

Use of [] (only allowed around an IP-literal host, not in the path):
<https://zope.cs.ait.ac.th/gmseenet[Greater>
<http://mpuhost02.ait.ac.th/gmsarn-conference-2007/['''Second>
<http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Leinster_Murray_[pseud]_1896-1975>

Scrambled:
<http://www.http://www.arthurgrosset.com/sabirds/photos/schvir9783.jpg.com/sabirds/photos/schvir9783.jpg>

This form, with an empty first label in the host name, is quite common:
<http://.wikipedia.org/wiki/Image:Latoyatellme1.jpg>

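Multiple fragments (the HTML character reference &#61;, for "=", was left
undecoded, so the URI contains two # characters):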
<http://www.fifa.com/tournaments/archive/tournament&#61;104/edition&#61;191057/overview.html>
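
Most of these broken URIs can be caught with a few cheap syntactic checks. The following Python sketch is only an illustration, not the checker that produced the log: it uses `urllib.parse.urlsplit` plus a deliberately strict host-name rule (ASCII letters, digits, hyphens and dots, no empty labels), which is an assumption that ignores IP literals and internationalized names. `uri_problems` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

# Deliberately strict host rule (assumption): dot-separated labels of ASCII
# letters, digits and hyphens; no empty labels, no IP literals, no IDNs.
HOST_RE = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*")


def uri_problems(uri: str) -> list[str]:
    """Return rough problem labels for an http(s) URI; empty list if none found."""
    problems = []
    # A URI has at most one fragment, so a second '#' is always an error.
    if uri.count("#") > 1:
        problems.append("multiple fragments")
    parts = urlsplit(uri)
    netloc = parts.netloc
    # urlsplit accepts "host:" and .port would quietly give None, so test the text.
    if netloc.endswith(":"):
        problems.append("empty port")
        netloc = netloc[:-1]
    # Drop any userinfo and port, leaving just the host.
    host = netloc.rsplit("@", 1)[-1].split(":", 1)[0]
    if not host:
        problems.append("empty host")
    elif not HOST_RE.fullmatch(host):
        problems.append("illegal host name")
    # '[' and ']' are only legal around an IP-literal host, never in the path.
    if "[" in parts.path or "]" in parts.path:
        problems.append("[] in path")
    return problems
```

The scrambled URI is reported only as an "empty port", because `urlsplit` takes `www.http:` as the authority; catching that case properly would need an extra rule, such as looking for a second `://`.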

== Literals

Use of commas in xsd:integer and xsd:decimal:
e.g.
"3,209"^^<http://www.w3.org/2001/XMLSchema#integer>
"5,856.2"^^<http://www.w3.org/2001/XMLSchema#decimal>

Use of a decimal point in integers:
"1.00"^^<http://www.w3.org/2001/XMLSchema#integer>

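Both numeric cases are violations of the datatype's lexical space, which can be checked with one pattern per datatype. A minimal Python sketch, using the lexical rules for xsd:integer and xsd:decimal from XML Schema Datatypes (optional sign, ASCII digits, no grouping separators, and a "." only in decimals); `valid_numeric` and `PATTERNS` are hypothetical names, and only these two datatypes are covered.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical rules from XML Schema Datatypes. [0-9] is used instead of \d
# because in Python 3 \d also matches non-ASCII digits.
PATTERNS = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "decimal": re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"),
}


def valid_numeric(lexical: str, datatype: str) -> bool:
    """True if `lexical` is a legal lexical form for the given XSD numeric type."""
    pattern = PATTERNS.get(datatype)
    if pattern is None:
        raise ValueError(f"unsupported datatype: {datatype}")
    return pattern.fullmatch(lexical) is not None
```

With this, "3,209" and "5,856.2" are rejected, and "1.00" is rejected as xsd:integer while being accepted as xsd:decimal.
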
Strange dates (no month has a day 70):
"0017-11-70"^^<http://www.w3.org/2001/XMLSchema#date>

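Checking dates needs more than a pattern match, because "0017-11-70" has the right shape and only the day is out of range. One way to do it in Python, as a sketch: match the YYYY-MM-DD shape with an optional time zone, then let `datetime.date` reject impossible days. Assumptions: negative years and years above 9999 (both allowed by xsd:date) are simply rejected, and time-zone offsets are not range-checked; `valid_xsd_date` is a hypothetical name.

```python
import re
from datetime import date

# YYYY-MM-DD with an optional "Z" or +hh:mm / -hh:mm time zone.
# Simplification: no negative or 5+ digit years; offsets are not range-checked.
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})(?:Z|[+-][0-9]{2}:[0-9]{2})?")


def valid_xsd_date(lexical: str) -> bool:
    """Rough validity check for an xsd:date lexical form."""
    m = DATE_RE.fullmatch(lexical)
    if m is None:
        return False
    year, month, day = (int(g) for g in m.groups())
    try:
        # date() rejects month 13, day 70, 30 February and year 0
        # (XML Schema 1.0 also has no year 0000).
        date(year, month, day)
    except ValueError:
        return False
    return True
```
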
== URI warnings: not in preferred forms

More minor:
"lowercase is preferred in host names", e.g.
http://arXiv.org/
and
"Uppercase is preferred for % encoding", e.g. %2F rather than %2f.
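
Both preferred-form warnings are easy to test for. A minimal Python sketch, assuming the host is the part of the authority after any userinfo and before any port; `preferred_form_warnings` is a hypothetical name, and the messages simply echo the two quoted above. The host is taken from `netloc` rather than `.hostname`, because `urlsplit(...).hostname` already lowercases it.

```python
import re
from urllib.parse import urlsplit

# A percent-encoded octet: '%' followed by two hex digits of either case.
PCT_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def preferred_form_warnings(uri: str) -> list[str]:
    """Return style warnings for a URI that is valid but not in preferred form."""
    warnings = []
    netloc = urlsplit(uri).netloc
    # Strip userinfo and port by hand; .hostname would hide the original case.
    host = netloc.rsplit("@", 1)[-1].split(":", 1)[0]
    if host != host.lower():
        warnings.append("lowercase is preferred in host names")
    if any(m.group(0) != m.group(0).upper() for m in PCT_RE.finditer(uri)):
        warnings.append("Uppercase is preferred for % encoding")
    return warnings
```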

-------------
You're probably aware of some or all of these; I've submitted a couple of
broad tracker items so you can reject or track them as required.

        Hope this helps,
        Andy

--------------------------------------------
  Hewlett-Packard Limited
  Registered Office: Cain Road, Bracknell, Berks RG12 1HN
  Registered No: 690597 England

_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion