Forwarding a message from Andy Seaborne to dbpedia-discussion.

Richard

Begin forwarded message:

> From: Andy Seaborne <[email protected]>
> Date: 15 April 2010 13:43:38 IST
> To: [email protected]
> Subject: [pedantic-web] DBPedia 3.5 parsing report
> Reply-To: [email protected]
>
> Does this count as pedantic? :-)
>
> I ran the files from http://downloads.dbpedia.org/3.5/en/ through an
> N-Triples parser with checking:
>
> The report is here (it's 25K lines long which isn't so many given
> the size of the data):
>
> http://www.openjena.org/~afs/DBPedia35-parse-log-2010-04-15.txt
>
> It covers both strict errors and warnings of ill-advised forms.
>
> A few examples:
>
> Bad IRI: <=?(''[[Nepenthes>
> Bad IRI: <http://www.european-athletics.org‎>
>
> Bad lexical forms for the value space:
> "1967-02-31"^^http://www.w3.org/2001/XMLSchema#date
> (there is no February the 31st)
>
> Warning of well known ports of other protocols:
> http://stream1.securenetsystems.net:443
>
> Warning about explicit port 80:
>
> http://bibliotecadigitalhispanica.bne.es:80/
>
> and use of . and .. in absolute URIs which are all from the standard
> list of IRI warnings.
>
> Bad IRI: <http://dbpedia.org/resource/..> Code: 8/NON_INITIAL_DOT_SEGMENT
> in PATH: The path contains a segment /../ not at the beginning of a
> relative reference, or it contains a /./ These should be removed.
>
> Andy
>
> Software used:
>
> The IRI checker, by Jeremy Carroll, is available from
> http://www.openjena.org/iri/ and Maven.
>
> The lexical form checking is done by Apache Xerces.
>
> The N-Triples parser is the one from TDB v0.8.5 which bundles the
> above two together.
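
For readers who want a feel for the kinds of checks mentioned in the report (an impossible calendar date, an explicit default port, the well-known port of another protocol, and . or .. path segments), here is a minimal Python sketch. It is not the Jena IRI checker or Xerces and does not reproduce their rules or messages; the names `is_valid_xsd_date`, `iri_warnings` and `WELL_KNOWN_PORTS` are hypothetical, the port table is a small illustrative subset, and the date check only handles the plain YYYY-MM-DD form (no time zones or negative years).

```python
import re
from datetime import date
from urllib.parse import urlsplit

# Illustrative subset only; an assumption, not the table used by any real checker.
WELL_KNOWN_PORTS = {"ftp": 21, "http": 80, "https": 443}


def is_valid_xsd_date(lexical):
    """Return True if a plain YYYY-MM-DD string names a real calendar day."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", lexical)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. 31 February
    except ValueError:
        return False
    return True


def iri_warnings(iri):
    """Return a list of warning strings for a few ill-advised IRI forms."""
    parts = urlsplit(iri)
    warnings = []

    try:
        port = parts.port
    except ValueError:
        port = None
        warnings.append("port is not a valid number")

    default_port = WELL_KNOWN_PORTS.get(parts.scheme.lower())
    if port is not None:
        if port == default_port:
            warnings.append(f"explicit default port {port}")
        elif port in WELL_KNOWN_PORTS.values():
            warnings.append(f"port {port} is the well-known port of another protocol")

    # An absolute IRI should not carry . or .. segments in its path.
    segments = parts.path.split("/")
    if "." in segments or ".." in segments:
        warnings.append("path contains a . or .. segment")

    return warnings


if __name__ == "__main__":
    print(is_valid_xsd_date("1967-02-31"))
    for example in (
        "http://stream1.securenetsystems.net:443",
        "http://bibliotecadigitalhispanica.bne.es:80/",
        "http://dbpedia.org/resource/..",
    ):
        print(example, iri_warnings(example))
```

The three IRIs in the usage block are the ones quoted in the report; a real checker applies many more rules than these.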
