On Wed, 2011-10-26 at 15:27 +0100, Paolo Castagna wrote:
> Dave Reynolds wrote:
> > Hi Paolo,
> >
> > On Wed, 2011-10-26 at 14:34 +0100, Paolo Castagna wrote:
> >> Dave Reynolds wrote:
> >>> On Wed, 2011-10-26 at 13:38 +0100, Paolo Castagna wrote:
> >>>
> >>>> I am not sure if these two triples in my data are both "correct", are
> >>>> they?
> >>>>
> >>>> ----
> >>>> <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> >>>> <foo:bar2> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#integer> .
> >>>> ----
> >>> No. The lexical forms for int and integer do not allow ".". See:
> >>> http://www.w3.org/TR/xmlschema-2/#integer etc and
> >>> http://www.w3.org/TR/xmlschema11-2/#integer etc
> >>>
> >>> Perhaps that's why they aren't cannonicalized by TDB.
> >> Thank you Dave.
> >>
> >> I still do not understand why I do not see errors or warnings when
> >> I validate my data with http://sparql.org/data-validator.html [1]
> >
> > Humm. When I go to [1] and type in:
> >
> > <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> >
> > Select "Turtle" (default) and press the validate button then I see the
> > error:
> >
> > """
> > [line: 10, col: 20] Lexical form '6.0' not valid for datatype
> > http://www.w3.org/2001/XMLSchema#int
> > <foo:bar1> <foo:p> "6.0"^^<http://www.w3.org/2001/XMLSchema#int> .
> > """
> >
> > Not sure what might be different in your case.
>
> Interesting...
>
> Sorry, I assumed I would have had the same answer no matter the input format.
> But, the validator (in Joseki) is giving different answers for the same data.
> In particular, when N-Triples format is used as input, there is no error.

N-Triples was originally (and technically still is) just a format for
writing down RDF/XML test cases, so it has to be able to represent all
syntactically well-formed data, even if that data isn't legal by other
criteria. So it would be incorrect for an N-Triples reader to raise an
error on that data.

In fact, in RDF a literal whose lexical form is ill-formed for its
datatype is not only not a syntax error, it's not even a semantic
inconsistency; it "just" represents a value which is not in the space
of literals.

Also, of course, the set of datatypes (other than rdf:XMLLiteral) is
open-ended in RDF, so there is no guarantee that a given processor
recognizes xsd:int.

Though in practice it's best to do eager checking of such things :)

> I have a large N-Triples or N-Quads file, what's the best way (i.e. the more
> strict the better for me) to validate the data in it, before ingestion?

Use Eyeball?

Dave
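
To illustrate the kind of eager checking discussed above, here is a minimal Python sketch that flags xsd:int and xsd:integer literals whose lexical form is not valid. It is not Eyeball, Jena or the sparql.org validator, and it is not a full N-Triples parser: the function names (is_valid_lexical, find_ill_typed) are made up for this example, and the regular expression only looks at a typed literal in the object position of a one-statement-per-line file. Unrecognized datatypes are skipped rather than rejected, since the set of datatypes is open-ended.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical space of xsd:integer: an optional sign followed by one or more digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

# A typed literal at the end of a statement: "lexical"^^<datatype> .
# (Assumption: simple lines only; this is not a complete N-Triples grammar.)
TYPED_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"\^\^<([^>]*)>\s*\.\s*$')

# xsd:int is xsd:integer restricted to the 32-bit signed range.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def is_valid_lexical(lexical, datatype):
    """Return True or False for xsd:int / xsd:integer, None for other datatypes."""
    if datatype == XSD + "integer":
        return INTEGER_LEXICAL.fullmatch(lexical) is not None
    if datatype == XSD + "int":
        if INTEGER_LEXICAL.fullmatch(lexical) is None:
            return False
        return INT_MIN <= int(lexical) <= INT_MAX
    return None  # unknown datatype: nothing to check here


def find_ill_typed(lines):
    """Return (line number, lexical form, datatype) for each invalid literal found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        match = TYPED_LITERAL.search(line)
        if match is None:
            continue  # object is not a typed literal (or the line is not a triple)
        lexical, datatype = match.groups()
        if is_valid_lexical(lexical, datatype) is False:
            problems.append((number, lexical, datatype))
    return problems
```

Run over the two triples from the start of this thread, find_ill_typed reports both of them, because "6.0" contains a "." and so is outside the lexical space of both xsd:int and xsd:integer. The data is still well-formed N-Triples; the check is an extra layer on top of parsing.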
