Linked Data applications need support for literal datatypes, and XSD is the most important set of datatypes. I put together "xsd4ld" [1], which covers the atomic datatypes, except the XML-specific ones.

It uses standard Java XML support for parsing where possible, and its own parsing, or additional checking after Java parsing, otherwise. Some of the Java parsing is lax, with a single parser covering several datatypes.

xsd4ld provides validation against a datatype and calculation of the value from the lexical form. It has no dependencies, including no dependency on Jena.
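
As an illustration of "lax Java parsing plus additional checking", here is a minimal sketch using only the JDK. It is not xsd4ld code: the class and method names are hypothetical. The JDK's DatatypeFactory parser accepts the lexical forms of several date/time datatypes, so the sketch checks afterwards which datatype was actually recognised.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Illustrative only: LaxDateTimeCheck and parseDateTime are hypothetical names,
// not part of xsd4ld or Jena.
class LaxDateTimeCheck {
    private static final DatatypeFactory FACTORY = newFactory();

    private static DatatypeFactory newFactory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Return the value for an xsd:dateTime lexical form, or null if it is not valid. */
    static XMLGregorianCalendar parseDateTime(String lex) {
        try {
            // Lax step: this one parser accepts dateTime, date, time, gYear, ...
            XMLGregorianCalendar cal = FACTORY.newXMLGregorianCalendar(lex);
            // Additional check: only accept it if the parser saw an xsd:dateTime.
            return DatatypeConstants.DATETIME.equals(cal.getXMLSchemaType()) ? cal : null;
        } catch (IllegalArgumentException | IllegalStateException ex) {
            // Not a lexical form of any of the date/time datatypes.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDateTime("2015-01-01T10:00:00Z") != null);
        System.out.println(parseDateTime("2015-01-01") != null);
        System.out.println(parseDateTime("not a date") != null);
    }
}
```

The same pattern (parse leniently, then tighten) would apply to the other date/time datatypes by comparing against the other DatatypeConstants values.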

It aims to collect together all XSD knowledge that a linked data application might want.

Fortunately, the XSD-part2 spec has a regular content structure and is well marked up, so Perl was used to build a lot of the initial codebase.

[1] https://github.com/afs/xsd4ld

---------------

I also looked at Jena's use of Xerces:

Most common is the use of org.apache.xerces.util.XMLChar, which has the character classes needed for XML. Solution - take a copy. Some of it is duplicated elsewhere in Java.

BaseXMLWriter.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
ParserSupport.java - com/hp/hpl/jena/rdfxml/xmlinput/impl
PrefixMappingImpl.java - com/hp/hpl/jena/shared/impl
Unparser.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
Util.java - com/hp/hpl/jena/rdf/model/impl
schemagen

There is a nearly identical, but older, copy under com.sun.xml.internal (2006-03-16 vs 2008-07-07). It's XML 1.0.
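
To give a feel for what a standalone replacement for the XMLChar character classes involves, here is a minimal sketch of an NCName check. It is not the Xerces code: the class and method names are hypothetical, and for brevity it assumes the NameStartChar/NameChar ranges of XML 1.0 Fifth Edition, which differ in detail from the tables in earlier editions of XML 1.0.

```java
// Illustrative only: XmlNameChars is a hypothetical stand-in, not a copy of XMLChar.
// Ranges follow the XML 1.0 Fifth Edition productions, with ':' excluded (NCName).
class XmlNameChars {
    private static boolean in(int cp, int lo, int hi) {
        return cp >= lo && cp <= hi;
    }

    /** NameStartChar without the colon. */
    static boolean isNCNameStart(int cp) {
        return in(cp, 'A', 'Z') || cp == '_' || in(cp, 'a', 'z')
            || in(cp, 0xC0, 0xD6) || in(cp, 0xD8, 0xF6) || in(cp, 0xF8, 0x2FF)
            || in(cp, 0x370, 0x37D) || in(cp, 0x37F, 0x1FFF) || in(cp, 0x200C, 0x200D)
            || in(cp, 0x2070, 0x218F) || in(cp, 0x2C00, 0x2FEF) || in(cp, 0x3001, 0xD7FF)
            || in(cp, 0xF900, 0xFDCF) || in(cp, 0xFDF0, 0xFFFD) || in(cp, 0x10000, 0xEFFFF);
    }

    /** NameChar without the colon. */
    static boolean isNCNameChar(int cp) {
        return isNCNameStart(cp) || cp == '-' || cp == '.' || in(cp, '0', '9')
            || cp == 0xB7 || in(cp, 0x300, 0x36F) || in(cp, 0x203F, 0x2040);
    }

    /** True if the whole string is an NCName; iterates by code point, not by char. */
    static boolean isNCName(String s) {
        if (s == null || s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!isNCNameStart(first)) return false;
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNCNameChar(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNCName("rdf"));
        System.out.println(isNCName("1abc"));
        System.out.println(isNCName("a:b"));
    }
}
```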

There are some uses of HexBinary and Base64Binary encoding. These are available in Java 8 or Guava.

MakeSkolem.java - com/hp/hpl/jena/reasoner/rulesys/builtins
XSDbase64Binary.java - com/hp/hpl/jena/datatypes/xsd
XSDhexBinary.java - com/hp/hpl/jena/datatypes/xsd
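
A minimal sketch of doing these encodings without Xerces, assuming Java 8. Base64 uses java.util.Base64; the hex conversion is hand-written here only to keep the example dependency-free. The class and method names are hypothetical.

```java
import java.util.Base64;

// Illustrative only: BinaryLexicalForms is a hypothetical helper, not Jena or xsd4ld code.
class BinaryLexicalForms {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Note: the basic decoder rejects whitespace, which the xsd:base64Binary
    // lexical space permits; a real implementation would need to handle that.
    static byte[] fromBase64(String lex) {
        return Base64.getDecoder().decode(lex);
    }

    /** Upper-case hex, two digits per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    static byte[] fromHex(String lex) {
        if (lex.length() % 2 != 0)
            throw new IllegalArgumentException("Odd number of hex digits: " + lex);
        byte[] out = new byte[lex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) ((hexValue(lex.charAt(2 * i)) << 4) | hexValue(lex.charAt(2 * i + 1)));
        }
        return out;
    }

    // ASCII hex digits only; Character.digit would also accept non-ASCII digits.
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        throw new IllegalArgumentException("Not a hex digit: " + ch);
    }

    public static void main(String[] args) {
        byte[] data = { 0x0F, (byte) 0xA0 };
        System.out.println(toHex(data));
        System.out.println(toBase64(data));
    }
}
```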

Major rewrite:
XSDDatatype.java - com/hp/hpl/jena/datatypes/xsd

xsd4ld is not a replacement for Jena's datatypes, which are more general and extensible than just XSD. But if we provide a fixed set of XSD datatypes, then rewriting XSDDatatype using xsd4ld should mean the datatype support does not depend on Xerces internals.

Some work is also needed for:
RDFXMLParser.java - com/hp/hpl/jena/rdfxml/xmlinput/impl

In addition:

ARQ uses Xerces regular expressions as an alternative to Java regular expressions (Java used to be missing the "x" flag). This can be dropped.
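
A minimal sketch of covering the regex flags with java.util.regex alone. This is not ARQ's code: the class and method names are hypothetical, and the flag letters are assumed to be the usual "i", "s", "m" and "x". One caveat: Pattern.COMMENTS also treats "#" as starting a comment, which is more than just ignoring whitespace.

```java
import java.util.regex.Pattern;

// Illustrative only: RegexFlags is a hypothetical helper, not ARQ code.
class RegexFlags {
    /** Translate a flags string such as "ix" into a java.util.regex flag mask. */
    static int toJavaFlags(String flags) {
        int mask = 0;
        for (char ch : flags.toCharArray()) {
            switch (ch) {
                case 'i': mask |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE; break;
                case 's': mask |= Pattern.DOTALL; break;
                case 'm': mask |= Pattern.MULTILINE; break;
                // Whitespace in the pattern is ignored; '#' also starts a comment in Java.
                case 'x': mask |= Pattern.COMMENTS; break;
                default: throw new IllegalArgumentException("Unknown regex flag: " + ch);
            }
        }
        return mask;
    }

    /** True if the pattern matches anywhere in the text (find, not a whole-string match). */
    static boolean regex(String text, String pattern, String flags) {
        return Pattern.compile(pattern, toJavaFlags(flags)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(regex("abc", "a b c", "x"));
        System.out.println(regex("abc", "a b c", ""));
        System.out.println(regex("ABC", "abc", "i"));
    }
}
```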

RIOT has some code extracted from jena-core for numeric range checking which xsd4ld supports. Replace.
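
For reference, numeric range checking of this kind can be sketched in a few lines of plain Java. This is neither the RIOT code nor the xsd4ld API: the class and method names are hypothetical, and only a couple of the signed integer types are shown.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

// Illustrative only: IntegerRangeCheck is a hypothetical helper.
class IntegerRangeCheck {
    // Lexical space of xsd:integer: optional sign, then one or more ASCII digits.
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");

    /** True if lex is an integer lexical form whose value lies in [min, max]. */
    static boolean inRange(String lex, long min, long max) {
        if (!INTEGER.matcher(lex).matches()) return false;
        // BigInteger avoids overflow for lexical forms of arbitrary length.
        BigInteger v = new BigInteger(lex);
        return v.compareTo(BigInteger.valueOf(min)) >= 0
            && v.compareTo(BigInteger.valueOf(max)) <= 0;
    }

    static boolean isValidByte(String lex) {
        return inRange(lex, Byte.MIN_VALUE, Byte.MAX_VALUE);
    }

    static boolean isValidInt(String lex) {
        return inRange(lex, Integer.MIN_VALUE, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(isValidByte("127"));
        System.out.println(isValidByte("128"));
        System.out.println(isValidInt("2147483648"));
    }
}
```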

        Andy





