Linked Data applications need support for literal datatypes, and XSD is
the most important set of datatypes. I put together "xsd4ld" [1], which
covers the atomic datatypes, except the XML-specific ones.
It uses standard Java XML support for parsing where possible, and its
own parsing, or additional checking after Java parsing, otherwise. Some
Java parsing is lax, with one parser covering several datatypes.
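
As an illustration of "additional checking after Java parsing", here is a
minimal sketch for xsd:double. It is not taken from xsd4ld; the class and
method names are made up for the example, and the regular expression is the
xsd:double lexical pattern as I understand it from XSD 1.1 Part 2.
Double.parseDouble on its own accepts forms such as "1.0f" and "Infinity",
which are not XSD lexical forms, so the lexical form is checked first and
Java is only used to compute the value.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of xsd4ld.
final class XsdDoubleSketch {
    // Lexical pattern for xsd:double (assumed to follow XSD 1.1 Part 2).
    private static final Pattern XSD_DOUBLE = Pattern.compile(
        "(\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee](\\+|-)?[0-9]+)?|(\\+|-)?INF|NaN");

    // Validation: does the string belong to the lexical space?
    static boolean isValid(String lex) {
        return XSD_DOUBLE.matcher(lex).matches();
    }

    // Value calculation: only called on strings that pass the lexical check.
    static double value(String lex) {
        if (!isValid(lex))
            throw new IllegalArgumentException("Not an xsd:double: " + lex);
        switch (lex) {
            case "INF":
            case "+INF":
                return Double.POSITIVE_INFINITY;
            case "-INF":
                return Double.NEGATIVE_INFINITY;
            default:
                // Numbers and "NaN" are handled by the Java parser.
                return Double.parseDouble(lex);
        }
    }
}
```

With this, XsdDoubleSketch.isValid("1.0f") is false even though
Double.parseDouble("1.0f") succeeds.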
xsd4ld provides validation against a datatype and calculation of the
value from the lexical form. It has no dependencies, including no
dependency on Jena.
It aims to collect together all XSD knowledge that a linked data
application might want.
Fortunately, the XSD-part2 spec has a regular content structure and is
well marked up, so Perl was used to build a lot of the initial codebase.
[1] https://github.com/afs/xsd4ld
---------------
I also looked at Jena's use of Xerces:
Most common is the use of org.apache.xerces.util.XMLChar, which has the
character classes needed for XML. Solution: take a copy. Some of it is
duplicated elsewhere in Java.
BaseXMLWriter.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
ParserSupport.java - com/hp/hpl/jena/rdfxml/xmlinput/impl
PrefixMappingImpl.java - com/hp/hpl/jena/shared/impl
Unparser.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
Util.java - com/hp/hpl/jena/rdf/model/impl
schemagen
There is a nearly identical, but older, copy under com.sun.xml.internal
2006-03-16 vs 2008-07-07
It's XML 1.0.
There are some uses of HexBinary and Base64Binary encoding. These are
available in Java 8 or Guava.
MakeSkolem.java - com/hp/hpl/jena/reasoner/rulesys/builtins
XSDbase64Binary.java - com/hp/hpl/jena/datatypes/xsd
XSDhexBinary.java - com/hp/hpl/jena/datatypes/xsd
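
A minimal sketch of what the replacement could look like without Xerces.
Base64 uses java.util.Base64 from Java 8. The hex part is written by hand
here only to keep the example dependency-free (Guava's BaseEncoding.base16()
would be the library alternative). The class and method names are
hypothetical, and this is not how Jena currently does it.

```java
import java.util.Base64;

// Hypothetical helper: hex and base64 codecs with no Xerces dependency.
final class BinaryCodecSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    static byte[] fromHex(String lex) {
        if (lex.length() % 2 != 0)
            throw new IllegalArgumentException("Odd number of hex digits: " + lex);
        byte[] out = new byte[lex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = hexValue(lex.charAt(2 * i));
            int lo = hexValue(lex.charAt(2 * i + 1));
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // ASCII hex digits only; Character.digit would also accept non-ASCII digits.
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        throw new IllegalArgumentException("Not a hex digit: " + ch);
    }

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String lex) {
        // Throws IllegalArgumentException on input that is not valid base64.
        return Base64.getDecoder().decode(lex);
    }
}
```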
Major rewrite:
XSDDatatype.java - com/hp/hpl/jena/datatypes/xsd
xsd4ld is not a replacement for Jena's datatypes, which are more general
and extensible than just XSD. But if we provide a fixed set of XSD
datatypes, then rewriting XSDDatatype using xsd4ld should mean the
datatype support does not depend on Xerces internals.
Some work is also needed for:
RDFXMLParser.java - com/hp/hpl/jena/rdfxml/xmlinput/impl
In addition:
ARQ uses Xerces regular expressions as an alternative to Java (Java used
to be missing the "x" flag). This can be dropped.
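
For reference, a sketch of mapping the regex flag letters onto
java.util.regex, where "x" corresponds to Pattern.COMMENTS. The class and
method names are hypothetical and this is not ARQ's actual code. Java's
behaviour may also differ in detail from the XPath/SPARQL definition of "x":
for example, Java additionally treats "#" as the start of a comment.

```java
import java.util.regex.Pattern;

// Hypothetical helper: translate regex flag letters to java.util.regex flags.
final class RegexFlagSketch {
    static int toJavaFlags(String flags) {
        int mask = 0;
        for (char c : flags.toCharArray()) {
            switch (c) {
                case 'i': mask |= Pattern.CASE_INSENSITIVE; break;
                case 'm': mask |= Pattern.MULTILINE; break;
                case 's': mask |= Pattern.DOTALL; break;
                case 'x': mask |= Pattern.COMMENTS; break;  // whitespace in the pattern is ignored
                default:
                    throw new IllegalArgumentException("Unknown regex flag: " + c);
            }
        }
        return mask;
    }

    static boolean matches(String text, String pattern, String flags) {
        return Pattern.compile(pattern, toJavaFlags(flags)).matcher(text).find();
    }
}
```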
RIOT has some code extracted from jena-core for numeric range checking
which xsd4ld supports. Replace.
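
To show the kind of check meant by numeric range checking, a minimal sketch
for the bounded integer types. It is neither the RIOT code nor the xsd4ld
API; the names are invented for the example. The lexical form is checked
with a regular expression first, then compared as a BigInteger so that very
long digit strings cannot overflow.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

// Hypothetical helper: range checks for XSD integer-derived types.
final class XsdIntegerRangeSketch {
    // Optional sign followed by ASCII digits only.
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");

    static boolean inRange(String lex, long min, long max) {
        if (!INTEGER.matcher(lex).matches())
            return false;
        BigInteger v = new BigInteger(lex);
        return v.compareTo(BigInteger.valueOf(min)) >= 0
            && v.compareTo(BigInteger.valueOf(max)) <= 0;
    }

    static boolean isXsdByte(String lex) {
        return inRange(lex, Byte.MIN_VALUE, Byte.MAX_VALUE);
    }

    static boolean isXsdInt(String lex) {
        return inRange(lex, Integer.MIN_VALUE, Integer.MAX_VALUE);
    }

    static boolean isXsdUnsignedByte(String lex) {
        return inRange(lex, 0, 255);
    }
}
```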
Andy