Great!

I can see this can be useful outside of Jena. Is your plan to release
this separately to Maven Central and keep it in GitHub, or are you
proposing to embed it within Jena?

xsd4ld should presumably be low maintenance once any initial bugs are
weeded out, and I see you have extensive tests already.

I'm happy to help with the remaining Xerces surgery you propose. At
first it looks like a big list of changes, but except for XSDDatatype
it should just be implementation details which don't have to wait for
Jena 3. So I think this can be done on either side of 3 (but it would
definitely be cool to sort it out before the 3.0.0 release).

On 6 April 2015 at 18:34, Andy Seaborne <[email protected]> wrote:
> Linked Data applications need support for literal datatypes and XSD is the
> most important set of datatypes.  I put together "xsd4ld" [1] which covers
> the atomic datatypes, except the XML-specific ones.
>
> It uses standard Java XML support for parsing where possible and its own
> parsing, or additional checking after Java parsing, otherwise.  Some Java
> parsing is lax, covering several datatypes.
>
> xsd4ld provides validation against a datatype and calculation of the value
> from the lexical form.  It has no dependencies, including no dependency on
> Jena.
>
> It aims to collect together all XSD knowledge that a linked data application
> might want.
>
> Fortunately, the XSD-part2 spec has regular content structure and is well
> marked up, so perl was used to build a lot of the initial codebase.
>
> [1] https://github.com/afs/xsd4ld
>
> ---------------
>
> I also looked at Jena's use of Xerces:
>
> Most common is use of org.apache.xerces.util.XMLChar which has the character
> classes needed for XML.  Solution - take a copy. Some of it is duplicated
> elsewhere in Java.
>
> BaseXMLWriter.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
> ParserSupport.java - com/hp/hpl/jena/rdfxml/xmlinput/impl
> PrefixMappingImpl.java - com/hp/hpl/jena/shared/impl
> Unparser.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
> Util.java - com/hp/hpl/jena/rdf/model/impl
> schemagen
>
> There is a nearly identical, but older, copy under com.sun.xml.internal
> 2006-03-16 vs 2008-07-07
> It's XML 1.0.
>
> There are some uses of HexBinary and Base64Binary encoding. These are
> available in Java 8 or Guava.
>
> MakeSkolem.java - com/hp/hpl/jena/reasoner/rulesys/builtins
> XSDbase64Binary.java - com/hp/hpl/jena/datatypes/xsd
> XSDhexBinary.java - com/hp/hpl/jena/datatypes/xsd
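
To make the Base64/hex point concrete, here is a minimal sketch of what a
Xerces-free replacement could look like. It assumes java.util.Base64 (Java 8)
for the Base64 side; the hex side is hand-rolled here only to keep the sketch
dependency-free. The class and method names are hypothetical and are not
taken from Jena or xsd4ld.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of Jena or xsd4ld.
final class BinaryLexicalForms {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Base64 encoding using the JDK codec added in Java 8.
    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String lexical) {
        return Base64.getDecoder().decode(lexical);
    }

    // Two upper-case hex digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    static byte[] fromHex(String lexical) {
        if (lexical.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + lexical);
        }
        byte[] out = new byte[lexical.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(lexical.charAt(2 * i), 16);
            int lo = Character.digit(lexical.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit in: " + lexical);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "Jena".getBytes(StandardCharsets.UTF_8);
        System.out.println(toBase64(data)); // SmVuYQ==
        System.out.println(toHex(data));    // 4A656E61
    }
}
```

This only covers encoding and decoding; whether the lexical forms accepted
here match the XSD definitions exactly would still need checking.
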
>
> Major rewrite:
> XSDDatatype.java - com/hp/hpl/jena/datatypes/xsd
>
> xsd4ld is not a replacement for Jena's datatypes which are more general and
> extensible than just XSD.  But if we provided fixed XSD then rewriting
> XSDDatatype using xsd4ld should mean the datatype support does not depend on
> Xerces internals.
>
> Some work is also needed for:
> RDFXMLParser.java - com/hp/hpl/jena/rdfxml/xmlinput/impl
>
> In addition:
>
> ARQ uses Xerces regular expressions as an alternative to Java (Java used to
> be missing the "x" flag).  This can be dropped.
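
For reference, a small sketch of how the regex flags could be mapped onto
java.util.regex alone. It assumes the "x" flag corresponds to
Pattern.COMMENTS, which ignores whitespace in the pattern (and additionally
allows # comments, so it may not be an exact match for the XPath semantics).
The class and method names are made up for illustration and are not ARQ code.

```java
import java.util.regex.Pattern;

// Hypothetical mapping for illustration only; not ARQ's actual code.
final class RegexFlags {
    // Translate a flag string such as "ix" into a java.util.regex flag mask.
    static int toJavaFlags(String flags) {
        int mask = 0;
        for (char c : flags.toCharArray()) {
            switch (c) {
                case 'i': mask |= Pattern.CASE_INSENSITIVE; break;
                case 's': mask |= Pattern.DOTALL; break;
                case 'm': mask |= Pattern.MULTILINE; break;
                case 'x': mask |= Pattern.COMMENTS; break;
                default:
                    throw new IllegalArgumentException("Unsupported regex flag: " + c);
            }
        }
        return mask;
    }

    static boolean matches(String text, String pattern, String flags) {
        return Pattern.compile(pattern, toJavaFlags(flags)).matcher(text).find();
    }

    public static void main(String[] args) {
        // With "x", the spaces in the pattern are ignored.
        System.out.println(matches("abc", "a b c", "x"));
        // Without it, the spaces must match literally.
        System.out.println(matches("abc", "a b c", ""));
    }
}
```
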
>
> RIOT has some code extracted from jena-core for numeric range checking which
> xsd4ld supports.  Replace.
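
To illustrate the kind of numeric range checking meant here, a minimal
sketch using BigInteger. The bounds for xsd:byte and xsd:unsignedShort come
from the XSD spec; the class and method names are hypothetical and this is
not the xsd4ld API or the existing RIOT code.

```java
import java.math.BigInteger;

// Hypothetical range checks for illustration only; not the xsd4ld API.
final class IntegerRange {
    // True if the lexical form parses as an integer within [min, max].
    static boolean inRange(String lexical, long min, long max) {
        BigInteger value;
        try {
            value = new BigInteger(lexical);
        } catch (NumberFormatException e) {
            return false;
        }
        return value.compareTo(BigInteger.valueOf(min)) >= 0
            && value.compareTo(BigInteger.valueOf(max)) <= 0;
    }

    // xsd:byte is -128..127.
    static boolean isValidByte(String lexical) {
        return inRange(lexical, -128, 127);
    }

    // xsd:unsignedShort is 0..65535.
    static boolean isValidUnsignedShort(String lexical) {
        return inRange(lexical, 0, 65535);
    }

    public static void main(String[] args) {
        System.out.println(isValidByte("127"));
        System.out.println(isValidByte("128"));
        System.out.println(isValidUnsignedShort("-1"));
    }
}
```

This does the range test only. BigInteger's parsing is not guaranteed to
accept exactly the XSD lexical space, so the lexical check is a separate
step.
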
>
>         Andy

-- 
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718
