On 08/04/15 10:09, Stian Soiland-Reyes wrote:
Great!

I can see this can be useful outside of Jena. Is your plan to release
this separately to Maven Central and keep it in GitHub, or are you
proposing to embed it within Jena?

My default plan is to offer it to the project - it needs a long term home and a personal github account does not provide that.

xsd4ld should presumably be low maintenance once any initial bugs are
weeded out, and I see you have extensive tests already.

I'm happy to help with the remaining Xerces surgery you propose. It
looks at first like a big list of changes, but except for XSDDatatype it
should just be implementation details which don't have to wait for
Jena 3 - so I think this can be done on either side of 3 (but
definitely something that would be cool to sort before 3.0.0 release)

Better to target Jena3 because of Java8: if nothing else, it allows use of hexBinary/base64Binary from the JDK rather than Guava. There are other Java8-isms. But the real reason is subtle changes to contracts in Jena, ones we don't know people are depending on (and probably shouldn't be, but are).

Changes to RDF/XML parsing are not trivial - I tried and some tests broke. ARP uses a custom, detailed setup of Xerces and this needs to be translated to the standard Java API, or changes to behaviour made.

Changes to XSDDatatype will be visible if app code has been delving inside the XSD part of the datatypes. I can't think of a reason why it would need to but the code has a long history and stuff leaks.

xsd4ld now has a copy of XMLChar. It may be needed for adding the XML-related atomics - if there is any point to that.

        Andy

On 6 April 2015 at 18:34, Andy Seaborne <[email protected]> wrote:
Linked Data applications need support for literal datatypes and XSD is the
most important set of datatypes.  I put together "xsd4ld" [1] which covers
the atomic datatypes, except the XML-specific ones.

It uses standard Java XML support for parsing where possible and its own
parsing, or additional checking after Java parsing, otherwise.  Some Java
parsing is lax, covering several datatypes.
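
As an illustration of "additional checking after Java parsing", here is a minimal sketch for xsd:double. It is not taken from xsd4ld; the class and method names are made up for this example. Double.parseDouble accepts forms such as "1.0d", "0x1p3" and "Infinity" that are not in the xsd:double lexical space, and it does not accept "INF", so a lexical check runs first. The pattern roughly follows XSD 1.1 Part 2 (accepting "+INF" is an assumption; XSD 1.0 differs).

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of xsd4ld: strict xsd:double on top of Java parsing. */
final class XsdDoubleCheck {
    // Approximate lexical space of xsd:double (assumption: XSD 1.1, so "+INF" is allowed).
    private static final Pattern LEXICAL = Pattern.compile(
        "[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([eE][+-]?[0-9]+)?|[+-]?INF|NaN");

    /** True if the string is in the (approximate) xsd:double lexical space. */
    static boolean isValid(String lex) {
        return LEXICAL.matcher(lex).matches();
    }

    /** Validate first, then let Java compute the value. */
    static double parse(String lex) {
        if (!isValid(lex))
            throw new NumberFormatException("Not a valid xsd:double: " + lex);
        switch (lex) {
            case "INF":
            case "+INF":
                return Double.POSITIVE_INFINITY;
            case "-INF":
                return Double.NEGATIVE_INFINITY;
            default:
                // Java parses the remaining forms, including "NaN".
                return Double.parseDouble(lex);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.5e3", "-.5", "INF", "1.0d", "Infinity"}) {
            System.out.println(s + " valid=" + isValid(s));
        }
    }
}
```

The same validate-then-delegate shape should carry over to other datatypes where the Java parser is more permissive than the XSD lexical space.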

xsd4ld provides validation against a datatype and calculation of the value
from the lexical form.  It has no dependencies, including no dependency on
Jena.

It aims to collect together all XSD knowledge that a linked data application
might want.

Fortunately, the XSD-part2 spec has regular content structure and is well
marked up, so perl was used to build a lot of the initial codebase.

[1] https://github.com/afs/xsd4ld

---------------

I also looked at Jena's use of Xerces:

Most common is use of org.apache.xerces.util.XMLChar which has the character
classes needed for XML.  Solution - take a copy. Some of it is duplicated
elsewhere in Java.

BaseXMLWriter.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
ParserSupport.java - com/hp/hpl/jena/rdfxml/xmlinput/impl
PrefixMappingImpl.java - com/hp/hpl/jena/shared/impl
Unparser.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
Util.java - com/hp/hpl/jena/rdf/model/impl
schemagen

There is a nearly identical, but older, copy under com.sun.xml.internal
2006-03-16 vs 2008-07-07
It's XML 1.0.
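
To give an idea of what the character-class checks involve, here is a minimal sketch of NCName testing. It is not the Xerces code and not the copy in xsd4ld; the class and method names are chosen for illustration. The ranges are the NameStartChar/NameChar productions of XML 1.0 Fifth Edition with ':' excluded, which is an assumption: a table-driven copy based on an earlier edition of XML 1.0 may classify some characters differently.

```java
/** Hypothetical stand-in for the XML name character classes (XML 1.0 Fifth Edition ranges). */
final class XmlNameChars {
    private static boolean in(int cp, int lo, int hi) {
        return cp >= lo && cp <= hi;
    }

    /** NameStartChar without ':' (a namespace-aware name start). */
    static boolean isNCNameStart(int cp) {
        return in(cp, 'A', 'Z') || cp == '_' || in(cp, 'a', 'z')
            || in(cp, 0xC0, 0xD6) || in(cp, 0xD8, 0xF6) || in(cp, 0xF8, 0x2FF)
            || in(cp, 0x370, 0x37D) || in(cp, 0x37F, 0x1FFF) || in(cp, 0x200C, 0x200D)
            || in(cp, 0x2070, 0x218F) || in(cp, 0x2C00, 0x2FEF) || in(cp, 0x3001, 0xD7FF)
            || in(cp, 0xF900, 0xFDCF) || in(cp, 0xFDF0, 0xFFFD) || in(cp, 0x10000, 0xEFFFF);
    }

    /** NameChar without ':'. */
    static boolean isNCName(int cp) {
        return isNCNameStart(cp) || cp == '-' || cp == '.' || in(cp, '0', '9')
            || cp == 0xB7 || in(cp, 0x300, 0x36F) || in(cp, 0x203F, 0x2040);
    }

    /** True if the whole string is a non-empty NCName; iterates by code point. */
    static boolean isValidNCName(String s) {
        if (s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!isNCNameStart(first)) return false;
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNCName(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foo-bar", "_x.1", "1abc", "a:b"}) {
            System.out.println(s + " NCName=" + isValidNCName(s));
        }
    }
}
```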

There are some uses of HexBinary and Base64Binary encoding. These are
available in Java8 or Guava.

MakeSkolem.java - com/hp/hpl/jena/reasoner/rulesys/builtins
XSDbase64Binary.java - com/hp/hpl/jena/datatypes/xsd
XSDhexBinary.java - com/hp/hpl/jena/datatypes/xsd
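
A minimal sketch of doing the two encodings without Xerces, assuming Java 8. Base64 uses java.util.Base64, which is new in Java 8. The hex part is written by hand here to keep the example self-contained; Java 8 also ships javax.xml.bind.DatatypeConverter with printHexBinary/parseHexBinary, though that class left the JDK in later releases. The class and method names below are made up for illustration, and whitespace in base64Binary lexical forms is not handled.

```java
import java.util.Base64;

/** Hypothetical helper: hexBinary/base64Binary encoding with only the JDK. */
final class BinaryCodecs {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encodeBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] decodeBase64(String lex) {
        // Throws IllegalArgumentException on input that is not valid base64.
        return Base64.getDecoder().decode(lex);
    }

    /** Upper-case hex, two digits per byte. */
    static String encodeHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    /** Accepts upper- or lower-case ASCII hex digits only. */
    static byte[] decodeHex(String lex) {
        if (lex.length() % 2 != 0)
            throw new IllegalArgumentException("Odd number of hex digits: " + lex);
        byte[] out = new byte[lex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = hexValue(lex.charAt(2 * i));
            int lo = hexValue(lex.charAt(2 * i + 1));
            if (hi < 0 || lo < 0)
                throw new IllegalArgumentException("Not a hex digit in: " + lex);
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // ASCII-only on purpose: Character.digit would also accept non-ASCII digits.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = {0x0F, (byte) 0xA0};
        System.out.println(encodeHex(data));
        System.out.println(encodeBase64(data));
    }
}
```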

Major rewrite:
XSDDatatype.java - com/hp/hpl/jena/datatypes/xsd

xsd4ld is not a replacement for Jena's datatypes which are more general and
extensible than just XSD.  But if we provided fixed XSD then rewriting
XSDDatatype using xsd4ld should mean the datatype support does not depend on
Xerces internals.

Some work is also needed for:
RDFXMLParser.java - com/hp/hpl/jena/rdfxml/xmlinput/impl

In addition:

ARQ uses Xerces regular expressions as an alternative to Java (Java used to
be missing the "x" flag).  This can be dropped.
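
A minimal sketch of handling the regex flags with java.util.regex only. This is not ARQ's code; the class and method names are invented for the example. It maps the "x" flag to Pattern.COMMENTS, which is only an approximation: COMMENTS also treats '#' as starting a comment, so patterns containing '#' may behave differently from the XPath/SPARQL definition.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: map SPARQL/XPath-style regex flags onto java.util.regex. */
final class RegexFlags {
    static int toJavaFlags(String flags) {
        int mask = 0;
        for (char c : flags.toCharArray()) {
            switch (c) {
                case 'i': mask |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE; break;
                case 'm': mask |= Pattern.MULTILINE; break;
                case 's': mask |= Pattern.DOTALL; break;
                // Approximation of "x": whitespace in the pattern is ignored.
                case 'x': mask |= Pattern.COMMENTS; break;
                default:
                    throw new IllegalArgumentException("Unsupported regex flag: " + c);
            }
        }
        return mask;
    }

    /** True if the pattern matches anywhere in the text (find, not whole-string match). */
    static boolean matches(String text, String pattern, String flags) {
        return Pattern.compile(pattern, toJavaFlags(flags)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("abc", "a b c", "x"));
        System.out.println(matches("abc", "a b c", ""));
    }
}
```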

RIOT has some code extracted from jena-core for numeric range checking which
xsd4ld supports.  Replace.
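
For reference, the kind of numeric range checking meant here can be sketched as below. This is neither the RIOT code nor the xsd4ld API; the names are hypothetical. The lexical form is checked first, then the value is compared against the bounds of the derived integer type using BigInteger, so that arbitrarily long digit strings do not overflow.

```java
import java.math.BigInteger;

/** Hypothetical helper: range checks for XSD integer-derived types. */
final class IntegerRange {
    /** True if lex is an integer lexical form whose value lies in [min, max]. */
    static boolean inRange(String lex, long min, long max) {
        // ASCII digits only; BigInteger on its own would also accept other Unicode digits.
        if (!lex.matches("[+-]?[0-9]+")) return false;
        BigInteger v = new BigInteger(lex);
        return v.compareTo(BigInteger.valueOf(min)) >= 0
            && v.compareTo(BigInteger.valueOf(max)) <= 0;
    }

    static boolean isValidByte(String lex) {
        return inRange(lex, -128, 127);
    }

    static boolean isValidUnsignedShort(String lex) {
        return inRange(lex, 0, 65535);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"127", "128", "-128", "+5", "1.0"}) {
            System.out.println(s + " xsd:byte=" + isValidByte(s));
        }
    }
}
```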

         Andy