On 08/04/15 10:09, Stian Soiland-Reyes wrote:
Great!
I can see this can be useful outside of Jena. Is your plan to release
this separately to Maven Central and keep it in GitHub, or are you
proposing to embed it within Jena?
My default plan is to offer it to the project - it needs a long term
home and a personal github account does not provide that.
xsd4ld should presumably be low maintenance once any initial bugs are
weeded out, and I see you have extensive tests already.
I'm happy to help with the remaining Xerces surgery you propose. It
looks at first like a big list of changes, but except for XSDDatatype it
should just be implementation details which don't have to wait for
Jena 3 - so I think this can be done on either side of 3 (but
definitely something that would be cool to sort before 3.0.0 release)
Better to target Jena 3 because of Java 8: if nothing else, for the use
of hexBinary/base64Binary from the JDK rather than Guava. There are other
Java8-isms. But the real reason is subtle changes to contracts in Jena,
ones we don't know people are depending on, and probably shouldn't be,
but are.
Changes to RDF/XML parsing are not trivial - I tried and some tests
broke. ARP uses a custom, detailed setup of Xerces and this needs to be
translated to the standard Java API, or changes to behaviour made.
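As a rough illustration of what "the standard Java API" means here, the sketch below drives a namespace-aware SAX parse through JAXP only, with no Xerces-specific classes. It is a minimal sketch under assumptions: the class and method names are hypothetical, and it does not reproduce ARP's actual Xerces configuration, which is not described in this thread.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Jena or ARP.
class JaxpSaxSketch {
    /** Parses the XML text and counts the elements in the given namespace. */
    static int countElements(String xml, String namespace) {
        try {
            // Standard JAXP factory; the JDK supplies the implementation.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();

            int[] count = {0};
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (namespace.equals(uri)) {
                        count[0]++;
                    }
                }
            });
            return count[0];
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("XML parse failed", e);
        }
    }
}
```

Anything ARP currently sets through Xerces-only properties would need an equivalent here, or a decision to accept the changed behaviour.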
Changes to XSDDatatype will be visible if app code has been delving
inside the XSD part of the datatypes. I can't think of a reason why it
would need to but the code has a long history and stuff leaks.
xsd4ld now has a copy of XMLChar. It may be needed for adding the
XML-related atomics - if there is any point to that.
Andy
On 6 April 2015 at 18:34, Andy Seaborne wrote:
Linked Data applications need support for literal datatypes and XSD is the
most important set of datatypes. I put together "xsd4ld" [1] which covers
the atomic datatypes, except the XML-specific ones.
It uses standard Java XML support for parsing where possible and its own
parsing, or additional checking after Java parsing, otherwise. Some Java
parsing is lax, covering several datatypes.
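To illustrate the "check, then let Java parse" approach, here is a minimal sketch for xsd:double. It is not taken from xsd4ld; the class and method names are hypothetical, and the pattern assumes the XSD 1.1 lexical space (which also allows "+INF"). Java's Double.parseDouble on its own accepts forms such as "1.0d" and "Infinity" and rejects "INF", so the lexical form is checked first.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the xsd4ld API.
class XsdDoubleSketch {
    // Decimal mantissa with optional exponent, as in the xsd:double lexical space.
    private static final Pattern NUMERIC = Pattern.compile(
        "[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?");

    /** True if the string is in the xsd:double lexical space (XSD 1.1 assumed). */
    static boolean isValid(String lex) {
        return NUMERIC.matcher(lex).matches()
            || lex.equals("INF") || lex.equals("+INF")
            || lex.equals("-INF") || lex.equals("NaN");
    }

    /** Calculates the value from the lexical form, rejecting invalid input. */
    static double parse(String lex) {
        if (!isValid(lex)) {
            throw new IllegalArgumentException("Not an xsd:double: " + lex);
        }
        switch (lex) {
            case "INF":
            case "+INF":
                return Double.POSITIVE_INFINITY;
            case "-INF":
                return Double.NEGATIVE_INFINITY;
            case "NaN":
                return Double.NaN;
            default:
                // Safe: the pattern has already excluded Java-only forms.
                return Double.parseDouble(lex);
        }
    }
}
```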
xsd4ld provides validation against a datatype and calculation of the value
from the lexical form. It has no dependencies, including no dependency on
Jena.
It aims to collect together all XSD knowledge that a linked data application
might want.
Fortunately, the XSD-part2 spec has regular content structure and is well
marked up, so perl was used to build a lot of the initial codebase.
[1] https://github.com/afs/xsd4ld
---------------
I also looked at Jena's use of Xerces:
Most common is use of org.apache.xerces.util.XMLChar which has the character
classes needed for XML. Solution - take a copy. Some of it is duplicated
elsewhere in Java.
BaseXMLWriter.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
ParserSupport.java - com/hp/hpl/jena/rdfxml/xmlinput/impl
PrefixMappingImpl.java - com/hp/hpl/jena/shared/impl
Unparser.java - com/hp/hpl/jena/rdfxml/xmloutput/impl
Util.java - com/hp/hpl/jena/rdf/model/impl
schemagen
There is a nearly identical, but older, copy under com.sun.xml.internal
2006-03-16 vs 2008-07-07
It's XML 1.0.
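For a sense of what such character classes look like, here is a minimal sketch of an NCName check (the kind of test a prefix mapping needs). It is an assumption-laden illustration, not a copy of XMLChar: it uses the NameStartChar/NameChar ranges from XML 1.0 Fifth Edition, minus the colon, whereas the Xerces class is table-driven and may follow an earlier edition. The class name is hypothetical.

```java
// Hypothetical sketch; ranges from XML 1.0 Fifth Edition, colon excluded for NCName.
class NCNameSketch {
    private static boolean in(int c, int lo, int hi) {
        return c >= lo && c <= hi;
    }

    static boolean isNCNameStart(int c) {
        return c == '_' || in(c, 'A', 'Z') || in(c, 'a', 'z')
            || in(c, 0xC0, 0xD6) || in(c, 0xD8, 0xF6) || in(c, 0xF8, 0x2FF)
            || in(c, 0x370, 0x37D) || in(c, 0x37F, 0x1FFF)
            || in(c, 0x200C, 0x200D) || in(c, 0x2070, 0x218F)
            || in(c, 0x2C00, 0x2FEF) || in(c, 0x3001, 0xD7FF)
            || in(c, 0xF900, 0xFDCF) || in(c, 0xFDF0, 0xFFFD)
            || in(c, 0x10000, 0xEFFFF);
    }

    static boolean isNCNameChar(int c) {
        return isNCNameStart(c) || c == '-' || c == '.' || in(c, '0', '9')
            || c == 0xB7 || in(c, 0x0300, 0x036F) || in(c, 0x203F, 0x2040);
    }

    /** True if the string is a non-empty name with no colon. */
    static boolean isNCName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNCNameStart(first)) {
            return false;
        }
        // Walk by code point so supplementary characters are handled.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNCNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }
}
```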
There are some uses of HexBinary and Base64Binary encoding. These are
available in Java8 or Guava.
MakeSkolem.java - com/hp/hpl/jena/reasoner/rulesys/builtins
XSDbase64Binary.java - com/hp/hpl/jena/datatypes/xsd
XSDhexBinary.java - com/hp/hpl/jena/datatypes/xsd
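A minimal sketch of doing both encodings without Xerces or Guava: base64 through java.util.Base64 (added in Java 8), and hex by hand so the example depends on nothing beyond the core JDK. The class and method names are hypothetical, and upper-case hex output is a choice made for this sketch.

```java
import java.util.Base64;

// Hypothetical helper, not existing Jena code.
class BinaryLexicalSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    static byte[] fromBase64(String lex) {
        // Throws IllegalArgumentException on malformed input.
        return Base64.getDecoder().decode(lex);
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        return sb.toString();
    }

    static byte[] fromHex(String lex) {
        if (lex.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + lex);
        }
        byte[] out = new byte[lex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = hexValue(lex.charAt(2 * i));
            int lo = hexValue(lex.charAt(2 * i + 1));
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    // ASCII hex digits only; Character.digit would also accept non-ASCII digits.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        throw new IllegalArgumentException("Not a hex digit: " + c);
    }
}
```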
Major rewrite:
XSDDatatype.java - com/hp/hpl/jena/datatypes/xsd
xsd4ld is not a replacement for Jena's datatypes which are more general and
extensible than just XSD. But if we provided fixed XSD then rewriting
XSDDatatype using xsd4ld should mean the datatype support does not depend on
Xerces internals.
Some work is also needed for:
RDFXMLParser.java - com/hp/hpl/jena/rdfxml/xmlinput/impl
In addition:
ARQ uses Xerces regular expressions as an alternative to Java (Java used to
be missing the "x" flag). This can be dropped.
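A minimal sketch of handling the regex flags with java.util.regex alone. The mapping below is an assumption made for illustration, not ARQ's code: in particular, Pattern.COMMENTS only approximates the XPath "x" flag, since it also treats "#" as starting a comment. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the ARQ implementation.
class RegexFlagsSketch {
    /** Maps a flags string such as "ix" to a java.util.regex flag mask. */
    static int toJavaFlags(String flags) {
        int mask = 0;
        for (char f : flags.toCharArray()) {
            switch (f) {
                case 'i':
                    mask |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
                    break;
                case 'm':
                    mask |= Pattern.MULTILINE;
                    break;
                case 's':
                    mask |= Pattern.DOTALL;
                    break;
                case 'x':
                    // Approximation: whitespace in the pattern is ignored.
                    mask |= Pattern.COMMENTS;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown regex flag: " + f);
            }
        }
        return mask;
    }

    /** Unanchored match: true if the pattern is found anywhere in the text. */
    static boolean matches(String text, String pattern, String flags) {
        return Pattern.compile(pattern, toJavaFlags(flags)).matcher(text).find();
    }
}
```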
RIOT has some code extracted from jena-core for numeric range checking which
xsd4ld supports. Replace.
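For context, numeric range checking of this kind amounts to something like the sketch below: check the integer lexical form, then compare the value against the bounds of the derived type. This is a hypothetical illustration, not the RIOT code or the xsd4ld API; the bounds shown are those of xsd:byte and xsd:unsignedByte.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

// Hypothetical sketch of range checking for XSD integer-derived types.
class IntegerRangeSketch {
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");

    /** True if lex is a valid integer lexical form with min <= value <= max. */
    static boolean inRange(String lex, long min, long max) {
        if (!INTEGER.matcher(lex).matches()) {
            return false;
        }
        // BigInteger avoids overflow for arbitrarily long digit strings.
        String digits = lex.startsWith("+") ? lex.substring(1) : lex;
        BigInteger value = new BigInteger(digits);
        return value.compareTo(BigInteger.valueOf(min)) >= 0
            && value.compareTo(BigInteger.valueOf(max)) <= 0;
    }

    static boolean isByte(String lex) {
        return inRange(lex, -128, 127);
    }

    static boolean isUnsignedByte(String lex) {
        return inRange(lex, 0, 255);
    }
}
```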
Andy