Proposed changes for Jena5

Andy Seaborne Thu, 31 Aug 2023 11:25:34 -0700

Here is the status of </chair> my work on Jena5.

These are changes done on a branch in my development repo. I'm going to raise issues for each of these changes and give them all the right GH-nnnn commit message, then propose a Jena repo branch.


There's a note about the RDF/XML parser below.

==== Completed

== Set version to 5.0.0-SNAPSHOT

== Build set to Java17
  Upgrade graalvm dependency (test). GraalVM now requires Java17.

== Rename javax.servlet -> jakarta.servlet
  Update to jetty11

== Node clear-up
- general review and simplification
- Remove BlankNodeId as indexing label from Node_Blank
- LiteralLabel
   Convert LiteralLabel to a class
   Remove use from APIs
     (mostly) - RDFDatatype still references it but
     I'm not clear why it doesn't use Node_Literal.
   Rework LiteralLabel as term-centric as well as value-centric [1]

== Remove old and partial RDF 1.0 code
   (it was used inconsistently)

== Move ModelMaker into ontology area (it is only used in ont)

== Model API and Model impl
- Remove deprecated
- Remove isXML/isWellFormed from APIs (seems to be meaningless)
- Simplify containers iterators (implementation)
- Remove TripleBoundary, StatementBoundary, GraphExtract, ModelExtract
    Not used by jena-core.
- Remove Selector (already deprecated and unused)
- Remove deprecated: ResourceF
- RDFReaderF and RDFWriterF
  Remove the unnamed language operations which are RDF/XML.
  Deprecate the named language forms in Model.
- Remove reification (interface methods were, mostly, deprecated)

== Add Jena BOM module

== Update to SLF4j 2.x

== Remove unused assemblers.

== Remove JSON-LD 1.0 support

==== TO DO:

Update for Jetty12

Switch to term graphs.

==== Desirable

Replace normal usage of the RDF/XML reader with something more maintainable. [2]


===== Reorgs

Call TDB1 "tdb1"
- Rename artifact jena-tdb as jena-tdb1.
- Move the package tree to org.apache.jena.tdb1
   Leave legacy API at "org.apache.jena.tdb"
"org.apache.jena.tdb.TDBFactory" -> "org.apache.jena.tdb1.TDB1Factory"

    Andy

[1] LiteralLabel

The idea of the LiteralLabel changes is to keep work off the critical part of creating and streaming literals and only create the value if required. The "value" here is the Model API Java type support and the current GraphMem indexing value.

Ideally, I'd like to pull LiteralLabel into Node_Literal and not have a separate class but that may be a step too far.
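
As an illustration of the "only create the value if required" idea, here is a minimal sketch. It is not the Jena LiteralLabel code: the class name LazyLiteral, its fields, and the parseCount() helper are all hypothetical, and the value parser is passed in as a plain function rather than coming from an RDFDatatype.

```java
import java.math.BigInteger;
import java.util.function.Function;

/** Hypothetical illustration only; this is NOT Jena's LiteralLabel. */
public final class LazyLiteral {
    private final String lexicalForm;
    private final String datatypeURI;
    // Assumption: the datatype supplies some function from lexical form to Java value.
    private final Function<String, Object> valueParser;

    private Object value;            // Java value, computed on first request
    private boolean valueComputed;   // true once the parser has run
    private int parseCount;          // demonstration only: how often the parser ran

    public LazyLiteral(String lexicalForm, String datatypeURI,
                       Function<String, Object> valueParser) {
        // Construction only stores the term; no parsing happens here.
        this.lexicalForm = lexicalForm;
        this.datatypeURI = datatypeURI;
        this.valueParser = valueParser;
    }

    /** Term-centric access: cheap, never triggers value parsing. */
    public String getLexicalForm() { return lexicalForm; }

    public String getDatatypeURI() { return datatypeURI; }

    /** Value-centric access: parses on the first call, then reuses the result. */
    public synchronized Object getValue() {
        if (!valueComputed) {
            value = valueParser.apply(lexicalForm);
            valueComputed = true;
            parseCount++;
        }
        return value;
    }

    public synchronized int parseCount() { return parseCount; }

    public static void main(String[] args) {
        LazyLiteral lit = new LazyLiteral(
            "42", "http://www.w3.org/2001/XMLSchema#integer", BigInteger::new);
        System.out.println(lit.getLexicalForm()); // no parsing needed
        System.out.println(lit.getValue());       // parsing happens here, once
    }
}
```

Code that only streams literals through (parse, then write) would call getLexicalForm() and getDatatypeURI() and never pay for value creation.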

[2] The jena-core RDF/XML reader (ARP), in the oaj.rdfxml.xmlinput and oaj.rdfxml.xmlinput0 packages, is complicated.

PR 1774 changed ARP to use the system IRIx interface, not call jena-iri directly. The original ARP is also still available. 1774 did some cleanup but was quite conservative in that.


https://github.com/apache/jena/pull/1774

ARP has lots of features and it is clear it was developed while RDF/XML was being originally spec'ed. There are features and warnings that aren't in the spec. It does not integrate with the RIOT parser builder very well.

I tried to do a clean-up but I've come to the conclusion it is better/safer to keep ARP as it is after 1774, and write a new RDF/XML parser (RRX - RIOT RDF/XML parser) with the design goal of being just an application/rdf+xml parser.

The existing ARP would remain in jena-core. Testing the new parser is done with "run ARP, run RRX" then test whether the outputs, including occurrence of warnings, are the same. The W3C test suite has mandated warnings. ARP goes further. The order of triple output is also the same (except reification, where the ARP output is backwards!)
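
The "run ARP, run RRX, compare" approach can be sketched roughly as below. This is not the actual test code: DifferentialCheck, ParseResult, and the two parser functions are hypothetical stand-ins, and triples and warnings are represented as plain strings for brevity. The sketch assumes both lists are compared in order.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical harness: compare a reference parser run with a candidate run. */
public class DifferentialCheck {

    /** What one parser run produced: triples in output order, plus warnings. */
    public static final class ParseResult {
        public final List<String> triples;
        public final List<String> warnings;
        public ParseResult(List<String> triples, List<String> warnings) {
            this.triples = triples;
            this.warnings = warnings;
        }
    }

    /** Run both parsers on the same input; null means they agree. */
    public static String compare(Function<String, ParseResult> reference,
                                 Function<String, ParseResult> candidate,
                                 String input) {
        ParseResult expected = reference.apply(input);
        ParseResult actual = candidate.apply(input);
        String diff = firstDifference("triple", expected.triples, actual.triples);
        if (diff != null)
            return diff;
        return firstDifference("warning", expected.warnings, actual.warnings);
    }

    /** Describe the first position where the lists differ, or null if equal. */
    private static String firstDifference(String kind, List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i)))
                return kind + " " + i + ": '" + a.get(i) + "' vs '" + b.get(i) + "'";
        }
        if (a.size() != b.size())
            return kind + " count: " + a.size() + " vs " + b.size();
        return null;
    }
}
```

Reporting the first difference, rather than a plain true/false, makes it easier to see whether a mismatch is in the triples, their order, or the warnings.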


RRX is actually 2 parsers :-).

One is SAX based, and handles XML entities. The other is StAX based; it was first written as a learning exercise. The StAX API does not support XML entities. SAX is a stream of parser events and requires the code to have a coded state machine; StAX uses function call descent to know where in the grammar it is, which is easier to understand.


They should produce identical output, down to triple order and messages.
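
To show the difference in style between the two APIs, here is a small sketch using only the JDK XML parsers. It is not RRX code: the class SaxVsStax and the task (listing element paths) are made up for illustration. The SAX version keeps its position in an explicit stack updated from callbacks; the StAX version uses one recursive function call per element.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustration only: list element paths such as "a/b" with SAX and with StAX. */
public class SaxVsStax {

    /** SAX: the parser pushes events; the handler must track where it is. */
    public static List<String> withSax(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();   // explicit parser state
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                stack.addLast(qName);
                out.add(String.join("/", stack));
            }
            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.removeLast();
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    /** StAX: the code pulls events; the Java call stack tracks the position. */
    public static List<String> withStax(String xml) throws Exception {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT)
                element(reader, "", out);
        }
        reader.close();
        return out;
    }

    // On entry the reader is at START_ELEMENT; on return, at the matching END_ELEMENT.
    private static void element(XMLStreamReader reader, String parentPath,
                                List<String> out) throws Exception {
        // The example input has no namespace prefixes, so local name equals SAX qName.
        String name = reader.getLocalName();
        String path = parentPath.isEmpty() ? name : parentPath + "/" + name;
        out.add(path);
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT)
                element(reader, path, out);          // descend into child
            else if (event == XMLStreamConstants.END_ELEMENT)
                return;                              // this element is finished
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b>x</b><c/></a>";
        System.out.println(withSax(xml));
        System.out.println(withStax(xml));
        System.out.println(withSax(xml).equals(withStax(xml)));
    }
}
```

Both methods return the paths in document order, so the two lists can be compared directly, in the same spirit as checking that the two RRX parsers agree.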

RRX-SAX would be the one that is normally used from RIOT. RRX-StAX is a "stay honest" check.


ARP is 66 java files. Each RRX parser is one file.

RRX should work with any XML parser because the two parsers don't make any assumptions about optional XML parsing features. Development has been with the JDK internal one.
