Here is the status of my work on Jena5.

These are changes done on a branch in my development repo. I'm going to raise issues for each of these changes and give each commit the right GH-nnnn commit message, then propose a Jena repo branch.

There's a note about the RDF/XML parser below.

==== Completed

== Set version to 5.0.0-SNAPSHOT

== Build set to Java17
  Upgrade the GraalVM dependency (test). GraalVM now requires Java17.

== Rename javax.servlet -> jakarta.servlet
  Update to jetty11

== Node clear-up
- general review and simplification
- Remove BlankNodeId as indexing label from Node_Blank
- LiteralLabel
   Convert LiteralLabel to a class
   Remove use from APIs
     (mostly) - RDFDatatype still references it but
     I'm not clear why it doesn't use Node_Literal.
   Rework LiteralLabel as term-centric as well as value-centric [1]

== Remove old and partial RDF 1.0 code
   (it was used inconsistently)

== Move ModelMaker into ontology area (it is only used in ont)

== Model API and Model impl
- Remove deprecated
- Remove isXML/isWellFormed from APIs (seems to be meaningless)
- Simplify containers iterators (implementation)
- Remove TripleBoundary, StatementBoundary, GraphExtract, ModelExtract
    Not used by jena-core.
- Remove Selector (already deprecated and unused)
- Remove deprecated: ResourceF
- RDFReaderF and RDFWriterF
  Remove the unnamed language operations which are RDF/XML.
  Deprecate the named language forms in Model.
- Remove reification (interface methods were, mostly, deprecated)

== Add Jena BOM module

== Update to SLF4j 2.x

== Remove unused assemblers.

== Remove JSON-LD 1.0 support

==== TO DO:

Update for Jetty12

Switch to term graphs.

==== Desirable

Replace normal usage of the RDF/XML reader with something more maintainable. [2]

===== Reorgs

Call TDB1 "tdb1"
- Rename artifact jena-tdb as jena-tdb1.
- Move the package tree to org.apache.jena.tdb1
   Leave legacy API at "org.apache.jena.tdb"
"org.apache.jena.tdb.TDBFactory" -> "org.apache.jena.tdb1.TDB1Factory"

    Andy

[1] LiteralLabel

The idea of the LiteralLabel changes is to keep work off the critical path of creating and streaming literals, and to create the value only if required. The "value" here is the Model API Java type support and the current GraphMem indexing value.

Ideally, I'd like to pull LiteralLabel into Node_Literal and not have a separate class but that may be a step too far.
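
To illustrate the "only create the value if required" idea, here is a minimal sketch. It is not the Jena code: the class LazyLiteral, its fields and the parseCount counter are hypothetical names made up for this note, and only xsd:integer is handled. Equality is by term (lexical form plus datatype), and the value is parsed on first request.

```java
import java.math.BigInteger;
import java.util.Objects;

// Hypothetical sketch, not Jena's LiteralLabel.
final class LazyLiteral {
    static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    private final String lexicalForm;
    private final String datatypeURI;
    private Object value;            // parsed value; valid only once valueComputed is true
    private boolean valueComputed = false;
    int parseCount = 0;              // exposed only so the sketch can show when parsing happens

    LazyLiteral(String lexicalForm, String datatypeURI) {
        // Creating the term does no parsing at all.
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.datatypeURI = Objects.requireNonNull(datatypeURI);
    }

    String getLexicalForm() { return lexicalForm; }
    String getDatatypeURI() { return datatypeURI; }

    // Value-centric view: parse on first use, then reuse the result.
    Object getValue() {
        if (!valueComputed) {
            value = parse();
            valueComputed = true;
        }
        return value;
    }

    private Object parse() {
        parseCount++;
        if (XSD_INTEGER.equals(datatypeURI)) {
            try {
                return new BigInteger(lexicalForm);
            } catch (NumberFormatException e) {
                return null;         // ill-formed literal: a term with no value
            }
        }
        return lexicalForm;          // assumption: other datatypes fall back to the string
    }

    // Term-centric equality: never touches the value.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof LazyLiteral)) return false;
        LazyLiteral that = (LazyLiteral) other;
        return lexicalForm.equals(that.lexicalForm) && datatypeURI.equals(that.datatypeURI);
    }

    @Override
    public int hashCode() {
        return Objects.hash(lexicalForm, datatypeURI);
    }

    public static void main(String[] args) {
        LazyLiteral a = new LazyLiteral("01", XSD_INTEGER);
        LazyLiteral b = new LazyLiteral("1", XSD_INTEGER);
        System.out.println(a.equals(b));                        // different terms
        System.out.println(a.getValue().equals(b.getValue()));  // same value
    }
}
```

The sketch is not thread-safe; a real implementation would need to decide how the lazily computed value is published between threads.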

[2] The jena-core RDF/XML reader (ARP) in the oaj.rdfxml.xmlinput and oaj.rdfxml.xmlinput0 packages is complicated.

PR 1774 changed ARP to use the system IRIx interface rather than calling jena-iri directly. The original ARP is also still available. 1774 did some cleanup but was quite conservative about it.

https://github.com/apache/jena/pull/1774

ARP has lots of features and it is clear it was developed while RDF/XML was being originally spec'ed. There are features and warnings that aren't in the spec. It does not integrate with the RIOT parser builder very well.

I tried to do a clean-up but I've come to the conclusion it is better/safer to keep ARP as it is after 1774, and write a new RDF/XML parser (RRX - RIOT RDF/XML parser) with the design goal of being just an application/rdf+xml parser.

The existing ARP would remain in jena-core. Testing the new parser is done with "run ARP, run RRX" then test whether the outputs, including occurrence of warnings, are the same. The W3C test suite has mandated warnings. ARP goes further. The order of triple output is also the same (except reification, where the ARP output is backwards!)
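
The "run both, compare outputs" style of test can be sketched as below. This is only an illustration of the approach, not the actual test code: ParserDiff, EventParser and the event strings are hypothetical, and the two parsers in main are stand-in lambdas rather than ARP or RRX. Triples and warnings go into one ordered log per parser so that order is compared as well as content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical harness: names are made up for this note.
final class ParserDiff {

    /** A parser that reports triples and warnings, in the order they occur, as strings. */
    interface EventParser {
        void parse(String input, Consumer<String> events);
    }

    /** Run one parser and collect its events in order. */
    static List<String> run(EventParser parser, String input) {
        List<String> log = new ArrayList<>();
        parser.parse(input, log::add);
        return log;
    }

    /** Index of the first differing event, or -1 if the two logs are identical. */
    static int firstDifference(List<String> expected, List<String> actual) {
        int n = Math.min(expected.size(), actual.size());
        for (int i = 0; i < n; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i;
            }
        }
        // Same common prefix: they differ only if one log is longer.
        return expected.size() == actual.size() ? -1 : n;
    }

    public static void main(String[] args) {
        // Stand-in parsers; a real test would wrap the reference and the new parser.
        EventParser reference = (in, out) -> { out.accept("triple " + in); out.accept("warning W1"); };
        EventParser candidate = (in, out) -> { out.accept("triple " + in); out.accept("warning W1"); };

        int diff = firstDifference(run(reference, "s p o"), run(candidate, "s p o"));
        System.out.println(diff == -1 ? "identical" : "first difference at event " + diff);
    }
}
```

A known, deliberate difference (such as the reification order mentioned above) would need to be normalised before the comparison.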

RRX is actually 2 parsers :-).

One is SAX based, and handles XML entities. The other is StAX based; it was first written as a learning exercise. The StAX API does not support XML entities. SAX is a stream of parser events and requires the code to have a coded state machine; StAX uses function call descent to know where in the grammar it is, which is easier to understand.

They should produce identical output, down to triple order and messages.
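
To show the difference in style, here is a small sketch using only the JDK XML APIs. It has nothing to do with RDF/XML itself: the toy grammar (a "list" element containing "item" elements) and the class names ItemsSax and ItemsStax are invented for this note. The SAX version tracks its position with an explicit state variable; the StAX version has one function per grammar rule.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// SAX: events are pushed at the handler, so position in the grammar is a coded state.
final class ItemsSax extends DefaultHandler {
    private enum State { OUTSIDE, IN_LIST, IN_ITEM }

    private State state = State.OUTSIDE;
    private final StringBuilder text = new StringBuilder();
    final List<String> items = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (state == State.OUTSIDE && qName.equals("list")) {
            state = State.IN_LIST;
        } else if (state == State.IN_LIST && qName.equals("item")) {
            state = State.IN_ITEM;
            text.setLength(0);
        } else {
            throw new IllegalStateException("Unexpected <" + qName + "> in state " + state);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so accumulate.
        if (state == State.IN_ITEM) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (state == State.IN_ITEM) {
            items.add(text.toString());
            state = State.IN_LIST;
        } else if (state == State.IN_LIST) {
            state = State.OUTSIDE;
        }
    }

    static List<String> parse(String xml) throws Exception {
        ItemsSax handler = new ItemsSax();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.items;
    }
}

// StAX: the code pulls events, so the call stack records the position in the grammar.
final class ItemsStax {
    static List<String> parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        r.nextTag();                       // advance to the root start tag
        return parseList(r);
    }

    // list ::= <list> item* </list>
    private static List<String> parseList(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "list");
        List<String> items = new ArrayList<>();
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            items.add(parseItem(r));
        }
        r.require(XMLStreamConstants.END_ELEMENT, null, "list");
        return items;
    }

    // item ::= <item> text </item>
    private static String parseItem(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "item");
        return r.getElementText();         // leaves the reader on </item>
    }
}
```

Calling ItemsSax.parse(xml) and ItemsStax.parse(xml) on the same document should give equal lists, which is the same "two implementations keep each other honest" arrangement on a very small scale.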

RRX-SAX would be the one that is normally used from RIOT. RRX-StAX is a "stay honest" cross-check.

ARP is 66 Java files. Each RRX parser is one file.

RRX should work with any XML parser because neither RRX parser makes any assumptions about optionally supported XML parsing features. Development has been with the JDK internal one.
