Here is the status of my work on Jena5.
These are changes done on a branch in my development repo. I'm going to
raise issues for each of these changes and give each commit the right
GH-nnnn message, then propose a Jena repo branch.
There's a note about the RDF/XML parser below.
==== Completed
== Set version to 5.0.0-SNAPSHOT
== Build set to Java17
Upgrade the GraalVM dependency (test); GraalVM now requires Java17.
== Rename javax.servlet -> jakarta.servlet
Update to jetty11
== Node clear-up
- general review and simplification
- Remove BlankNodeId as indexing label from Node_Blank
- LiteralLabel
Convert LiteralLabel to a class
Remove use from APIs
(mostly) - RDFDatatype still references it but
I'm not clear why it doesn't use Node_Literal.
Rework LiteralLabel as term-centric as well as value-centric [1]
== Remove old and partial RDF 1.0 code
(it was used inconsistently)
== Move ModelMaker into ontology area (it is only used in ont)
== Model API and Model impl
- Remove deprecated
- Remove isXML/isWellFormed from APIs (seems to be meaningless)
- Simplify containers iterators (implementation)
- Remove TripleBoundary, StatementBoundary, GraphExtract, ModelExtract
Not used by jena-core.
- Remove Selector (already deprecated and unused)
- Remove deprecated: ResourceF
- RDFReaderF and RDFWriterF
Remove the unnamed language operations which are RDF/XML.
Deprecate the named language forms in Model.
- Remove reification (interface methods were, mostly, deprecated)
== Add Jena BOM module
== Update to SLF4j 2.x
== Remove unused assemblers.
== Remove JSON-LD 1.0 support
==== TO DO:
Update for Jetty12
Switch to term graphs.
==== Desirable
Replace normal usage of the RDF/XML reader with something more
maintainable. [2]
===== Reorgs
Call TDB1 "tdb1"
- Rename artifact jena-tdb as jena-tdb1.
- Move the package tree to org.apache.jena.tdb1
Leave legacy API at "org.apache.jena.tdb"
"org.apache.jena.tdb.TDBFactory" -> "org.apache.jena.tdb1.TDB1Factory"
Andy
[1] LiteralLabel
The idea of the LiteralLabel changes is to keep work off the critical path
of creating and streaming literals and to create the value only if
required. The "value" here is the Model API Java type support and the
current GraphMem indexing value.
Ideally, I'd like to pull LiteralLabel into Node_Literal and not have a
separate class but that may be a step too far.
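To illustrate the "only create the value if required" idea, here is a minimal sketch. It is not Jena code: LazyLiteral, its valueParser argument and parseCount() are all made up for the illustration, and it is not thread-safe. Equality is on the term (lexical form and datatype) and never forces the value to be computed.

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical illustration only; this is not Jena's LiteralLabel or Node_Literal. */
final class LazyLiteral {
    private final String lexicalForm;                    // the term, as parsers and writers see it
    private final String datatypeURI;
    private final Function<String, Object> valueParser;  // assumed datatype-specific parser
    private Object value;                                // the value, filled in on first use
    private boolean valueComputed = false;
    private int parseCount = 0;                          // demonstration only: number of parses done

    LazyLiteral(String lexicalForm, String datatypeURI, Function<String, Object> valueParser) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.datatypeURI = Objects.requireNonNull(datatypeURI);
        this.valueParser = Objects.requireNonNull(valueParser);
    }

    String getLexicalForm() { return lexicalForm; }
    String getDatatypeURI() { return datatypeURI; }

    /** Parse the lexical form the first time the value is asked for, then reuse it. */
    Object getValue() {
        if (!valueComputed) {
            value = valueParser.apply(lexicalForm);
            valueComputed = true;
            parseCount++;
        }
        return value;
    }

    int parseCount() { return parseCount; }

    /** Term equality: same lexical form and datatype. The value is not touched. */
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof LazyLiteral)) return false;
        LazyLiteral that = (LazyLiteral) other;
        return lexicalForm.equals(that.lexicalForm) && datatypeURI.equals(that.datatypeURI);
    }

    @Override
    public int hashCode() { return Objects.hash(lexicalForm, datatypeURI); }
}
```

With this shape, "042" and "42" as xsd:integer are different terms with the same value, and streaming either one from a parser to a writer never parses the integer.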
[2] The jena-core RDF/XML reader (ARP), in the oaj.rdfxml.xmlinput and
oaj.rdfxml.xmlinput0 packages, is complicated.
PR 1774 changed ARP to use the system IRIx interface, not call jena-iri
directly. And the original ARP is also available. 1774 did some cleanup
but was quite conservative in that.
https://github.com/apache/jena/pull/1774
ARP has lots of features and it is clear it was developed while RDF/XML
was being originally spec'ed. There are features and warnings that
aren't in the spec. It does not integrate with the RIOT parser builder
very well.
I tried to do a clean-up but I've come to the conclusion it is
better/safer to keep ARP as it is after 1774, and write a new RDF/XML
parser (RRX - RIOT RDF/XML parser) with the design goal of being just an
application/rdf+xml parser.
The existing ARP would remain in jena-core. Testing the new parser is
done with "run ARP, run RRX" then test whether the outputs, including
occurrence of warnings, are the same. The W3C test suite has mandated
warnings. ARP goes further. The order of triple output is also the same
(except reification, where the ARP output is backwards!)
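The comparison step could look roughly like the sketch below. This is not the actual test code: ParseOutput and ParserComparison are hypothetical names, triples and warnings are plain strings to keep it self-contained, and it does not deal with the reification ordering exception mentioned above.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical holder for what one parser run produced; not a Jena class. */
final class ParseOutput {
    final List<String> triples;   // triples in the order they were emitted
    final List<String> warnings;  // warning messages in the order they were emitted

    ParseOutput(List<String> triples, List<String> warnings) {
        this.triples = List.copyOf(triples);
        this.warnings = List.copyOf(warnings);
    }
}

/** Hypothetical order-sensitive comparison of two parser runs. */
final class ParserComparison {
    /** Describe the first difference between two runs, or return empty if they agree. */
    static Optional<String> firstDifference(ParseOutput expected, ParseOutput actual) {
        Optional<String> diff = firstDifference("triple", expected.triples, actual.triples);
        if (diff.isPresent())
            return diff;
        return firstDifference("warning", expected.warnings, actual.warnings);
    }

    // Compare position by position so that a change in output order is reported too.
    private static Optional<String> firstDifference(String kind, List<String> expected, List<String> actual) {
        int n = Math.min(expected.size(), actual.size());
        for (int i = 0; i < n; i++) {
            if (!expected.get(i).equals(actual.get(i)))
                return Optional.of(kind + " " + i + ": expected <" + expected.get(i)
                                   + "> but got <" + actual.get(i) + ">");
        }
        if (expected.size() != actual.size())
            return Optional.of(kind + " count: expected " + expected.size()
                               + " but got " + actual.size());
        return Optional.empty();
    }
}
```

Comparing lists rather than sets is deliberate: it makes a reordering of triples or a missing warning show up as a failure, not just a difference in the resulting graph.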
RRX is actually 2 parsers :-).
One is SAX based, and handles XML entities. The other is StAX based; it
was first written as a learning exercise. The StAX API does not support XML
entities. SAX is a stream of parser events and requires the code to have
a coded state machine; StAX uses function call descent to know where in
the grammar it is which is easier to understand.
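To show the difference in style, here is a toy example using only the JDK XML APIs. It is not RRX code: the class name TwoStyles and the tiny grammar (a "doc" element holding "item" elements with text only) are made up for the illustration. The SAX version needs an explicit state variable; the StAX version has one function per grammar rule.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Toy grammar: doc ::= item* ; item ::= text. Hypothetical example, not RRX. */
final class TwoStyles {
    private enum State { OUTSIDE, IN_DOC, IN_ITEM }

    /** SAX: the parser pushes events; the handler remembers where it is in the grammar. */
    static List<String> itemsWithSax(String xml) {
        List<String> items = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private State state = State.OUTSIDE;
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (state == State.OUTSIDE && qName.equals("doc")) {
                    state = State.IN_DOC;
                } else if (state == State.IN_DOC && qName.equals("item")) {
                    state = State.IN_ITEM;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (state == State.IN_ITEM)
                    text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (state == State.IN_ITEM) {
                    items.add(text.toString());
                    state = State.IN_DOC;
                } else if (state == State.IN_DOC) {
                    state = State.OUTSIDE;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception ex) {
            throw new IllegalStateException("SAX parse failed", ex);
        }
        return items;
    }

    /** StAX: the code pulls events; the call stack records the position in the grammar. */
    static List<String> itemsWithStax(String xml) {
        List<String> items = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            r.nextTag();                                       // move to <doc>
            r.require(XMLStreamConstants.START_ELEMENT, null, "doc");
            parseDoc(r, items);
            r.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("StAX parse failed", ex);
        }
        return items;
    }

    // doc ::= item*   (loop ends when nextTag() reaches </doc>)
    private static void parseDoc(XMLStreamReader r, List<String> items) throws XMLStreamException {
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            r.require(XMLStreamConstants.START_ELEMENT, null, "item");
            items.add(parseItem(r));
        }
    }

    // item ::= text   (getElementText() leaves the reader on </item>)
    private static String parseItem(XMLStreamReader r) throws XMLStreamException {
        return r.getElementText();
    }
}
```

Both methods return the item texts in document order for the same input. A real RDF/XML grammar has many more states, which is where the hand-coded state machine in the SAX style gets harder to follow.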
They should produce identical output, down to triple order and messages.
RRX-SAX would be the one that is normally used from RIOT. RRX-StAX is a
"stay honest".
ARP is 66 java files. Each RRX parser is one file.
RRX should work with any XML parser because the RRX parsers don't make any
assumptions about optionally supported XML parsing features. Development
has been with the JDK internal one.