This is a bit of a brain dump ...
== DatasetGraph
Graph, Triple, Quad and DatasetGraph in a single API place.
== Graph - SPI
Graph - add a few navigation operations to make writing system code
directly on Graph easier. This would still not be as rich as the Model
API, but it would avoid much of the object churn.
The operations are (names not final):
Graph.fwd(subject, predicate)
-- return a single Node or null.
Graph.fwdList(subject, predicate)
-- return a list of Nodes
Graph.fwdUnique(subject, predicate)
-- return a single Node, exception if 0 or more than one.
Same for "bwk"
https://github.com/apache/jena/blob/master/jena-shacl/src/main/java/org/apache/jena/shacl/lib/G.java
is a library version of this. It has been helpful, but the proposal
here is to add a few operations directly to Graph.
If the data is known to be good (SHACL), the application code can use
fwd()/bwk() without testing for zero or multiple occurrences of the
predicate.
The reason for putting the basic operations in the Graph interface, and
not everything in a library, is potential efficiency. An impl may be
able to do a good job of fwd(), and if that is the basis of graph
analytics, efficiency matters long term. At the least, the design
should not rule it out.
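A rough sketch of how the three forward operations could behave, on a toy in-memory graph. Everything here is an assumption for illustration: MiniGraph and MiniTriple are hypothetical stand-ins (not Jena classes), nodes are plain strings, fwd() returns the first match when there are several, and the bwkList argument order is a guess.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a triple; nodes are plain strings in this sketch.
final class MiniTriple {
    final String s, p, o;
    MiniTriple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
}

// Hypothetical toy graph showing the proposed navigation operations.
final class MiniGraph {
    private final List<MiniTriple> triples = new ArrayList<>();

    void add(String s, String p, String o) { triples.add(new MiniTriple(s, p, o)); }

    // All objects of triples matching (subject, predicate, ?).
    List<String> fwdList(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (MiniTriple t : triples)
            if (t.s.equals(subject) && t.p.equals(predicate))
                result.add(t.o);
        return result;
    }

    // A single object or null. Assumption: the first match wins if there are several.
    String fwd(String subject, String predicate) {
        List<String> x = fwdList(subject, predicate);
        return x.isEmpty() ? null : x.get(0);
    }

    // Exactly one object; exception if there are zero or more than one.
    String fwdUnique(String subject, String predicate) {
        List<String> x = fwdList(subject, predicate);
        if (x.size() != 1)
            throw new IllegalStateException("Expected exactly one match, got " + x.size());
        return x.get(0);
    }

    // Backwards direction: all subjects of triples matching (?, predicate, object).
    List<String> bwkList(String predicate, String object) {
        List<String> result = new ArrayList<>();
        for (MiniTriple t : triples)
            if (t.p.equals(predicate) && t.o.equals(object))
                result.add(t.s);
        return result;
    }
}
```

The linear scan is only there to keep the sketch short; the point of having the operations on Graph is that an implementation can answer them from its own indexes.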
== Assembler
The graph SPI additions are also motivated by assemblers. Assemblers are
currently Model/Resource based but the important usage is in Fuseki - an
ideal goal is Fuseki works on Graph/Node.
Converting assemblers to Graph/Node does not look too burdensome and
with a wrapper layer we can hopefully include all the old tests to check
evolution.
== Graph - indexing
Currently, Graphs are either term-indexed or value-indexed, not both.
Graph should be plain term-indexed. Value-indexing, which can be
calculated on the fly, would be a separate higher-level concept.
This is motivated by scale and having the same behaviour on all graphs.
At scale, canonicalizing the inputs is better than value-indexing.
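To illustrate the difference: "01" and "1" typed as xsd:integer are different terms with the same value. A term-indexed graph only matches identical terms, so one way to get value-like behaviour is to canonicalize the lexical form on input. A minimal sketch for xsd:integer only; the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical helper: canonicalize the lexical form of an xsd:integer on input.
final class IntegerCanonicalizer {
    // Drops leading zeros and a leading "+", and turns "-0" into "0".
    // Throws NumberFormatException if the input is not a plain decimal integer.
    static String canonicalInteger(String lexicalForm) {
        return new BigInteger(lexicalForm).toString();
    }
}
```

After canonicalization, "01", "+1" and "1" all become the single term "1", so a plain term index finds them with no value comparison at lookup time.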
"values" would only be in the Model API.
== Transactions
Unify the transaction approach (also changes Model) so complex
assemblages of graphs, and other things, are transactional.
Remove graph transactions - replace by
org.apache.jena.sparql.core.Transactional.
Then graphs as views of datasets, and also combinations of Transactionals
in a single transaction (two DatasetGraphs, or a collection of Graphs
(the assembler case)), can be done.
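A minimal sketch of the "combination of Transactionals" idea. MiniTransactional is a hypothetical, cut-down interface invented for this example; it is not the real org.apache.jena.sparql.core.Transactional, which has more operations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down transaction interface for this sketch only.
interface MiniTransactional {
    void begin();
    void commit();
    void abort();
}

// Presents several transactional components as a single one.
final class CombinedTransactional implements MiniTransactional {
    private final List<MiniTransactional> parts;

    CombinedTransactional(List<MiniTransactional> parts) {
        this.parts = new ArrayList<>(parts);
    }

    // Each call is passed to every component, in the order given.
    @Override public void begin()  { for (MiniTransactional t : parts) t.begin(); }
    @Override public void commit() { for (MiniTransactional t : parts) t.commit(); }
    @Override public void abort()  { for (MiniTransactional t : parts) t.abort(); }
}
```

This only shows the shape. If one component fails part way through commit(), the others may already have committed, so a real combination needs some commit protocol on top.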
== Events
Make events an intercepting wrapper, not built-in to Graph itself.
Add transaction lifecycle events.
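A sketch of events as an intercepting wrapper: the base graph knows nothing about listeners, and the wrapper forwards each change and then notifies. TinyGraph, TinyGraphListener and EventGraph are hypothetical names, and triples are plain strings to keep it short. Transaction lifecycle events would be further callbacks of the same kind and are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal update interface; a triple is a plain string here.
interface TinyGraph {
    void add(String triple);
    void delete(String triple);
}

// Hypothetical listener for change events.
interface TinyGraphListener {
    void onAdd(String triple);
    void onDelete(String triple);
}

// Wrapper that intercepts updates, applies them to the base graph, then notifies listeners.
final class EventGraph implements TinyGraph {
    private final TinyGraph base;
    private final List<TinyGraphListener> listeners = new ArrayList<>();

    EventGraph(TinyGraph base) { this.base = base; }

    void register(TinyGraphListener listener) { listeners.add(listener); }

    @Override public void add(String triple) {
        base.add(triple);
        for (TinyGraphListener l : listeners) l.onAdd(triple);
    }

    @Override public void delete(String triple) {
        base.delete(triple);
        for (TinyGraphListener l : listeners) l.onDelete(triple);
    }
}
```

Graphs that do not need events are used unwrapped and pay nothing for the feature.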
== Streams - yes and no.
A Stream is several Java objects, so the potential cost for simple
operations like Graph.contains() or find(), or for returning only a few
things, is not small.
Keep iterators, provide stream(s,p,o).
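A sketch of "keep iterators, provide stream(s,p,o)": the iterator-based find() stays the primitive and stream() is a default method layered on top using the JDK's StreamSupport. IterGraph is a hypothetical interface, and a triple is a String[3] just for brevity.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical graph interface: triples are String[3], null in a pattern means "any".
interface IterGraph {
    // The primitive stays iterator-based.
    Iterator<String[]> find(String s, String p, String o);

    // Cheap check that never builds a Stream.
    default boolean contains(String s, String p, String o) {
        return find(s, p, o).hasNext();
    }

    // Convenience view: wrap the iterator as a sequential Stream.
    default Stream<String[]> stream(String s, String p, String o) {
        Iterator<String[]> iter = find(s, p, o);
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(iter, Spliterator.NONNULL), false);
    }
}
```

Callers that want the Stream API get it, while contains() and small lookups avoid the extra objects.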
== Nodes
Lang tags - force to lower case.
Simplify - remove a layer of indirection. This relates to indexing.
Node_Literal - no LiteralLabels
Node_Blank - two longs or a string label, not using BlankNodeId
Investigate integrating nodes with ARQ's NodeValue.
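Two of the points above in sketch form: lower-casing language tags, and a blank node identified by two longs. The class names are hypothetical, and using a random UUID as the source of the two longs is an assumption of this example, not something decided above.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical helper for language tags.
final class LangTags {
    // Force to lower case; Locale.ROOT avoids locale-specific case rules.
    static String normalize(String langTag) {
        return langTag.toLowerCase(Locale.ROOT);
    }
}

// Hypothetical blank node label held as two longs, with no separate id object.
final class BlankNodeLabel {
    final long hi, lo;

    BlankNodeLabel(long hi, long lo) { this.hi = hi; this.lo = lo; }

    // Assumption: fresh labels come from a random UUID (128 bits = two longs).
    static BlankNodeLabel fresh() {
        UUID u = UUID.randomUUID();
        return new BlankNodeLabel(u.getMostSignificantBits(), u.getLeastSignificantBits());
    }

    // String form, only built when something asks for it.
    String asString() { return new UUID(hi, lo).toString(); }

    @Override public boolean equals(Object other) {
        if (!(other instanceof BlankNodeLabel)) return false;
        BlankNodeLabel b = (BlankNodeLabel) other;
        return hi == b.hi && lo == b.lo;
    }

    @Override public int hashCode() { return 31 * Long.hashCode(hi) + Long.hashCode(lo); }
}
```

Equality and hashing work on the two longs directly, so no string is allocated or compared for the common case.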
== IRIs
jena-iri is general, powerful and hard to maintain.
Jena does not use all of it.
Jena needs a simpler, direct parser/checker.
https://github.com/afs/iri4ld
which is a parser in Java with little copying. It parses URIs, and then
has a few scheme-specific rules for http(s), file and URN.
The various open source libraries and JDK classes do not track the
current standards very well (RFC 2396 vs RFC 3986). I have found that
compliance is mixed due to legacy compatibility needs.
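For a sense of how small the generic part of RFC 3986 is, here is the component-splitting regular expression from RFC 3986 Appendix B in Java. This is not how iri4ld works and it does no validation or scheme-specific checking; the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: split a URI reference into its five RFC 3986 components.
final class Rfc3986Split {
    // The regular expression given in RFC 3986, Appendix B.
    private static final Pattern URI_REF = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns {scheme, authority, path, query, fragment}; absent components are null.
    static String[] split(String uriRef) {
        Matcher m = URI_REF.matcher(uriRef);
        if (!m.matches())
            throw new IllegalArgumentException("Cannot split: " + uriRef);
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }
}
```

Splitting is the easy part. Checking each component against the grammar and the per-scheme rules is where a dedicated parser/checker earns its keep.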