Jena next (AFS)

Andy Seaborne Sun, 17 Nov 2019 09:35:28 -0800

This is a bit of a brain dump ...

== DatasetGraph


Graph, Triple, Quad, DatasetGraph in a single API place.

== Graph - SPI

Graph - add a few navigation operations to make writing system code directly on Graph easier - though still not as rich as the Model API - and to avoid much of the object churn.


The operations are (not final names):

  Graph.fwd(subject, predicate)
       -- return a single Node or null.
  Graph.fwdList(subject, predicate)
       -- return a list of Nodes
  Graph.fwdUnique(subject, predicate)
       -- return a single Node, exception if 0 or more than one.

Same for "bwk"
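To make the intended semantics concrete, here is a minimal sketch of the three forward operations and one backward counterpart. It is not Jena code: NavSketch, TinyGraph and the string-based Triple are hypothetical stand-ins, the behaviour of fwd() when several objects match (return the first found) is an assumption, and so is the (predicate, object) argument order for the "bwk" direction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; this is not Jena's Graph/Node API.
final class NavSketch {

    /** A triple of plain strings standing in for Nodes. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    /** A toy graph: a list of triples, scanned linearly. */
    static final class TinyGraph {
        private final List<Triple> triples = new ArrayList<>();

        void add(String s, String p, String o) { triples.add(new Triple(s, p, o)); }

        /** All objects for (subject, predicate), in insertion order. */
        List<String> fwdList(String s, String p) {
            List<String> out = new ArrayList<>();
            for (Triple t : triples)
                if (t.s.equals(s) && t.p.equals(p)) out.add(t.o);
            return out;
        }

        /** One object, or null if none. Assumption: with several matches, the first is returned. */
        String fwd(String s, String p) {
            List<String> found = fwdList(s, p);
            return found.isEmpty() ? null : found.get(0);
        }

        /** Exactly one object; throws if there are zero or more than one. */
        String fwdUnique(String s, String p) {
            List<String> found = fwdList(s, p);
            if (found.size() != 1)
                throw new IllegalStateException("Expected exactly one match, found " + found.size());
            return found.get(0);
        }

        /** Backwards: all subjects for (predicate, object). Argument order is an assumption. */
        List<String> bwkList(String p, String o) {
            List<String> out = new ArrayList<>();
            for (Triple t : triples)
                if (t.p.equals(p) && t.o.equals(o)) out.add(t.s);
            return out;
        }
    }

    public static void main(String[] args) {
        TinyGraph g = new TinyGraph();
        g.add(":alice", ":knows", ":bob");
        g.add(":alice", ":knows", ":carol");
        g.add(":alice", ":name", "Alice");

        System.out.println(g.fwdUnique(":alice", ":name"));
        System.out.println(g.fwdList(":alice", ":knows"));
        System.out.println(g.fwd(":alice", ":age"));
        System.out.println(g.bwkList(":knows", ":bob"));
    }
}
```

A real implementation would presumably answer these from its indexes rather than by scanning, which is the efficiency argument made below.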

https://github.com/apache/jena/blob/master/jena-shacl/src/main/java/org/apache/jena/shacl/lib/G.java is a library version of this that was helpful, but the proposal here is to add a few operations directly to Graph.

If the data is known to be good (SHACL), the application code can use fwd()/bwk() without worrying about testing for zero or multiple predicates.

The reason for putting the basic operations in the Graph interface, and not everything in a library, is potential efficiency. An implementation may be able to do a good job of fwd(), and if that is the basis of graph analytics, efficiency matters in the long term. At the least, the design should not rule it out.


== Assembler

The graph SPI additions are also motivated by assemblers. Assemblers are currently Model/Resource based but the important usage is in Fuseki - an ideal goal is that Fuseki works on Graph/Node.

Converting assemblers to Graph/Node does not look too burdensome and with a wrapper layer we can hopefully include all the old tests to check evolution.


== Graph - indexing

Currently, Graphs are term-indexed only or value-indexed, not both.

Graph should be plain term-indexed. Value-indexing, which can be calculated on the fly, would be a separate higher-level concept.

This is motivated by scale and having the same behaviour on all graphs. At scale, canonicalizing the inputs is better than value-indexing.
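As an illustration of canonicalizing on input, the sketch below normalizes xsd:integer lexical forms before they reach a term-indexed store, so that "01", "+1" and "1" end up as one term rather than three. CanonSketch and canonicalInteger are hypothetical names, and the store is just a set of strings, not a Jena graph.

```java
import java.math.BigInteger;
import java.util.HashSet;
import java.util.Set;

// Illustration only: a "term index" modelled as a set of lexical forms.
final class CanonSketch {

    /** Canonical lexical form of an xsd:integer: no leading zeros, no '+', and "-0" becomes "0". */
    static String canonicalInteger(String lexical) {
        return new BigInteger(lexical.trim()).toString();
    }

    public static void main(String[] args) {
        String[] inputs = { "1", "01", "+1" };

        // Term-indexed without canonicalization: three distinct terms.
        Set<String> raw = new HashSet<>();
        for (String lex : inputs) raw.add(lex);

        // Canonicalized on the way in: the same value is always the same term.
        Set<String> canonical = new HashSet<>();
        for (String lex : inputs) canonical.add(canonicalInteger(lex));

        System.out.println("raw terms: " + raw.size());
        System.out.println("canonical terms: " + canonical.size());
    }
}
```

With canonical inputs, a plain term match gives the same answers a value match would for these literals, without a second index.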


"values" would only be in the Model API.

== Transactions

Unify the transaction approach (also changes Model) so complex assemblages of graphs, and other things, are transactional.


Remove graph transactions - replace by
org.apache.jena.sparql.core.Transactional.

Then graphs as views of datasets, and also combinations of Transactionals in a single transaction (two DatasetGraphs, or a collection of Graphs (the assembler case)), can be done.
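A rough sketch of what "combinations of Transactionals in a single transaction" could look like follows. The Txn interface is a hypothetical, cut-down lifecycle (begin/commit/abort/end) written for this example, not org.apache.jena.sparql.core.Transactional itself, and the composite simply forwards each call to every component. It does not make the commit atomic across components; that would need further coordination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TxnSketch {

    /** Hypothetical, cut-down transaction lifecycle; not the real Jena interface. */
    interface Txn {
        void begin();
        void commit();
        void abort();
        void end();
    }

    /** Forwards each lifecycle call to every component, in order. */
    static final class CompositeTxn implements Txn {
        private final List<Txn> parts;
        CompositeTxn(List<Txn> parts) { this.parts = new ArrayList<>(parts); }

        @Override public void begin()  { for (Txn t : parts) t.begin(); }
        @Override public void commit() { for (Txn t : parts) t.commit(); }
        @Override public void abort()  { for (Txn t : parts) t.abort(); }
        @Override public void end()    { for (Txn t : parts) t.end(); }
    }

    /** A component that only records which calls it received. */
    static final class Recording implements Txn {
        private final String name;
        private final List<String> log;
        Recording(String name, List<String> log) { this.name = name; this.log = log; }

        @Override public void begin()  { log.add(name + ".begin"); }
        @Override public void commit() { log.add(name + ".commit"); }
        @Override public void abort()  { log.add(name + ".abort"); }
        @Override public void end()    { log.add(name + ".end"); }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Txn both = new CompositeTxn(Arrays.asList(
                new Recording("ds1", log), new Recording("ds2", log)));

        both.begin();
        try {
            // ... update both datasets here ...
            both.commit();
        } finally {
            both.end();
        }
        System.out.println(log);
    }
}
```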


== Events

Make events an intercepting wrapper, not built-in to Graph itself.
Add transaction lifecycle events.
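The intercepting-wrapper idea can be sketched as below: the base graph knows nothing about events, and a wrapper forwards each update and then notifies listeners. SimpleGraph, Listener, MemGraph and EventGraph are hypothetical names made up for this example (triples are plain strings), and transaction lifecycle events are left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EventSketch {

    /** Hypothetical minimal graph contract; triples are plain strings here. */
    interface SimpleGraph {
        void add(String triple);
        void delete(String triple);
        boolean contains(String triple);
    }

    /** Hypothetical listener for graph changes. */
    interface Listener {
        void onAdd(String triple);
        void onDelete(String triple);
    }

    /** Base storage with no event support at all. */
    static final class MemGraph implements SimpleGraph {
        private final Set<String> triples = new HashSet<>();
        @Override public void add(String triple) { triples.add(triple); }
        @Override public void delete(String triple) { triples.remove(triple); }
        @Override public boolean contains(String triple) { return triples.contains(triple); }
    }

    /** Wrapper that intercepts updates, applies them to the base, then notifies listeners. */
    static final class EventGraph implements SimpleGraph {
        private final SimpleGraph base;
        private final List<Listener> listeners = new ArrayList<>();

        EventGraph(SimpleGraph base) { this.base = base; }

        void register(Listener listener) { listeners.add(listener); }

        @Override public void add(String triple) {
            base.add(triple);
            for (Listener l : listeners) l.onAdd(triple);
        }

        @Override public void delete(String triple) {
            base.delete(triple);
            for (Listener l : listeners) l.onDelete(triple);
        }

        // Reads pass straight through; no event objects are created.
        @Override public boolean contains(String triple) { return base.contains(triple); }
    }

    public static void main(String[] args) {
        EventGraph g = new EventGraph(new MemGraph());
        g.register(new Listener() {
            @Override public void onAdd(String triple) { System.out.println("added " + triple); }
            @Override public void onDelete(String triple) { System.out.println("deleted " + triple); }
        });
        g.add(":s :p :o");
        g.delete(":s :p :o");
    }
}
```

Graphs that are never wrapped pay nothing for event support, which is the point of moving it out of Graph itself.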

== Streams - yes and no.

A Stream is several Java objects, so the potential cost for simple operations like Graph.contains() or find(), or a few things, is not small.


Keep iterators, provide stream(s,p,o).
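One way to keep iterators as the primitive and still offer stream(s,p,o) is a default method that adapts the iterator, so only callers who ask for a Stream pay for one. The sketch below uses hypothetical types (FindGraph, ListGraph, triples as String arrays, null as a wildcard); only the java.util.stream calls are real library API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class StreamSketch {

    /** Hypothetical graph contract: find() is the primitive, stream() is layered on top. */
    interface FindGraph {
        /** Triples matching the pattern; a null argument is a wildcard. */
        Iterator<String[]> find(String s, String p, String o);

        /** contains() stays on the iterator path: no Stream objects are allocated. */
        default boolean contains(String s, String p, String o) {
            return find(s, p, o).hasNext();
        }

        /** Stream view built from the iterator, for callers who want the Stream API. */
        default Stream<String[]> stream(String s, String p, String o) {
            Spliterator<String[]> split =
                    Spliterators.spliteratorUnknownSize(find(s, p, o), Spliterator.ORDERED);
            return StreamSupport.stream(split, false);
        }
    }

    /** Toy implementation over a list, scanned linearly. */
    static final class ListGraph implements FindGraph {
        private final List<String[]> triples = new ArrayList<>();

        void add(String s, String p, String o) { triples.add(new String[] { s, p, o }); }

        private static boolean matches(String pattern, String value) {
            return pattern == null || pattern.equals(value);
        }

        @Override public Iterator<String[]> find(String s, String p, String o) {
            List<String[]> out = new ArrayList<>();
            for (String[] t : triples)
                if (matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2])) out.add(t);
            return out.iterator();
        }
    }

    public static void main(String[] args) {
        ListGraph g = new ListGraph();
        g.add(":a", ":p", ":b");
        g.add(":a", ":q", ":c");
        g.add(":b", ":p", ":c");

        System.out.println(g.contains(":a", ":p", ":b"));
        g.stream(":a", null, null).map(t -> t[2]).forEach(System.out::println);
    }
}
```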

== Nodes

Lang tags - force to lower case.

Simplify - remove a layer of indirection. This relates to indexing.

Node_Literal - no LiteralLabels
Node_Blank - two longs or a string label, not using BlankNodeId

Investigate integrating nodes with ARQ's NodeValue.

== IRIs

jena-iri is general, powerful and hard to maintain.
Jena does not use all of it.
Jena needs a simpler, direct parser/checker.

https://github.com/afs/iri4ld

which is a parser in Java with little copying. It parses URIs, and then has a little on scheme-specific rules for http(s), file and URN.

The various open source libraries and JDK classes do not track the current standards very well (RFC 2396 vs RFC 3986). I have found that compliance is mixed due to legacy compatibility needs.
