I'd like to offer it to Jena, if the community wants it and will engage with it. The Jena development cycle isn't a problem here.

https://github.com/afs/morph

The engine works on Graph/Triple/Node - mainly for efficiency reasons (not entirely proven).

I've ended up with a small library that gives a little of the navigation style the Model API has. What I want to avoid is creating small intermediate Java objects - if that happens inside processing loops, it seems to impact performance (CPU cache issues, the GC has to do some work even if it is only in the young generation, etc). I want Java value types (data classes and sealed types; Project Valhalla). http://cr.openjdk.java.net/~briangoetz/amber/datum.html


A library isn't ideal - in fine-grained work, operations like "get the object, given subject and predicate" can happen a lot. Some storage does not keep Triple objects, and creating a triple to return S/P/O just so the caller can pull out the O, when S/P are already fixed, does seem a little crazy.

So the simplicity of all access going through find(s,p,o) does have a cost. Normally that isn't important, but for graph algorithms (in a general sense) every little cost can add up.
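
To make that concrete, the kind of helper I mean looks something like this (a sketch only; GraphAccess and getOneSP are illustrative names, not anything in the repository). Note that a library can only hide the Triple creation inside find() - it can't avoid it:

    import org.apache.jena.graph.Graph;
    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.util.iterator.ExtendedIterator;

    public class GraphAccess {
        /** Return one object for (subject, predicate), or null if there is none. */
        public static Node getOneSP(Graph graph, Node subject, Node predicate) {
            ExtendedIterator<Triple> iter = graph.find(subject, predicate, Node.ANY);
            try {
                // The storage still builds a Triple here; the helper only hides it from the caller.
                return iter.hasNext() ? iter.next().getObject() : null;
            } finally {
                iter.close();
            }
        }
    }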

And that also implies streams are not always the way to go. Creating a stream means allocating a few Java objects, and when the operation is "get a single value" that overhead really can kick in. Think Graph.contains. Does your experience with CommonsRDF give any insight here?
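
For example (a sketch - Iter.asStream from jena-base is just one way of wrapping find() in a java.util.stream.Stream):

    import org.apache.jena.atlas.iterator.Iter;
    import org.apache.jena.graph.Graph;
    import org.apache.jena.graph.Node;

    public class ContainsExample {
        public static boolean hasAny(Graph graph, Node subject, Node predicate) {
            // Direct existence test: no iterator-to-stream wrapping, no Optional.
            return graph.contains(subject, predicate, Node.ANY);
        }

        public static boolean hasAnyViaStream(Graph graph, Node subject, Node predicate) {
            // Same answer, but allocates a spliterator, a Stream and an Optional along the way.
            return Iter.asStream(graph.find(subject, predicate, Node.ANY))
                       .findAny()
                       .isPresent();
        }
    }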

I do think we should add stream(s,p,o) to Graph.
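
Something along these lines (a sketch of a possible default method, not the Graph interface as it stands):

    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    import org.apache.jena.graph.Graph;
    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.util.iterator.ExtendedIterator;

    // Sketch only: roughly what a stream(s,p,o) default method on Graph could look like.
    interface GraphStreamSketch extends Graph {
        default Stream<Triple> stream(Node s, Node p, Node o) {
            ExtendedIterator<Triple> iter = find(s, p, o);
            return StreamSupport.stream(
                        Spliterators.spliteratorUnknownSize(iter, Spliterator.NONNULL), false)
                    .onClose(iter::close);
        }
    }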

Maybe also some accessors (a few - not going over the top) like "getSP -> Object".

This is still not Model - that has a lot more, such as the polymorphism.

    Andy

On 08/07/2019 16:28, Aaron Coburn wrote:
Thanks, Andy, I will likely have some data+shape resources in the coming
weeks/months that I would like to test. Are there plans to add this code to
Jena itself, or do you anticipate that it will be part of a separate
repository?

Best,
Aaron

On Mon, 8 Jul 2019 at 10:58, Andy Seaborne <[email protected]> wrote:

I've got a SHACL validation engine working - it covers both the core and
SPARQL constraints of the W3C specification.

If anyone has data+shapes, I'll happily use them to run further tests.

Status: passes the WG test suite except for some in
std/sparql/pre-binding/. Optional $shapesGraph and $currentShape are not
supported (more below) and the "unsupported" tests in pre-binding (some
of the rules seem overly restrictive) aren't run.

AKA: all valid shapes work; invalid shapes are "what you can get away
with". This is for future flexibility :-)

None of the non-spec SHACL-AF is covered.

API:

As well as the operations to validate a graph using a given shapes graph
(command line or API), there is also a graph that rejects non-conforming
data in a graph transaction.
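
For illustration, validating a data graph against a shapes graph goes
roughly like this - a minimal sketch; the class names used here
(Shapes, ShaclValidator, ValidationReport) are illustrative and the
actual API may differ:

    import org.apache.jena.graph.Graph;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.shacl.ShaclValidator;
    import org.apache.jena.shacl.Shapes;
    import org.apache.jena.shacl.ValidationReport;
    import org.apache.jena.shacl.lib.ShLib;

    public class ValidateExample {
        public static void main(String[] args) {
            // Load the shapes and the data as plain Graphs.
            Graph shapesGraph = RDFDataMgr.loadGraph("shapes.ttl");
            Graph dataGraph   = RDFDataMgr.loadGraph("data.ttl");

            Shapes shapes = Shapes.parse(shapesGraph);
            ValidationReport report = ShaclValidator.get().validate(shapes, dataGraph);

            ShLib.printReport(report);
            System.out.println("Conforms: " + report.conforms());
        }
    }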

Datasets:

SHACL is defined to validate a single graph. To extend to validation of
a dataset, having just one set of shapes for all graphs seems a little
restrictive.

Some ideas -- https://afs.github.io/shacl-datasets.html

$shapesGraph is for the case where data and shapes are in one dataset -
I'm not sure that's a very good idea because it imposes conditions on
extending SHACL to data datasets.

Opportunities:

There are possibilities for further work on deeper integration into the
dataset update path:

* Parallel execution - some shapes can be applied to an update stream
without reference to the existing data, so that checking can be done on
a separate thread outside the transaction.

* Restricting the validation work needed - for some shapes
(not all, but a static analysis of the shapes determines which)
the updates can be tracked so that only the changes are validated. There
are ways to write shapes that (1) apply globally to the data, or (2) are
affected by indirect data changes, where looking only at the changed data
does not tell you whether a shape might now report violations.

There is some prototyping done but I got sidetracked by shacl-datasets.html

      Andy

