Having tried to think through how an interface-only module would work,
I'm not sure it's a good idea to depend on it for Jena3. It's a major
change and other things would need to wait on it. We can proceed in
steps to avoid a dependency here.
Firstly - the "interfaces only" part is presumably Graph and
DatasetGraph, not Triple, Quad, Node.
The security model still uses Triple and Node as normal, doesn't it,
Claude? In Jena, Triple, Quad, Node objects can be created at will
- the inference engine, the parsers and the query subsystem all create
Nodes when needed. To parametrize that, we'd need factories passed
about, and in ARQ, a Node isn't "in" a graph, it can be in many or none
and there are mixed datasets of different graph implementations. That
begins to feel like a lot of work so there had better be a lot of benefit.
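To make the factory point concrete, here is a minimal sketch of what "factories passed about" might look like. Every name in it (SketchNode, SketchNodeFactory, SimpleNode, ToyParser) is hypothetical and is not Jena's API; it only illustrates that once nodes are interface-only, each component that creates nodes has to be handed a factory.

```java
import java.util.Objects;

// Hypothetical stand-in for an RDF term; NOT Jena's Node class.
interface SketchNode {
    String label();
}

// Hypothetical factory that every node-creating subsystem would need to receive.
interface SketchNodeFactory {
    SketchNode createURI(String uri);
}

// One possible implementation; another store could supply its own.
final class SimpleNode implements SketchNode {
    private final String label;

    SimpleNode(String label) {
        this.label = Objects.requireNonNull(label);
    }

    @Override public String label() { return label; }

    // Value equality: the node is not tied to any particular graph.
    @Override public boolean equals(Object other) {
        return other instanceof SimpleNode && ((SimpleNode) other).label.equals(label);
    }

    @Override public int hashCode() { return label.hashCode(); }
}

final class SimpleNodeFactory implements SketchNodeFactory {
    @Override public SketchNode createURI(String uri) { return new SimpleNode(uri); }
}

// A toy "parser": it cannot call a constructor itself any more,
// so the factory has to be threaded through.
final class ToyParser {
    private final SketchNodeFactory factory;

    ToyParser(SketchNodeFactory factory) { this.factory = factory; }

    // Turns an N-Triples style token such as <http://example/s> into a node.
    SketchNode parseURI(String token) {
        return factory.createURI(token.substring(1, token.length() - 1));
    }
}
```

The same constructor argument would be needed by the inference engine and the query subsystem, which is the part that starts to feel like a lot of work.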
The other version of interface-only is real Triple, Quad, Node (Java
interfaces with the usual implementations in the same module), and storage
interfaces-only, Graph and DatasetGraph. The size of the memory graph
and DatasetGraph implementations isn't that large.
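As a rough illustration of that second version, the sketch below keeps the triple type next to its implementation and makes only the storage side an interface. All names (SketchTriple, SketchGraph, MemGraph) are hypothetical, and for brevity the triple is a plain concrete class over strings rather than an interface plus implementation; it is not Jena code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Data side: a usable triple type ships in the same module (hypothetical, string-based).
final class SketchTriple {
    final String s, p, o;

    SketchTriple(String s, String p, String o) {
        this.s = s; this.p = p; this.o = o;
    }

    @Override public boolean equals(Object x) {
        if (!(x instanceof SketchTriple)) return false;
        SketchTriple t = (SketchTriple) x;
        return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
    }

    @Override public int hashCode() { return Objects.hash(s, p, o); }
}

// Storage side: interface only.
interface SketchGraph {
    void add(SketchTriple t);
    boolean contains(SketchTriple t);
    // A null argument acts as a wildcard for that position.
    List<SketchTriple> find(String s, String p, String o);
    int size();
}

// A deliberately small in-memory implementation.
final class MemGraph implements SketchGraph {
    private final Set<SketchTriple> triples = new LinkedHashSet<>();

    @Override public void add(SketchTriple t) { triples.add(t); }

    @Override public boolean contains(SketchTriple t) { return triples.contains(t); }

    @Override public List<SketchTriple> find(String s, String p, String o) {
        List<SketchTriple> out = new ArrayList<>();
        for (SketchTriple t : triples) {
            boolean match = (s == null || s.equals(t.s))
                    && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o));
            if (match) out.add(t);
        }
        return out;
    }

    @Override public int size() { return triples.size(); }
}
```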
There is quite a large hierarchy of DatasetGraphs to support different
implementation styles - not sure if they'd be in the storage-interface
only module or not.
Maybe what we need to do is work to a "jena3-core" [*] and then as an
experiment see how much work it is, and articulate the pros and cons of
the approach in more detail.
This takes it off the critical path for other jena3 items for now.
[*] I'll try to write "jena3-" to distinguish the modules from the
same-named jena2 modules. No presumption that "jena3-*" is the final
module name.
More inline:
On 10/04/14 23:48, Rob Vesse wrote:
Ok having seen your thoughts I am now leaning towards +1 on having an
interface only module
I had thought that it might be necessary to split up the security
framework into multiple modules so thanks for confirming my suspicions.
I’m assuming all the query engine stuff goes in the SPARQL module (Andy?),
Yes - that's my assumption for the moment. Maybe an engine/API split.
I suppose in principle you could have separate interface/abstract class
and implementation modules for SPARQL but it becomes a question of quite
how many modules do you want to have.
Yes - you can have too many modules!
We can do a coarse-grained split and see how it goes, splitting further
as experience and time suggest.
Though I suppose since we are likely to keep the apache-jena-libs modules
and just change which modules that pulls in, having a proliferation of
small modules is not necessarily too taxing on users.
The apache-jena-libs POM module should be the normal way to get the
libraries.
More ...
Rob
On 10/04/2014 06:48, "Claude Warren" wrote:
Comments Inline
On Wed, Apr 9, 2014 at 10:49 PM, Rob Vesse wrote:
Comments inline:
On 08/04/2014 08:10, "Andy Seaborne" <[email protected]> wrote:
On 08/04/14 14:25, Rob Vesse wrote:
In terms of specific collaboration opportunities I've heard a few different
ideas. I spent a bunch of time talking with Lewis McGibbney, who's involved
in Any23, about how Jena might make it easier for projects to share common
functionality like RDF parsers. The current module structures are something
of a barrier in this regard since we have multiple versions of some readers
and they are quite closely coupled into some aspects of our APIs. Improving
modularisation in the future (as I think we hope to do in Jena 3.x
eventually) would make things like this easier for people.
Agreed : we need something like
IRI
Non-RDF related common library (Atlas in ARQ currently)
new core (graph API, datasetgraph)
RIOT
API
SPARQL
TDB (split into base, file, b+tree, main)
This looks like the most sensible and concrete modularisation we’ve yet
come up with. To clarify, are you thinking that the new core would be the
Node, Triple, Quad, Graph, Dataset APIs etc and then API would be the
Model and Ontology APIs?
In which case +1000, that makes much more sense.
Particularly then putting RIOT between the new core and the API so that we
don’t have two sets of readers and writers!
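One way to picture the layering is as a simple ordering check. The sketch below copies the module list as given and assumes, purely for illustration, that the order of the list is a bottom-to-top layering in which a module may depend only on modules listed before it. That rule and the LayeringSketch class are my assumptions, not something stated in the proposal.

```java
import java.util.Arrays;
import java.util.List;

final class LayeringSketch {
    // Order copied from the list above; reading it as strict layering is an assumption.
    static final List<String> LAYERS =
            Arrays.asList("IRI", "Atlas", "core", "RIOT", "API", "SPARQL", "TDB");

    // A dependency is allowed only if it points at a strictly lower layer.
    static boolean allowed(String from, String to) {
        int f = LAYERS.indexOf(from);
        int t = LAYERS.indexOf(to);
        if (f < 0 || t < 0) {
            throw new IllegalArgumentException("unknown module: " + from + " or " + to);
        }
        return t < f;
    }
}
```

Under that assumption the API can use RIOT's readers and writers, while RIOT cannot reach up into the API, which is what removes the need for two sets of parsers.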
I’d be interested to hear from Claude as to how the jena-security module
would fit in this, I guess it may need to be split into multiple modules
that build on each other and the other modules as appropriate.
<claude>
The security module (which should probably be renamed permissions first, but
in any case) just needs interfaces to work most efficiently. Currently it
wraps Graph and Model and provides a custom query engine that places
access checks in the middle of the SPARQL calls.
Given that, under the new structure as I understand it, security would
probably have to be broken into 3 sub-modules: security-core (graph stuff),
security-api (model stuff) and security-sparql (query stuff). Or perhaps I
don't have the model/sparql separation correct in my head. Where does the
query engine that is associated with a model go in the new structure?
</claude>
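For readers unfamiliar with the wrapping approach described above, here is a minimal sketch of a permission-checking wrapper around a graph interface. The names (PlainGraph, AccessPolicy, GuardedGraph, ReadOnlyHideSecrets) are hypothetical and this is not the jena-security API; treating unreadable triples as absent is also just one possible design, chosen here for the demo.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal graph interface standing in for whatever the core module exposes.
interface PlainGraph {
    void add(String s, String p, String o);
    boolean contains(String s, String p, String o);
}

// Hypothetical policy hook; the real evaluator interface is not reproduced here.
interface AccessPolicy {
    boolean canRead(String user, String s, String p, String o);
    boolean canWrite(String user, String s, String p, String o);
}

// Unsecured base implementation backed by a set of joined strings.
final class SetGraph implements PlainGraph {
    private final Set<String> rows = new HashSet<>();

    private static String key(String s, String p, String o) { return s + " " + p + " " + o; }

    @Override public void add(String s, String p, String o) { rows.add(key(s, p, o)); }

    @Override public boolean contains(String s, String p, String o) {
        return rows.contains(key(s, p, o));
    }
}

// The wrapper implements the same interface and checks the policy before delegating.
final class GuardedGraph implements PlainGraph {
    private final PlainGraph base;
    private final AccessPolicy policy;
    private final String user;

    GuardedGraph(PlainGraph base, AccessPolicy policy, String user) {
        this.base = base; this.policy = policy; this.user = user;
    }

    @Override public void add(String s, String p, String o) {
        if (!policy.canWrite(user, s, p, o)) {
            throw new SecurityException("add denied for " + user);
        }
        base.add(s, p, o);
    }

    @Override public boolean contains(String s, String p, String o) {
        // Assumption for this sketch: unreadable triples look absent instead of raising an error.
        return policy.canRead(user, s, p, o) && base.contains(s, p, o);
    }
}

// Example policy for the demo: nobody may write, and objects named "secret" are hidden.
final class ReadOnlyHideSecrets implements AccessPolicy {
    @Override public boolean canRead(String user, String s, String p, String o) {
        return !"secret".equals(o);
    }

    @Override public boolean canWrite(String user, String s, String p, String o) {
        return false;
    }
}
```

Because GuardedGraph only needs PlainGraph, it works over any implementation, which is why an interface-only module would suit this style of wrapper.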
I don't think we should be too prescriptive as to structure. Firstly,
because, given the work needed, there may be better/more important things
to do, and also because I hope we get to a release sooner rather than later.
That said, I have done an experimental split of TDB into
-- core system and interfaces.
jena-tdb-base
-- The 2 abstractions of files: as an array of blocks
-- and as a log of variable-length byte blobs
jena-tdb-file
-- different implementations of index structures.
jena-tdb-btree, jena-tdb-exthash
-- TDB query engine and client API.
jena-tdb-tdb
...
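The two file abstractions mentioned for jena-tdb-file can be pictured roughly as below. This is a sketch with made-up names (BlockArray, BlobLog and their in-memory versions), not the actual TDB interfaces; a real implementation would sit on files, and the log handle would more likely be a byte offset than a list index.

```java
import java.util.ArrayList;
import java.util.List;

// Abstraction 1 (hypothetical): a file seen as an array of fixed-size blocks.
interface BlockArray {
    int blockSize();
    int allocate();                    // returns the id of a new zero-filled block
    byte[] read(int id);
    void write(int id, byte[] data);   // data must be exactly blockSize() bytes
}

// Abstraction 2 (hypothetical): an append-only log of variable-length byte blobs.
interface BlobLog {
    long append(byte[] blob);          // returns a handle for later reads
    byte[] read(long handle);
}

// In-memory stand-in for the block array.
final class MemBlockArray implements BlockArray {
    private final int blockSize;
    private final List<byte[]> blocks = new ArrayList<>();

    MemBlockArray(int blockSize) { this.blockSize = blockSize; }

    @Override public int blockSize() { return blockSize; }

    @Override public int allocate() {
        blocks.add(new byte[blockSize]);
        return blocks.size() - 1;
    }

    // Copies are returned so callers cannot modify stored blocks behind our back.
    @Override public byte[] read(int id) { return blocks.get(id).clone(); }

    @Override public void write(int id, byte[] data) {
        if (data.length != blockSize) {
            throw new IllegalArgumentException("block must be " + blockSize + " bytes");
        }
        blocks.set(id, data.clone());
    }
}

// In-memory stand-in for the blob log; the handle here is just the list index.
final class MemBlobLog implements BlobLog {
    private final List<byte[]> blobs = new ArrayList<>();

    @Override public long append(byte[] blob) {
        blobs.add(blob.clone());
        return blobs.size() - 1;
    }

    @Override public byte[] read(long handle) { return blobs.get((int) handle).clone(); }
}
```

Index structures such as a B+tree would then be written against the block abstraction only, without caring where the blocks live.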
A "maybe" is a module that is just the interfaces for graph, dataset etc.
and to have modules build from that, but it looks to me like the
difference between new-core and interface+core+mem is quite small.
Having the in-memory implementations around is necessary for internal
working.
I’m -0 on this
<claude>Having interfaces in one place makes security easier.
It is also easier to use the contract testing framework to verify
implementations of the interface without cluttering the test package with
concrete test implementations. This is a "make it easier for the
integrator/developer" issue.
+1 on this.
</claude>
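The contract-testing idea can be sketched as follows: the rules are written once against the interface and then run against whatever implementation a supplier hands over. This is not the actual contract testing framework; TripleBag, TripleBagContract, HashBag and ListBag are hypothetical names, and the four rules checked are only examples.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical interface whose contract is being checked.
interface TripleBag {
    void add(String triple);
    void delete(String triple);
    boolean contains(String triple);
    int size();
}

// The contract: written once against the interface, reusable for any implementation.
final class TripleBagContract {
    // Returns the violated rules; an empty list means the implementation passed.
    static List<String> check(Supplier<TripleBag> fresh) {
        List<String> failures = new ArrayList<>();
        TripleBag g = fresh.get();
        if (g.size() != 0) failures.add("new graph must be empty");
        g.add("s p o");
        if (!g.contains("s p o")) failures.add("added triple must be found");
        g.add("s p o");
        if (g.size() != 1) failures.add("adding twice must not duplicate");
        g.delete("s p o");
        if (g.contains("s p o") || g.size() != 0) failures.add("deleted triple must be gone");
        return failures;
    }
}

// A conforming implementation with set semantics.
final class HashBag implements TripleBag {
    private final Set<String> data = new HashSet<>();
    @Override public void add(String triple) { data.add(triple); }
    @Override public void delete(String triple) { data.remove(triple); }
    @Override public boolean contains(String triple) { return data.contains(triple); }
    @Override public int size() { return data.size(); }
}

// A deliberately non-conforming implementation: it keeps duplicates.
final class ListBag implements TripleBag {
    private final List<String> data = new ArrayList<>();
    @Override public void add(String triple) { data.add(triple); }
    @Override public void delete(String triple) { data.remove(triple); }
    @Override public boolean contains(String triple) { return data.contains(triple); }
    @Override public int size() { return data.size(); }
}
```

An integrator would call TripleBagContract.check(HashBag::new) for their own class without needing any concrete test implementations in the interface module's test package.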
While it’s easy to do and relatively cheap in Java (as compared to .Net
where it is a PITA), and the Sesame folks certainly already take this
approach, I don’t see that there is much value provided. How many
people actually run completely custom Node/Triple/Graph/etc stacks?
And? Java8 so we can sort out iterators.
Let's more actively discuss this.
Sure though I think Java8 is maybe a whole other discussion.
Another related discussion is moving to Git before we get started down
this route because doing this scale of refactoring would be horrible
within SVN.
A separate thread to plan the git repo layout.
Rob
Claude