Re: Possible Jena 3 Module Structure

Andy Seaborne Mon, 21 Apr 2014 03:57:13 -0700

On 21/04/14 11:00, Claude Warren wrote:

There were a couple of reasons to go with SecNode and SecTriple separate
from Jena's Node and Triple.  First the original design was to be
implementation agnostic so it could be implemented against Jena or any
other RDF data store.

Thanks for the explanation. I can see that the results of e.g. Graph.find are Triple, not SecTriple.

(Triple security with labelled triples looks really quite complicated when two triples, same fact, different security labels, meet).


Second, I needed to create a couple of "new" node types.  Specifically:

FUTURE: which I suspect is similar to your ex2 above, in that I needed to
be able to say that a node in a triple would be created during the
execution of the method.  For example, the request to reify the triple
<A,B,C> requires the security system to evaluate whether the user can
create the triples <FUTURE, rdf:type, rdf:Statement>,
<FUTURE, rdf:subject, A>, <FUTURE, rdf:predicate, B> and
<FUTURE, rdf:object, C>.  We don't know what the value of FUTURE will be,
but we do know that it will be a new anonymous node.
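
A minimal sketch of the reification check described above. It uses hypothetical stand-in types (plain strings for nodes, an invented Evaluator interface) rather than the real SecNode/SecTriple classes, whose API is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from the Jena security module.
final class ReifyCheckSketch {
    // Placeholder for "a new anonymous node that will be created during the call".
    static final String FUTURE = "FUTURE";

    // Stand-in for the security evaluator (assumed interface, for illustration only).
    interface Evaluator {
        boolean canCreate(String s, String p, String o);
    }

    // The four triples the security layer is asked about when reifying <s, p, o>.
    static List<String[]> reificationPatterns(String s, String p, String o) {
        return Arrays.asList(
            new String[] { FUTURE, "rdf:type", "rdf:Statement" },
            new String[] { FUTURE, "rdf:subject", s },
            new String[] { FUTURE, "rdf:predicate", p },
            new String[] { FUTURE, "rdf:object", o });
    }

    // Reification is allowed only if every one of the four creations is allowed.
    static boolean canReify(Evaluator ev, String s, String p, String o) {
        for (String[] t : reificationPatterns(s, p, o)) {
            if (!ev.canCreate(t[0], t[1], t[2])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Example policy: the user may create anything except rdf:object statements.
        Evaluator noObject = (s, p, o) -> !p.equals("rdf:object");
        System.out.println(canReify((s, p, o) -> true, "A", "B", "C")); // true
        System.out.println(canReify(noObject, "A", "B", "C"));          // false
    }
}
```

The point of the placeholder is that the evaluator can be asked about all four triples before the anonymous node exists.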


Yes - sounds similar and is a high "want" from my side.


and

VARIABLE:  which is in some sense the inverse of ANY.  In the security
system if you evaluate if the user can read <ANY,  rdf:subject, A> you are
testing to ensure that there are no restrictions on the user reading any
triple that has predicate=rdf:subject and object=A.  If you evaluate if the
user can read <VARIABLE, rdf:subject, A> you are asking if there are any
constraints on the user reading triples with predicate=rdf:subject and
object=A.

Does that have the same characteristics as FUTURE, from the POV of Node design? It's a specific new node type with one instance.

Seems like there is a "Node_Symbol" which generalizes that and covers ANY, FUTURE, VARIABLE ... and a symbol is equal to one with the same label - maybe even intern the string label.
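
A rough sketch of what such a "Node_Symbol" might look like. This is an assumption for illustration: the class does not extend Jena's Node, and interning is done per symbol through a map rather than with String.intern().

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the "Node_Symbol" idea; not an existing Jena class.
final class NodeSymbolSketch {
    static final class Node_Symbol {
        // One shared instance per label, so ANY, FUTURE, VARIABLE are each singletons.
        private static final ConcurrentMap<String, Node_Symbol> SYMBOLS =
            new ConcurrentHashMap<>();

        private final String label;

        private Node_Symbol(String label) { this.label = label; }

        // Returns the interned symbol for this label, creating it on first use.
        static Node_Symbol create(String label) {
            return SYMBOLS.computeIfAbsent(label, Node_Symbol::new);
        }

        String getLabel() { return label; }

        // Two symbols are equal exactly when their labels are equal.
        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Node_Symbol)) return false;
            return label.equals(((Node_Symbol) obj).label);
        }

        @Override
        public int hashCode() { return label.hashCode(); }

        @Override
        public String toString() { return "Symbol(" + label + ")"; }
    }

    public static void main(String[] args) {
        Node_Symbol f1 = Node_Symbol.create("FUTURE");
        Node_Symbol f2 = Node_Symbol.create("FUTURE");
        System.out.println(f1 == f2);                                  // true
        System.out.println(f1.equals(Node_Symbol.create("VARIABLE"))); // false
    }
}
```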


Or c.f. Node_Variable and ARQ's extension Var for named variables.

example:  If the user may only see triples that have

>  < ?x , rdf:subject, A >

where ?x is in the namespace foo then <ANY,  rdf:subject, A> must return
false as there are restrictions, but  <VARIABLE,  rdf:subject, A> must
return true as there are some triples that can be read. (VARIABLE is a bad
name but it comes from the fact that I encountered the requirement while
working with variables in queries).
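
One way to read the ANY/VARIABLE distinction is as "all" versus "some". The sketch below assumes a finite list of candidate subjects and a policy expressed as a predicate; both are simplifications invented for illustration, not the security module's API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: ANY as a universal check, VARIABLE as an existential one.
final class AnyVsVariableSketch {
    // <ANY, p, o>: true only if every candidate subject is readable (no restrictions apply).
    static boolean canReadAny(List<String> subjects, Predicate<String> readable) {
        return subjects.stream().allMatch(readable);
    }

    // <VARIABLE, p, o>: true if at least one candidate subject is readable.
    static boolean canReadVariable(List<String> subjects, Predicate<String> readable) {
        return subjects.stream().anyMatch(readable);
    }

    public static void main(String[] args) {
        // Candidate subjects of triples < ?x, rdf:subject, A >.
        List<String> subjects = Arrays.asList("foo:s1", "bar:s2");
        // Policy from the example: only subjects in the namespace foo are visible.
        Predicate<String> inFoo = s -> s.startsWith("foo:");

        System.out.println(canReadAny(subjects, inFoo));      // false: there are restrictions
        System.out.println(canReadVariable(subjects, inFoo)); // true: some triples can be read
    }
}
```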

for ex3 above, would not the cache contain only the specialized node
implementation and the class would have to ensure that it had a proper
instance of the specialized node?  I do this in the compressed graph
implementation where I keep a weak reference to the base node
implementation, and rebuild it if necessary.
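
The weak-reference approach can be sketched roughly as follows. The class names and the byte-array "compression" are placeholders chosen for illustration; this is not the actual compressed graph code.

```java
import java.lang.ref.WeakReference;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a compact node that rebuilds its full-size form on demand.
final class WeakBaseNodeSketch {
    // Stand-in for a full-size base node; here just a wrapper around the URI string.
    static final class BaseNode {
        final String uri;
        BaseNode(String uri) { this.uri = uri; }
    }

    // Keeps only the encoded form strongly; the expanded node is weakly referenced.
    static final class CompactNode {
        private final byte[] encoded;
        private WeakReference<BaseNode> cached = new WeakReference<>(null);
        int rebuilds = 0; // counts how often the base node had to be rebuilt

        CompactNode(String uri) {
            this.encoded = uri.getBytes(StandardCharsets.UTF_8);
        }

        // Returns the base node, rebuilding it if the garbage collector cleared it.
        synchronized BaseNode base() {
            BaseNode n = cached.get();
            if (n == null) {
                n = new BaseNode(new String(encoded, StandardCharsets.UTF_8));
                cached = new WeakReference<>(n);
                rebuilds++;
            }
            return n;
        }
    }

    public static void main(String[] args) {
        CompactNode c = new CompactNode("http://example.org/a");
        BaseNode first = c.base();
        BaseNode second = c.base();
        // While 'first' is strongly reachable, the same instance is reused.
        System.out.println(first == second);
        System.out.println(second.uri);
    }
}
```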

As for ex1: yeah messy.

however, suppose we add a method to Node that gets the node type as a URI.
  The node factory would register factories for each type.  The basic node
equality would be something like:

public boolean equals( Object o )
{
if (o instanceof Node)
{
   Node that = (Node) o;
   if (this.getType().equals( that.getType() ))
   {
     return this.isExactMatch( that );
   }
}
return false;
}

There might be a better way to do this with generics and  something like

interface Node<T> {
   Class<T> getType();
   boolean isExactMatch( Node<T> other );
}

Is that necessary? It's equality if it's the same type so would it not be possible to implement Node and your own equals:


class Node_Additional {

    // Uses Eclipse .equals generator:
    @Override
    public boolean equals(Object obj) {
        if ( this == obj ) return true ;
        if ( obj == null ) return false ;
        if ( !(obj instanceof Node_Additional) ) return false ;
        Node_Additional other = (Node_Additional)obj ;
        // ... specific tests ... e.g.
        return this.isExactMatch(other) ;
    }
}

If it is to overlap with an existing Node type, then the testing in both cases is more complicated.

But in any case I think it is possible to create an extensible Node
definition.  I guess that leaves the issue of what to do when an older Node
processing system meets a new Node implementation that it doesn't
understand.  Worst case is using a default to create a URI for the new Node
instance based on its type URI.


        Andy





On Mon, Apr 21, 2014 at 10:16 AM, Andy Seaborne <[email protected]> wrote:

On 20/04/14 19:04, Claude Warren wrote:

The security system does use the current implementations of Triple and
Node
but it does have to convert them to SecTriple and SecNode.

I have several other projects in which I have to convert to/from base Node
and Triple implementations to something else.   Mostly it is to do
serialization or compression.  The RMI implementation and compressed graph
implementations are two that come to mind.

Doing RMI required converting from Node to a serialized Node and then back
again.  I suppose changing Node from a class to an interface would mean
that the node factory implementation would have to do the conversion from
interface to concrete class where required.  But I wonder, how much of the
current code would have to change to work with interfaces.  The various
places that create nodes could continue to do so as they do now since the
nodes and triples they create would be implementations of the Node/Triple
interface.

Perhaps I misunderstand what is meant by "interface only" module.  What I
would like to see is the addition of  Triple and  Node interfaces.

Am I missing something?

Claude


I'm assuming that Node at least and probably Triple/Quad are java
interfaces either way round.  The question is the maven module structure -
whether there is a jena3-core with interfaces and the default in-memory
implementations, or whether there are two modules, one with interfaces only
(no impl, no tests) and one for the default implementations.

From a user POV, it makes no real difference (they use apache-jena-libs
anyway).  From a system POV, it may be a better or worse choice - I don't
know ATM.  A lot of the complexity of e.g. literals, is in the datatypes,
which is code.

It's hard (not impossible) to add new Node types currently.

ex1: A NodeGraph for graphs-in-graphs (N3 formulae).

ex2: In ARQ it would be nice to have a Node as a marker like "will be
bound at this point in the query execution".  The ex2 Node extension is
"system-local" - it would never leave ARQ, and in fact it's just an
ephemeral item, just useful to reuse containers like Triple and Quad.

The ex1 extension is a new Node type that has widespread consequences. A
first-class array-type is another example.

ex3: Specialised implementation of Node, e.g. to carry around some
storage-specific information (e.g. an index into a data structure).

That last one has to be done carefully in Jena because the fundamental
design is that Nodes can be created and dropped into any graph etc at
any time.  Inference depends on this as does query. In SPARQL, Nodes are
created which don't appear in graphs at all (e.g. results of expressions).

To do an ex3, the subsystem using them still needs to be able to cope with
Nodes it didn't create but it might wish to cache stuff in its Nodes.  But
having a map (LRU Cache) of "Node -> cached info" would mean normal Nodes
can be used.  It moves the space usage around as extended nodes are going
to be larger than normal Nodes.
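
A small sketch of the "Node -> cached info" map idea, using java.util.LinkedHashMap in access order as the LRU cache. Strings stand in for Nodes here (an assumption for brevity); any key type with value-based equals/hashCode would work the same way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: cached info lives beside ordinary nodes, not inside them.
final class NodeInfoCacheSketch {
    // LRU cache built on LinkedHashMap's access-order mode.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true: least recently used entry is eldest
            this.maxEntries = maxEntries;
        }

        // Called by LinkedHashMap after each insertion; evicts once over capacity.
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    public static void main(String[] args) {
        // Key: node (here its URI string). Value: storage-specific info, e.g. an index.
        LruCache<String, Long> cache = new LruCache<>(2);
        cache.put("http://example.org/a", 1L);
        cache.put("http://example.org/b", 2L);
        cache.get("http://example.org/a");     // touch a, so b becomes the eldest
        cache.put("http://example.org/c", 3L); // evicts b
        System.out.println(cache.keySet());
    }
}
```

The subsystem can look up any Node it is handed, including ones it did not create, and simply recompute the info on a cache miss.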

It has to be very, very careful about the contract for Node especially the
structure-based .equals.  A URI Node is the same if the URI string is the
same string.  Making .equals work on Nodes carrying additional information
is "interesting" :-)
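
To illustrate the .equals concern, here is a sketch in which an extended node carries a storage index but deliberately leaves equality based on the URI string alone. The class names are invented for this example and are not Jena classes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: extra payload must not leak into equals/hashCode.
final class UriNodeEqualsSketch {
    // Plain URI node: equal when the URI strings are equal.
    static class UriNode {
        final String uri;
        UriNode(String uri) { this.uri = uri; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof UriNode)) return false;
            return uri.equals(((UriNode) o).uri);
        }

        @Override
        public int hashCode() { return uri.hashCode(); }
    }

    // Extended node with storage-specific info; it inherits equals/hashCode
    // unchanged, so the index is ignored and equality stays symmetric.
    static final class IndexedUriNode extends UriNode {
        final long storageIndex;
        IndexedUriNode(String uri, long storageIndex) {
            super(uri);
            this.storageIndex = storageIndex;
        }
    }

    public static void main(String[] args) {
        UriNode plain = new UriNode("http://example.org/a");
        UriNode indexed = new IndexedUriNode("http://example.org/a", 42L);
        Set<UriNode> nodes = new HashSet<>();
        nodes.add(indexed);
        System.out.println(plain.equals(indexed) && indexed.equals(plain)); // true
        System.out.println(nodes.contains(plain));                          // true
    }
}
```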

Resource/Literal are tied to models/graphs and quietly copied if they move
across model boundaries.  They are interfaces.  They have had separate
implementations in the past but as far as I'm aware none exist today.
Maybe because it's a lot of work to rebuild everything the core
implementations provide, maybe because it's just as easy to build on top of as
within the framework.

That goes more strongly for Node. Node is the realisation of an RDF Term
so it's hard to need it with exactly the right contract semantics.

What is it about SecNode that caused you to not use the current Node?
(Sorry - I'm not familiar enough with the security module architecture - my
bad)

         Andy

PS I think I still agree with most of
http://mail-archives.apache.org/mod_mbox/jena-dev/201207.mbox/%[email protected]%3E
