Wow, that was a fast and substantial reply!

On Mon, Mar 16, 2015 at 3:19 PM, Stian Soiland-Reyes
<[email protected]> wrote:
> impl.sparql looks very interesting! I can imagine getting that
> blanknode logic correct took some work
>
> I see you do internal skolemization within the blanknode in order to
> create the context - does this work even when there are multiple
> blanknodes in the result in a circular dependency and no URI-bound
> subject/object?

Yes, blank node support works even in cases like

    _:a foaf:knows _:b .
    _:b foaf:knows _:c .
    _:c foaf:knows _:a .

There will be 3 distinct BlankNodes even if there are no other triples
involving these nodes. I'm not sure where you see a skolemization, but it
is true that the blank-node identifiers used in the result set are
completely irrelevant (at least once the filter method has returned).

How does it work? Every BlankNode instance internally has its context
(that is, the "Minimum Self-contained Graph" aka MSG). This context is
stored as an ImmutableGraph in which the bnode itself is replaced by a
fixed IRI. For two BlankNodes to be equal these ImmutableGraphs must be
equal, and immutable graphs are equal if and only if they are
isomorphic. The BlankNode is replaced with an IRI in order to
distinguish different blank nodes within the same MSG. For example, in
the graph

    [ foaf:knows [ foaf:knows ex:Alice ] ] .

one BlankNode will have the internal graph

    <internal-uri> foaf:knows [ foaf:knows ex:Alice ] .

and the other

    [] foaf:knows <internal-uri> .
    <internal-uri> foaf:knows ex:Alice .

The equals method of the BlankNodes will return false because the
internal graphs are not isomorphic.

This method alone is not enough to make sure the above foaf:knows circle
in fact has 3 (numerically) different BlankNodes, even though they are
indistinguishable. For this case BlankNode objects also have an internal
"isoDistinguisher": where an MSG contains multiple indistinguishable
blank nodes, different isoDistinguisher values are assigned to them. Of
course BlankNode's equals and hashCode implementations also take this
value into account.
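To illustrate the idea (this is a hypothetical sketch, not the actual
Clerezza code): equality is the pair (internal MSG, isoDistinguisher).
Here the MSG is modelled as a Set of triple strings and isomorphism is
approximated by set equality for brevity; the real implementation holds
an ImmutableGraph and compares by graph isomorphism.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the equality scheme described above.
final class SketchBlankNode {
    private final Set<String> internalMsg;  // MSG with this bnode replaced by <internal-uri>
    private final int isoDistinguisher;     // tie-breaker for indistinguishable bnodes

    SketchBlankNode(Set<String> internalMsg, int isoDistinguisher) {
        this.internalMsg = Set.copyOf(internalMsg);
        this.isoDistinguisher = isoDistinguisher;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SketchBlankNode)) return false;
        SketchBlankNode other = (SketchBlankNode) o;
        // Real implementation: isomorphism check between ImmutableGraphs,
        // not plain set equality.
        return internalMsg.equals(other.internalMsg)
                && isoDistinguisher == other.isoDistinguisher;
    }

    @Override
    public int hashCode() {
        return Objects.hash(internalMsg, isoDistinguisher);
    }
}
```

In the foaf:knows circle above all three nodes have isomorphic MSGs, so
they differ only by isoDistinguisher; without it they would wrongly
collapse into a single node.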
> Have you had a go at porting this to the incubator-commons-rdf model
> to see where the gaps are?

I'm looking forward to yours ;)

> I think it's a good example for exercising the model - as you say
> SPARQL is the "only" method to query RDF data. (There's also Linked
> Data Platform etc which are more specific - but perhaps also could be
> interesting as an exercise).

This would be quite hard, especially as not all the information is
expressed in RDF but some headers overrule the semantics of the payload.
Of course one could have one Graph per LDPR, but then there would be
nothing LDP-specific about it, just serialized graphs over HTTP. For
actually adding data, a difficulty might be dealing with the IRIs
assigned by the server (the client posting with a "null-relative IRI").

> I had a go:
>
> https://github.com/stain/clerezza-rdf-core/tree/github-sql/impl.sparql/src/main/java/org/apache/commons/rdf/impl/sparql
>
> Changes:
>
> https://github.com/stain/clerezza-rdf-core/compare/github-sql?expand=1

The Stream code is really nice! As you write in a comment, the missing
ImmutableGraph support is an issue: I assume the graphs created by the
factory (for example in SparqlBNode) do not evaluate as equal whenever
they are isomorphic, and this will break the algorithm described above.

> (NOTE: Tests not updated! So it probably doesn't work..)
>
> and it highlighted the problem of incubator-commons-rdf being all
> about Streams and no Collections support - so even something as simple
> as iterating over the triples has to be done Java 8 style.
>
> Some of the impl.sparql code got cleaner because of this (e.g.:
>
>     Stream<BlankNodeOrIRI> subjects =
>         context.getTriples().map(t -> t.getSubject());
>     Stream<Object> objects =
>         context.getTriples().map(t -> t.getObject());
>     Stream<Object> candidates =
>         Stream.concat(subjects, objects);
>     Stream<BlankNode> bnodes = candidates.filter(n ->
>             n instanceof BlankNode)
>         .map(BlankNode.class::cast);
>
> )
>
> ...
> but other, more traditional iterative code, got trickier. I think
> we should support both styles. Lots of this would be solved if Graph
> was Iterable<Triple>.
>
> I was forced to use the RDFTermFactory as the Simple implementations
> like LiteralImpl are no longer public. Clean enough, but it means
> every class needs one of these to do anything useful (e.g. to create
> an IRI to supply as an argument to Graph.getTriples()):
>
>     private static SimpleRDFTermFactory factory = new
>             SimpleRDFTermFactory();
>
> Incubator RDF doesn't have any support for cloning and making graphs
> immutable. Adding all triples from one graph to another requires
> stream-fun, e.g. instead of collection operations like:
>
>     expandedContext.addAll(startContext);
>
> I had to do the more elaborate, in a way more iterative things like:
>
>     startContext.getTriples().forEach(t -> expandedContext.add(t));

I was thinking the following could be difficulties with
incubator-commons:

- Lack of immutable graphs with identity based on isomorphism:
  SparqlGraph puts ImmutableGraphs into sets and relies on the Set
  contract to avoid duplicates and to check, for example, whether a
  context has already been "used".

- The BlankNode identifiers become a problem mainly when modifications
  of the graphs are allowed. As long as no modifications are allowed
  you could use a strong hash of the internal graph (or a deterministic
  serialization) and the isoDistinguisher. But if changes are allowed,
  then the moment you add a triple involving a blank node its ID would
  either change, or else (if the BNode is kept unmodified) adding a
  second triple with that BlankNode will result in something like a
  NoSuchBlankNodeInBackendException.
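As a sketch of how the two styles could coexist (with a hypothetical
in-memory Graph stand-in, triples modelled as plain Strings - this is
not the commons-rdf API): an addAll built on top of the Stream view
gives back the Collections feel, and implementing Iterable allows
ordinary for-loops.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical stand-in for a commons-rdf style Graph. It shows how a
// Stream-only API could still offer Collections-style conveniences:
// addAll implemented via getTriples(), plus Iterable for plain loops.
final class SketchGraph implements Iterable<String> {
    private final Set<String> triples = new HashSet<>();

    void add(String triple) { triples.add(triple); }

    Stream<String> getTriples() { return triples.stream(); }

    // The collection-style operation the mail wishes for, built on Streams:
    void addAll(SketchGraph other) {
        other.getTriples().forEach(this::add);
    }

    @Override
    public Iterator<String> iterator() { return triples.iterator(); }

    int size() { return triples.size(); }
}
```

With something like this, expandedContext.addAll(startContext) and the
startContext.getTriples().forEach(...) variant become equivalent.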
Cheers,
Reto

> On 16 March 2015 at 12:02, Reto Gmür <[email protected]> wrote:
> > Hello,
> >
> > With the new repository the clerezza rdf commons previously in the
> > commons sandbox are now at:
> >
> > https://git-wip-us.apache.org/repos/asf/clerezza-rdf-core.git
> >
> > I will compare that code with the current status of the code in the
> > incubating rdf-commons project in a later mail.
> >
> > Now I would like to bring to your attention a big step forward
> > towards CLEREZZA-856. The impl.sparql module provides an
> > implementation of the API on top of a SPARQL endpoint. Currently it
> > only supports read access. For usage examples see the tests in
> > /src/test/java/org/apache/commons/rdf/impl/sparql (
> > https://git-wip-us.apache.org/repos/asf?p=clerezza-rdf-core.git;a=tree;f=impl.sparql/src/test/java/org/apache/commons/rdf/impl/sparql;h=cb9c98bcf427452392e74cd162c08ab308359c13;hb=HEAD
> > )
> >
> > The hard part was supporting BlankNodes. The current implementation
> > handles them correctly even in tricky situations, however the
> > current code is not yet optimized for performance. As soon as
> > BlankNodes are involved, many queries have to be sent to the
> > backend. I'm sure some SPARQL wizard could help make things more
> > efficient.
> >
> > Since SPARQL is the only standardized method to query RDF data, I
> > think being able to façade an RDF Graph accessible via SPARQL is an
> > important use case for an RDF API, so it would be good to also have
> > a SPARQL-backed implementation of the API proposal in the incubating
> > commons-rdf repository.
> >
> > Cheers,
> > Reto
>
> --
> Stian Soiland-Reyes
> Apache Taverna (incubating), Apache Commons RDF (incubating)
> http://orcid.org/0000-0001-9842-9718
