Hi Stian and Reto,

Blank nodes are hard to support within a single system, and fairly
close to unworkable in a general system. However, within a system
that has RDF 1.1 as its theoretical basis, the W3C spec defines the
mapping functions necessary to define equivalence between graphs
(but does not say how translation should work in practice). Hence
the discussion, and a long contract, to come to agreement on
something that is consistent with the W3C specs but extends them
where necessary to make blank nodes work across the JVM.

Part of the issue is that while it is necessary to expose some
internally unique information about the BlankNode, the concrete
syntax (or the Java Object, for intra-JVM translation) may not have
assigned any identifier to the BlankNode. N-Triples, for instance,
must know an identifier in order to serialise a Triple independently
of the context of a Graph.
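
For instance (a made-up triple, purely to illustrate), even if the
source never wrote a label for the node, an N-Triples serialiser has
to invent one:

    _:b1 <http://example.org/knows> _:b2 .

Here _:b1 and _:b2 are labels the serialiser had to assign so that
the triple can stand on its own outside any Graph.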

Hence we are trying to converge on a method for consistently
assigning labels to blank nodes based on the parser (sorry if the
JVM-wide local scope comment confused you; the local scope probably
needs to be smaller than that, at either the individual document
parse level or the Graph level).
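
To make that concrete, here is a minimal sketch of a blank node
whose identity is the pair of a local scope and a label. This is not
the proposed API; the class name, the use of UUID for the scope, and
the field names are purely illustrative.

    import java.util.Objects;
    import java.util.UUID;

    // Illustrative sketch only: identity is the pair (scope, label).
    // "scope" stands in for whatever local scope we settle on (per
    // document parse, per Graph, ...) and is assumed to be created
    // fresh for each such scope.
    final class ScopedBlankNode {
        private final UUID scope;
        private final String label; // label from the concrete syntax,
                                    // or generated by the parser if
                                    // none was present

        ScopedBlankNode(UUID scope, String label) {
            this.scope = Objects.requireNonNull(scope);
            this.label = Objects.requireNonNull(label);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ScopedBlankNode)) return false;
            ScopedBlankNode other = (ScopedBlankNode) o;
            return scope.equals(other.scope)
                && label.equals(other.label);
        }

        @Override
        public int hashCode() {
            return Objects.hash(scope, label);
        }
    }

The use cases below then come down to whether two nodes share the
same scope.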

Some of the use cases that we are trying to support are:

1. The same document parsed using the same parser implementation into
the same graph may generate BlankNode objects that are .equals(), and
if they are .equals() their .hashCode() results must be the same (see
the sketch after this list).

2. The same document parsed using the same parser implementation into
two different graphs must generate BlankNode objects that are not
.equals() and hopefully do not have the same .hashCode().

3. Two different documents parsed using the same parser implementation
into the same graph must generate BlankNode objects that are not
.equals() and have different .hashCode() results. This includes cases
where the concrete syntax contained the same label for the blank node.

4. The same document parsed using different parser implementations
into two different graphs must generate BlankNode objects that are not
.equals() and hopefully do not have the same .hashCode().

5. Two different documents parsed using different parser
implementations may then be transferred into the same graph, and the
BlankNode objects inside of the graph must not be .equals() if they
came from different physical documents, even if the concrete syntax
contained the same label for the blank node.
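
Continuing the sketch above, cases 1-5 fall out of whether the two
nodes were created within the same scope (again, names here are
illustrative only, not any real implementation):

    import java.util.UUID;

    // Demonstrates the intent of the cases above against the
    // ScopedBlankNode sketch; not a test of a real implementation.
    public class BlankNodeCasesDemo {
        public static void main(String[] args) {
            UUID parseA = UUID.randomUUID(); // one parse of a document
                                             // into one graph
            UUID parseB = UUID.randomUUID(); // a different parse: other
                                             // graph, document, or parser

            ScopedBlankNode n1 = new ScopedBlankNode(parseA, "b1");
            ScopedBlankNode n2 = new ScopedBlankNode(parseA, "b1");
            ScopedBlankNode n3 = new ScopedBlankNode(parseB, "b1");

            System.out.println(n1.equals(n2)); // true  (case 1; the
                                               // hashCodes also match)
            System.out.println(n1.equals(n3)); // false (cases 2-5: same
                                               // label "b1", different
                                               // scope)
        }
    }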

Andy has also brought up the possibility of round-tripping in addition
to those requirements. That is, a BlankNode from one graph could be
inserted into another graph, and after some time it should be possible
to put it back into the first graph and have it operate as if it had
never been moved out. The current proposal doesn't allow for that, and
I am not sure what would be required to make it work.
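
In terms of the sketch above, the round-trip requirement is roughly
the following (the graph objects and their add/contains methods here
are hypothetical placeholders, not part of any proposal):

    // Hypothetical illustration of the round-trip requirement only;
    // graphA, graphB and their add/contains methods are placeholders.
    //
    //   ScopedBlankNode b = ...;          // b originates in graphA
    //   graphB.add(b, p, o);              // node moved to another graph
    //   graphA.add(b, p, o);              // ... and later put back
    //   assert graphA.contains(b, p, o);  // behaves as if it had never
    //                                     // left graphA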

In addition, it is hoped that all of the objects in the system could
be immutable within a graph.

We have not discussed trimming graphs previously. I have never come at
RDF with a requirement to be able to remove triples, but I may have
had a limited set of use cases. Is there a use case for that automatic
trimming that could not easily be satisfied using a rules engine? Any
automatic removal of triples is outside what I envisioned the scope of
Commons RDF to be, and it hasn't been brought up by anyone else. Even
if RDF theory allows for some corner case, it is not a general
requirement and, in my experience, is not generally used or asked for.

I am fairly ambivalent on the case for internalIdentifier being
substitutable for .toString, but we currently need to work out a
consistent way to identify the local scope, and that could be used in
conjunction with either internalIdentifier or toString if both have
the same contract in practice. What we are doing is endeavouring to
transfer BlankNodes between implementations inside the JVM while
keeping their general identity (and round-tripping adds another level
of difficulty on top of that). If we rely on .toString alone, then we
may need to embed the local scope information into the resulting
string, so the two pieces of information would be compressed into one,
which may not be ideal in the end. More broadly, it would be great if
the new Commons RDF API didn't place restrictions on .toString, which
already has a consistent meaning in each of the implementations; new,
dedicated methods give more flexibility there.
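
To illustrate the trade-off, here are the two shapes in rough form.
Neither interface is the proposed Commons RDF API; localScope() in
particular is a hypothetical accessor used only for this sketch.

    import java.util.UUID;

    // (a) Scope and identifier exposed as separate methods; toString()
    //     is left alone and keeps whatever meaning each implementation
    //     already gives it.
    interface BlankNodeWithSeparateScope {
        String internalIdentifier(); // e.g. the label "b1"
        UUID localScope();           // hypothetical accessor for the
                                     // local scope
    }

    // (b) Only toString() is relied on, so the local scope has to be
    //     embedded in the string, e.g. "<scope-uuid>:b1", compressing
    //     two pieces of information into one value.
    interface BlankNodeToStringOnly {
        String toString(); // must carry both the scope and the label
    }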

Thanks,

Peter
