[jira] [Created] (COMMONSRDF-6) Contract around the internal string of a blank node

Stian Soiland-Reyes (JIRA) Sun, 29 Mar 2015 06:53:58 -0700

Stian Soiland-Reyes created COMMONSRDF-6:
--------------------------------------------


             Summary: Contract around the internal string of a blank node 
                 Key: COMMONSRDF-6
                 URL: https://issues.apache.org/jira/browse/COMMONSRDF-6
             Project: Apache Commons RDF
          Issue Type: Improvement
            Reporter: Stian Soiland-Reyes


From https://github.com/commons-rdf/commons-rdf/issues/56

afs:
{quote}
RDF 1.1 says "IRIs, literals and blank nodes are distinct and distinguishable." 
[my emphasis]

http://www.w3.org/TR/rdf11-concepts/#section-rdf-graph

This is a consequence of RDF being an abstract syntax - there is no 
logic/entailment at this level - it was true in RDF 1.0 but now it is 
explicitly stated in RDF Concepts.

Distinguishable blank nodes mean that unique characteristics need to align to 
the Java identity contract.

At least, the same (= RDFTerm.equals) blank node, even when different java 
objects, must have the same internal string. (.equals)

It's a one-way implication: same internal string does not imply equality so 
this works across independent implementations.

An extreme implementation is to always return the same internal string (may not 
be helpful but should be legal).

{quote}
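
A minimal sketch of the one-way contract described above, assuming a hypothetical ScopedBlankNode class whose identity is the pair (scope, internal string). The class, its fields, and the use of a plain Object as the scope are illustrative assumptions, not part of the Commons RDF API.

```java
import java.util.Objects;

// Hypothetical sketch, not the Commons RDF API: a blank node whose identity
// is the pair (scope, internal string).
final class ScopedBlankNode {
    private final Object scope;           // assumed stand-in for the "local scope"
    private final String internalString;

    ScopedBlankNode(Object scope, String internalString) {
        this.scope = Objects.requireNonNull(scope);
        this.internalString = Objects.requireNonNull(internalString);
    }

    String internalString() {
        return internalString;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ScopedBlankNode)) {
            return false;
        }
        ScopedBlankNode that = (ScopedBlankNode) other;
        // Scopes are compared by reference; the internal string alone is not enough.
        return scope == that.scope && internalString.equals(that.internalString);
    }

    @Override
    public int hashCode() {
        return 31 * System.identityHashCode(scope) + internalString.hashCode();
    }
}
```

With this shape, two equal nodes necessarily share an internal string, while two nodes from different scopes can share the string "b0" and still be unequal.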

afs:
{quote}


This is also related to the proposed {{BlankNode.ntriplesString()}}.

The choice of output string is dependent on the writing process. It only needs 
to be unique across the file being written. A choice for output is short forms 
like ":b0", ":b1" etc.

The ntriples output form is not a unique property of the blank node. I think we 
should not include ntriplesString in the core common API.
{quote}
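
The writer-side labelling described above could look roughly like the following sketch. BlankNodeLabeller is a hypothetical helper, not an existing Commons RDF class. It assumes the key objects have a suitable equals/hashCode, and it writes labels in the N-Triples form with the "_:" prefix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: hands out short labels that are unique only within
// the output of a single write operation.
final class BlankNodeLabeller {
    private final Map<Object, String> labels = new HashMap<>();

    // Returns the label already assigned to this node, or allocates the next
    // one in the sequence _:b0, _:b1, ... the first time the node is seen.
    String labelFor(Object blankNode) {
        String label = labels.get(blankNode);
        if (label == null) {
            label = "_:b" + labels.size();
            labels.put(blankNode, label);
        }
        return label;
    }
}
```

A serializer would create one labeller per file and discard it afterwards, so the labels carry no meaning outside that file.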

stain:
{quote}


Not sure what this is proposing, but :-1: to remove BlankNode.ntriplesString - 
and :+1: to improve the contract text for BlankNode.

I found ntriplesString very useful as it becomes an interoperability point and 
has (largely) predictable outputs.

The commons RDF API stays very close to the rdf11-concepts 
http://www.w3.org/TR/rdf11-concepts/ , which I like. The ntriplesString is 
however trivial to implement - and almost all implementations are probably 
going to have something like that anyway. I never liked much that the name 
doesn't include get - but I guess that is because it is a derived value and 
might need further calculations.

The only contentious part is in BlankNode - so perhaps add a specialization of 
ntriplesString that clarifies the pitfalls here (as we did with equals). The 
long paragraphs at the top of BlankNode do not currently help to clarify this.

See the simple implementation of BlankNode for one simple way to deal with 
those "non-ntriples-valid internal identifiers".

Always keeping an internal UUID field or similar is another - implementations 
can decide on what is most natural to their implementations - they probably 
have dealt with this already, although possibly not within their 
equivalent of the BlankNode class. The BlankNode is also free to keep an 
internal reference to the Graph or "local scope" and use that to generate 
identifiers.

There is no requirement anywhere for Blank Node identifiers to always be 
re-generated in serialization - this is simply a liberty that is available. A 
serializer based on Commons RDF can still do that - it can simply ignore 
BlankNode.ntriplesString and create a temporary Map from internalIdentifier to 
b1, b2, etc. I do however not see why we need to REQUIRE a serializer do such 
an operation - that is taking this API beyond its scope and into "best 
practice" (in which case we would also deal with prefixes, preserving prefix 
names, canonicalizing URIs, etc).

As an example of the current strength, I was able to write an N-triples 
serializer in simple by just concatenating the ntriplesString of the components 
from TripleImpl.toString and then just joining with \n:

This is powerful - if nothing else it's great for debugging. I am not 
proposing to add ntriplesString() for Triple, as it might need to be much 
closer to the Graph - but at least RDFNode.toString() could have a default 
method that calls ntriplesString() (which is 200 times more useful than 
LiteralImpl 2bd85b1f529302f9 from Object.toString :) )
{quote}
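
The snippet referred to above is not included in this message. As a rough illustration only, a serializer built by concatenating ntriplesString() values might look like the sketch below. Term, IriTerm, SimpleTriple and NaiveNTriplesWriter are hypothetical stand-ins rather than the actual classes of the simple implementation, and the IRI is assumed to contain no characters that need escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the term interface discussed in the thread.
interface Term {
    String ntriplesString();
}

// Assumes the IRI needs no escaping; a real implementation would have to check.
final class IriTerm implements Term {
    private final String iri;

    IriTerm(String iri) {
        this.iri = iri;
    }

    @Override
    public String ntriplesString() {
        return "<" + iri + ">";
    }

    // Java interfaces cannot supply a default toString(), so the class
    // delegates to ntriplesString() explicitly.
    @Override
    public String toString() {
        return ntriplesString();
    }
}

final class SimpleTriple {
    private final Term subject;
    private final Term predicate;
    private final Term object;

    SimpleTriple(Term subject, Term predicate, Term object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // One N-Triples statement: three terms separated by spaces, ended by " ."
    @Override
    public String toString() {
        return subject.ntriplesString() + " " + predicate.ntriplesString()
                + " " + object.ntriplesString() + " .";
    }
}

final class NaiveNTriplesWriter {
    // Joins the statements with "\n", as described in the comment above.
    static String write(List<SimpleTriple> triples) {
        return triples.stream()
                .map(SimpleTriple::toString)
                .collect(Collectors.joining("\n"));
    }
}
```

The output is only as valid as each term's ntriplesString(), which is why the contract for that method matters.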

afs:
{quote}


Some display string is useful but reading the contract for ntriplesString, it 
is not (just) for display purposes, cf. Java toString. There is a difference in 
escaping. I see that TripleImpl.toString does not do syntax escaping.

Providing a readable RDFNode.toString() would separate the development display 
concerns (e.g. no escapes maybe) from serialization concerns.

Some RDF systems implement blank nodes from a sequence (e.g. Mulgara). Actually 
that policy can be quite convenient for debugging development.

We could include N-Triples in commons-rdf but to me v1 should be targeted at "use 
RDF data". Parsing and serialization are the concern of the implementation. The 
simple impl is one such example, not a new RDF system (is it?:-)
{quote}
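
To make the escaping difference concrete, here is a small sketch that contrasts a raw display form with an N-Triples form for a plain string literal. LiteralStrings is a hypothetical helper. It escapes only the four characters that may not appear unescaped inside a quoted N-Triples string (double quote, backslash, line feed, carriage return), and it ignores datatypes and language tags.

```java
// Hypothetical helper contrasting display output with serialization output.
final class LiteralStrings {

    // Display form: the lexical value as-is, with no escaping.
    static String display(String lexical) {
        return lexical;
    }

    // N-Triples form: quoted, with ", \, LF and CR written as escapes.
    static String ntriples(String lexical) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < lexical.length(); i++) {
            char c = lexical.charAt(i);
            switch (c) {
                case '"':
                    sb.append("\\\"");
                    break;
                case '\\':
                    sb.append("\\\\");
                    break;
                case '\n':
                    sb.append("\\n");
                    break;
                case '\r':
                    sb.append("\\r");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

A literal containing a line break stays readable in the display form but occupies a single line in the N-Triples form, which a line-based serializer relies on.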

ansell:
{quote}


I commented on the pull request to remove some of the tests that test or rely 
on the BlankNode internal identifier structure, particularly that it be a valid 
n-triples identifier. However, those tests made it into the merged version 
because it was otherwise basically okay and we are continually evolving anyway 
so there is no need to have perfect pull requests at this stage. I will review 
and merge #55 and then work on any further cases that we may not be testing for 
yet.

I am all for defining/clarifying the contract for .toString in the API, even if 
it says that there is no specific escaping or formatting done on the output of 
.toString.

Supporting N-Triples in the base API seems to be natural for two reasons to me. 
Firstly, it is the simplest syntax, so it doesn't require any particular 
optimisations and Triples can be streamed out without relying on a particular 
framework or serialiser. Secondly, for a long time it has been the sole 
established test case format for RDF, although it is defined on its own for 
RDF-1.1, so it is a natural single serialisation to support.

As long as the output of ntriplesString is defined to be implementation and 
local scope specific for BlankNodes (no confusion with IRI or Literal), I am 
fine with having it. Given the number of times the BlankNode API references 
"local scope" right now, we are unlikely to have more users commenting that it 
is unusual than we already have had for the last 10 years with RDF-1.0.
{quote}
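
One possible way to keep the ntriplesString of a blank node both syntactically valid and specific to a local scope is to derive the label from a per-scope salt plus the internal identifier. This is a sketch under assumptions: BlankNodeLabels, scopeSalt and the ":" separator are illustrative names and choices, not the behaviour of any particular implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: maps an arbitrary internal identifier to a label that
// is valid N-Triples syntax and differs between scopes.
final class BlankNodeLabels {

    // scopeSalt is assumed to be generated once per local scope, for example
    // with UUID.randomUUID(), and reused for every node in that scope.
    static String ntriplesString(UUID scopeSalt, String internalIdentifier) {
        String name = scopeSalt + ":" + internalIdentifier;
        // A name-based UUID is deterministic for the same input bytes, and
        // its string form uses only hex digits and hyphens.
        UUID derived = UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
        return "_:" + derived;
    }
}
```

The internal identifier can then contain spaces or other characters that are not allowed in a blank node label, because it never appears in the output directly.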





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
