[jira] [Commented] (COMMONSRDF-6) Contract around the internal string of a blank node

ASF GitHub Bot (JIRA) Wed, 29 Apr 2015 03:31:56 -0700

    [ https://issues.apache.org/jira/browse/COMMONSRDF-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519039#comment-14519039 ]


ASF GitHub Bot commented on COMMONSRDF-6:
-----------------------------------------

Github user stain commented on a diff in the pull request:

    https://github.com/apache/incubator-commonsrdf/pull/10#discussion_r29322000
  
    --- Diff: api/src/main/java/org/apache/commons/rdf/api/BlankNode.java ---
    @@ -41,60 +41,51 @@
      * on the concrete syntax or implementation. The syntactic restrictions on blank
      * node identifiers, if any, therefore also depend on the concrete RDF syntax or
      * implementation.
    - * 
    + *
      * Implementations that handle blank node identifiers in concrete syntaxes need
      * to be careful not to create the same blank node from multiple occurrences of
      * the same blank node identifier except in situations where this is supported
      * by the syntax. </blockquote>
    - * 
    - * A BlankNode object created through the
    - * {@link RDFTermFactory#createBlankNode()} method must be universally unique,
    - * and SHOULD contain a {@link UUID} as part of its
    - * {@link #internalIdentifier()}.
    - * 
    - * A BlankNode object created through the
    - * {@link RDFTermFactory#createBlankNode(String)} method must be universally
    - * unique, but also produce the same {@link #internalIdentifier()} as any
    - * previous or future calls to that method on that factory with the same
    - * parameters. In addition, it SHOULD contain a {@link UUID} as part of its
    - * {@link #internalIdentifier()}, created using
    - * {@link UUID#nameUUIDFromBytes(byte[])} using a constant salt for each
    - * instance of {@link RDFTermFactory}, with the given identifier joined to that
    - * salt in a consistent manner.
    - * 
      *
    + * A BlankNode SHOULD contain a {@link UUID} string as part of its
    + * universally unique {@link #uniqueReference()}.
    + *
    + * @see RDFTermFactory#createBlankNode()
    + * @see RDFTermFactory#createBlankNode(String)
      * @see <a href="http://www.w3.org/TR/rdf11-concepts/#dfn-blank-node">RDF-1.1
      * Blank Node</a>
      */
     public interface BlankNode extends BlankNodeOrIRI {
     
         /**
    -     * Return a <a href=
    -     * "http://www.w3.org/TR/rdf11-concepts/#dfn-blank-node-identifier">unique
    -     * label</a> for the blank node. This label is generated by either
    -     * {@link RDFTermFactory#createBlankNode()} or
    -     * {@link RDFTermFactory#createBlankNode(String)} and is unique within the
    -     * context of the instance of the factory. In particular, successive calls
    -     * to the {@link RDFTermFactory#createBlankNode(String)} method on a single
    -     * factory with the same parameters MUST return BlankNode objects with
    -     * identical internalIdentifiers, but the identifiers SHOULD be mapped to
    -     * unique values in the context of the factory instance.
    -     *
    -     * IMPORTANT: This is not a serialization/syntax label, and there are no
    -     * guarantees that it is a valid identifier in any concrete syntax. For an
    -     * N-Triples compatible identifier use {@link #ntriplesString()}. For all
    -     * other syntaxes, the result of this method must be sanitized to produce a
    -     * valid concrete identifier if one is needed.
    +     * Return a reference for uniquely identifying the blank node.
    +     * <p>
    +     * The reference string MUST be universally unique, e.g. blank nodes created
    +     * separately in different JVMs or from different {@link RDFTermFactory}
    +     * instances MUST NOT have the same reference string.
    --- End diff --
    
    I clarified this to:
    
    > The reference string MUST universally and uniquely identify this blank
    > node. That is, individual blank nodes created separately in different
    > JVMs or from different {@link RDFTermFactory} instances MUST NOT have the
    >  same reference string.
    
    The SHOULD in the RDFTermFactory means you are allowed to have two
    different RDFTermFactory instances that deliberately produce equivalent
    blank nodes for the same name (e.g. because they are Serializable, or
    because a parsing session of a single document is done across a
    distributed system). SHOULD means "Don't do it unless you know what you're
    up to", so you can only do it while also complying with the MUST here:
    those blank nodes must then be equivalent.
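
As a rough illustration of how the MUST and the SHOULD can coexist, here is a minimal Java sketch. `BlankNodeSketch` and its nested `Factory` and `Node` types, including the salt-sharing constructor, are hypothetical names used only for this example; they are not the Commons RDF API or its simple implementation. The sketch assumes a random per-factory salt that is joined to the name and fed to `UUID.nameUUIDFromBytes`, along the lines of the Javadoc removed in the diff above.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical sketch only; not the Commons RDF API. */
final class BlankNodeSketch {

    /** A blank node whose identity is its universally unique reference. */
    static final class Node {
        private final String uniqueReference;

        Node(String uniqueReference) {
            this.uniqueReference = uniqueReference;
        }

        String uniqueReference() {
            return uniqueReference;
        }

        // Equal nodes share a reference, and the reference decides equality.
        @Override
        public boolean equals(Object other) {
            return other instanceof Node
                    && uniqueReference.equals(((Node) other).uniqueReference);
        }

        @Override
        public int hashCode() {
            return uniqueReference.hashCode();
        }
    }

    /** A factory that scopes named blank nodes with a salt. */
    static final class Factory {
        private final UUID salt;

        // Default case: a fresh random salt, so separate factories
        // (or JVMs) do not produce the same reference for the same name.
        Factory() {
            this(UUID.randomUUID());
        }

        // Assumption for this sketch: sharing a salt on purpose is the
        // "know what you're up to" case, e.g. a distributed parse of
        // one document.
        Factory(UUID salt) {
            this.salt = salt;
        }

        Node createBlankNode() {
            return new Node(UUID.randomUUID().toString());
        }

        Node createBlankNode(String name) {
            // Join salt and name consistently, then derive a name-based UUID.
            byte[] salted = (salt + ":" + name).getBytes(StandardCharsets.UTF_8);
            return new Node(UUID.nameUUIDFromBytes(salted).toString());
        }
    }
}
```

With this sketch, two factories created through the no-argument constructor disagree on the reference for the same name, while two factories deliberately given the same salt agree, and the resulting nodes then compare equal.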


> Contract around the internal string of a blank node 
> ----------------------------------------------------
>
>                 Key: COMMONSRDF-6
>                 URL: https://issues.apache.org/jira/browse/COMMONSRDF-6
>             Project: Apache Commons RDF
>          Issue Type: Improvement
>            Reporter: Stian Soiland-Reyes (old)
>             Fix For: 0.1
>
>
> From https://github.com/commons-rdf/commons-rdf/issues/56
> afs:
> {quote}
> RDF 1.1 says "IRIs, literals and blank nodes are distinct and 
> distinguishable." [my emphasis]
> http://www.w3.org/TR/rdf11-concepts/#section-rdf-graph
> This is a consequence of RDF being an abstract syntax - there is no 
> logic/entailment at this level - it was true in RDF 1.0 but now it is 
> explicitly stated in RDF Concepts.
> Distinguishable blank nodes mean that unique characteristics need to align to 
> the Java identity contract.
> At least, the same (= RDFTerm.equals) blank node, even when different java 
> objects, must have the same internal string. (.equals)
> It's a one-way implication: same internal string does not imply equality so 
> this works across independent implementations.
> An extreme implementation is to always return the same internal string (may 
> not be helpful but should be legal).
> {quote}
> afs:
> {quote}
> This also related to the proposed {{BlankNode.ntriplesString()}}.
> The choice of output string is dependent on the writing process. It only 
> needs to be unique across the file being written. A choice for output is 
> short forms like ":b0", ":b1" etc.
> The ntriples output form is not a unique property of the blank node. I think 
> we should not include ntriplesString in the core common API.
> {quote}
> stain:
> {quote}
> Not sure what this is proposing, but :-1: to remove BlankNode.ntriplesString 
> - and :+1: to improve the contract text for BlankNode.
> I found ntriplesString very useful as it becomes an interoperability point 
> and has (largely) predictable outputs.
> The commons RDF API stays very close to the rdf11-concepts 
> http://www.w3.org/TR/rdf11-concepts/ , which I like. The ntriplesString is 
> however trivial to implement - and almost all implementations are probably 
> going to have something like that anyway. I never liked much that the name 
> doesn't include get - but I guess that is because it is a derived value and 
> might need further calculations.
> The only contentious part is in BlankNode - so perhaps add a specialization 
> of ntriplesString that clarifies the pitfalls here (as we did with equals). 
> The long paragraphs of BlankNode on the top do not currently help to 
> clarify this.
> See the simple implementation of BlankNode for one simple way to deal with 
> those "non-ntriples-valid internal identifiers".
> Always keeping an internal UUID field or similar is another - implementations 
> can decide on what is most natural to their implementations - they probably 
> have dealt with this already, although possibly not within their 
> equivalent of the BlankNode class. The BlankNode is also free to keep an 
> internal reference to the Graph or "local scope" and use that to generate 
> identifiers.
> There is no requirement anywhere for Blank Node identifiers to always be 
> re-generated in serialization - this is simply a liberty that is available. A 
> serializer based on Commons RDF can still do that - it can simply ignore 
> BlankNode.ntriplesString and create a temporary Map from internalIdentifier 
> to b1, b2, etc. I do however not see why we need to REQUIRE a serializer do 
> such an operation - that is taking this API beyond its scope and into "best 
> practice" (in which case we would also deal with prefixes, preserving prefix 
> names, canonicalizing URIs, etc).
> As an example of the current strength, I was able to write an N-triples 
> serializer in simple by just concatenating the ntriplesString of the 
> components from TripleImpl.toString and then just joining with \n:
> This is powerful - if nothing else it's great for debugging. I am not 
> proposing to add ntriplesString() for Triple, as it might need to be much 
> closer to the Graph - but at least RDFNode.toString() could have a default 
> method that calls ntriplesString() (which is 200 times more useful than 
> LiteralImpl 2bd85b1f529302f9 from Object.toString :) )
> {quote}
> afs:
> {quote}
> Some display string is useful but reading the contract for ntriplesString, it 
> is not (just) for display purposes. c.f. Java toString. There is a difference 
> in escaping. I see that TripleImpl.toString does not do syntax escaping.
> Providing a readable RDFNode.toString() would separate the development display 
> concerns (e.g. no escapes maybe) from serialization concerns.
> Some RDF systems implement blank nodes from a sequence (e.g. Mulgara). 
> Actually that policy can be quite convenient for debugging development.
> We could include N-Triples in commons-rdf but to me v1 should be targeted at 
> "use RDF data". Parsing and serialization is the concern of the 
> implementation. The simple impl is one such example, not a new RDF system (is 
> it?:-)
> {quote}
> ansell:
> {quote}
> I commented on the pull request to remove some of the tests that test or rely 
> on the BlankNode internal identifier structure, particularly that it be a 
> valid n-triples identifier. However, those tests made it into the merged 
> version because it was otherwise basically okay and we are continually 
> evolving anyway so there is no need to have perfect pull requests at this 
> stage. I will review and merge #55 and then work on any further cases that we 
> may not be testing for yet.
> I am all for defining/clarifying the contract for .toString in the API, even 
> if it says that there is no specific escaping or formatting done on the 
> output of .toString.
> Supporting N-Triples in the base API seems to be natural for two reasons to 
> me. Firstly, it is the simplest syntax, so it doesn't require any particular 
> optimisations and Triples can be streamed out without relying on a particular 
> framework or serialiser. Secondly, for a long time it has been the sole 
> established test case format for RDF, although it is defined on its own for 
> RDF-1.1, so it is a natural single serialisation to support.
> As long as the output of ntriplesString is defined to be implementation and 
> local scope specific for BlankNodes (no confusion with IRI or Literal), I am 
> fine with having it. Given the number of times the BlankNode API references 
> "local scope" right now, we are unlikely to have more users commenting that 
> it is unusual than we already have had for the last 10 years with RDF-1.0.
> {quote}
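
The discussion above mentions that a serializer may ignore BlankNode.ntriplesString and keep a temporary Map from the internal identifier to b1, b2, etc. A minimal sketch of that idea follows. `BlankNodeRelabeler` and `labelFor` are hypothetical names for this example, not part of Commons RDF, and the internal identifier is assumed to be available as a plain String.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; one instance per document being written. */
final class BlankNodeRelabeler {
    // Maps each internal identifier to a short, document-local label.
    private final Map<String, String> labels = new LinkedHashMap<>();

    /**
     * Returns the label for the given internal identifier, allocating
     * the next free one (_:b1, _:b2, ...) on first sight.
     */
    String labelFor(String internalIdentifier) {
        String label = labels.get(internalIdentifier);
        if (label == null) {
            label = "_:b" + (labels.size() + 1);
            labels.put(internalIdentifier, label);
        }
        return label;
    }
}
```

The labels only need to be unique within the file being written, so the map can be discarded once serialization of that document finishes.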



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
