[
https://issues.apache.org/jira/browse/COMMONSRDF-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244189#comment-15244189
]
ASF subversion and git services commented on COMMONSRDF-8:
----------------------------------------------------------
Commit c8099bcde882349030105568e39a9c1eadd72418 in incubator-commonsrdf's
branch refs/heads/master from [~stain]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-commonsrdf.git;h=c8099bc ]
Merge branch 'master' into COMMONSRDF-8-graph-add-blanknode
> simple .GraphImpl.add() must clone BlankNode
> -----------------------------------------------
>
> Key: COMMONSRDF-8
> URL: https://issues.apache.org/jira/browse/COMMONSRDF-8
> Project: Apache Commons RDF
> Issue Type: Bug
> Reporter: Stian Soiland-Reyes (old)
> Assignee: Stian Soiland-Reyes
> Labels: blanknode, simple
> Fix For: 0.1
>
>
> From https://github.com/commons-rdf/commons-rdf/issues/66
> stain:
> {quote}
> In simple GraphImpl add() it does not clone BlankNode.
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/GraphImpl.java#L44
> Lacking #63 or #64, this means that this would wrongly merge two BlankNodes
> from different scopes - say you iterate over triples from graph 1 and add
> them to graph 2. This test-case should be added - and also to verify that
> after copying over, two statements about g1.b1 use the same BlankNode, and
> two about g2.b1 uses the same - but that those two blank nodes are themselves
> different - e.g. must now have different internal identifiers.
> Perhaps a hashing based on Objects.identityHashCode of the original
> localScope (if a BlankNodeImpl which exposes that) or the original BlankNode
> is sufficient - although it could lead to two 'equals()' BlankNodes from g1
> that are not the same object to not be merged. Hence this is just an
> intermediary solution to #48 .
> https://github.com/commons-rdf/commons-rdf/pull/63
> https://github.com/commons-rdf/commons-rdf/pull/64
> https://github.com/commons-rdf/commons-rdf/pull/48
> {quote}
> afs:
> {quote}
> What is "merge"?
> "Merge" is logical(semantic) operation. The API works on abstract syntax.
> Higher level code needs to deal with semantics. Otherwise, "simple" is taking
> a stance of one usage over another.
> Blank nodes can be shared between graphs. Sharing means .equals. If it is
> "same implementation" then using
> If the system implemenation blank node identifier (Concepts, sec 3.4) is
> choosen to be a global unique id, no copy is needed across the same
> implementation. That makes the statement "must now have different internal
> identifiers" too strong.
> Because the API abstracts the key features of an RDFTerm, "simply" can work
> with java objects created in other implementations which would eb good as the
> de facto reference example.
> {quote}
> stain:
> {quote}
> Yes, agree that if you intend for them to be shared, that should be all good
> and fine - but then it should truly be merged as well - that is be both
> .equals(), have same .hashCode() and same (internal)Identifier and
> nTriplesString.
> Lacking globally unique identifiers on BlankNode - GraphImpl.add() will have
> to do something else.
> But now the Simple implementation will let you create two BlankNodes in two
> Graphs g1 and g2 with the same "internal identifier" , e.g. "b1" - and then
> you can add statements using them to a single Graph g3. (for instance because
> you are just adding all triples from g1 and g2 without any modifications).
> In that case you get a strange case where they have the same
> internalIdentifier "b1" and same ntriples string - but they internally keep
> two different "local scopes" back to g1 and g2, and thus their .equals() will
> claim they are not equal.
> This is in contradiction to the javadoc for BlankNode which says:
> https://github.com/commons-rdf/commons-rdf/blob/master/api/src/main/java/com/github/commonsrdf/api/BlankNode.java#L44
> > Implementations that handle blank node identifiers in concrete syntaxes
> > need to be careful not to create the same blank node from multiple
> > occurrences of the same blank node identifier except in situations where
> > this i supported by the syntax.
> So it should be either or - and the test need to verify which one of them it
> is.
> That is what this issue will have to sort out for the upcoming release -
> assuming that something like #63 or #64 won't make it in, as they would avoid
> this issue to a large extent.
> {quote}
> afs:
> {quote}
> Cloning on GraphImpl.add will require long term internal state. What if the
> same blank node is added again later? Not even "merge" helps with this; a
> sequence of adds is all one merge persumably. RDF 1.1 MT
> There are standard, cheap mechanisms for globally unique identifiers. They
> provide the easiest and clearest solution for the generation of fresh blank
> nodes that do not clash and will not in the future by accident. If "simple"
> wishes to impose further contract restrictions, like no shared blank nodes,
> then it can deal with system identifers per graph but no shared blank nodes
> is a significant restriction.
> If it is the same "internal identifier", maybe that because they are supposed
> to be the same. That can happen for example, reconstructing from data on disk
> (not RDF systems, more database-like storage).
> The N-triples string, if the contract is that it can be used in output
> unchanged, must be some long, unique form. Even if it could, the signficance
> is lost on reading: when read in again, the short name can not be used on its
> own (consider reading the same file twice). An internal identifier which is a
> unique poer parser run together with the seen label can be used - the pair is
> unique.
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)