[ 
https://issues.apache.org/jira/browse/COMMONSRDF-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244189#comment-15244189
 ] 

ASF subversion and git services commented on COMMONSRDF-8:
----------------------------------------------------------

Commit c8099bcde882349030105568e39a9c1eadd72418 in incubator-commonsrdf's 
branch refs/heads/master from [~stain]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-commonsrdf.git;h=c8099bc ]

Merge branch 'master' into COMMONSRDF-8-graph-add-blanknode


>   simple .GraphImpl.add() must clone BlankNode 
> -----------------------------------------------
>
>                 Key: COMMONSRDF-8
>                 URL: https://issues.apache.org/jira/browse/COMMONSRDF-8
>             Project: Apache Commons RDF
>          Issue Type: Bug
>            Reporter: Stian Soiland-Reyes (old)
>            Assignee: Stian Soiland-Reyes
>              Labels: blanknode, simple
>             Fix For: 0.1
>
>
> From https://github.com/commons-rdf/commons-rdf/issues/66
> stain:
> {quote}
> In simple GraphImpl add() it does not clone BlankNode.
> https://github.com/commons-rdf/commons-rdf/blob/master/simple/src/main/java/com/github/commonsrdf/simple/GraphImpl.java#L44
> Lacking #63 or #64, this means that this would wrongly merge two BlankNodes 
> from different scopes - say you iterate over triples from graph 1 and add 
> them to graph 2. This test-case should be added - and also to verify that 
> after copying over, two statements about g1.b1 use the same BlankNode, and 
> two about g2.b1 uses the same - but that those two blank nodes are themselves 
> different - e.g. must now have different internal identifiers.
> Perhaps a hashing based on Objects.identityHashCode of the original 
> localScope (if a BlankNodeImpl which exposes that) or the original BlankNode 
> is sufficient - although it could lead to two 'equals()' BlankNodes from g1 
> that are not the same object to not be merged. Hence this is just an 
> intermediary solution to #48 .
> https://github.com/commons-rdf/commons-rdf/pull/63
> https://github.com/commons-rdf/commons-rdf/pull/64
> https://github.com/commons-rdf/commons-rdf/pull/48
> {quote}
> afs:
> {quote}
> What is "merge"?
> "Merge" is logical(semantic) operation. The API works on abstract syntax. 
> Higher level code needs to deal with semantics. Otherwise, "simple" is taking 
> a stance of one usage over another.
> Blank nodes can be shared between graphs. Sharing means .equals. If it is 
> "same implementation" then using
> If the system implemenation blank node identifier (Concepts, sec 3.4) is 
> choosen to be a global unique id, no copy is needed across the same 
> implementation. That makes the statement "must now have different internal 
> identifiers" too strong.
> Because the API abstracts the key features of an RDFTerm, "simply" can work 
> with java objects created in other implementations which would eb good as the 
> de facto reference example.
> {quote}
> stain:
> {quote}
> Yes, agree that if you intend for them to be shared, that should be all good 
> and fine - but then it should truly be merged as well - that is be both 
> .equals(), have same .hashCode() and same (internal)Identifier and 
> nTriplesString.
> Lacking globally unique identifiers on BlankNode - GraphImpl.add() will have 
> to do something else.
> But now the Simple implementation will let you create two BlankNodes in two 
> Graphs g1 and g2 with the same "internal identifier" , e.g. "b1" - and then 
> you can add statements using them to a single Graph g3. (for instance because 
> you are just adding all triples from g1 and g2 without any modifications).
> In that case you get a strange case where they have the same 
> internalIdentifier "b1" and same ntriples string - but they internally keep 
> two different "local scopes" back to g1 and g2, and thus their .equals() will 
> claim they are not equal.
> This is in contradiction to the javadoc for BlankNode which says:
> https://github.com/commons-rdf/commons-rdf/blob/master/api/src/main/java/com/github/commonsrdf/api/BlankNode.java#L44
> > Implementations that handle blank node identifiers in concrete syntaxes 
> > need  to be careful not to create the same blank node from multiple 
> > occurrences of  the same blank node identifier except in situations where 
> > this i supported  by the syntax.
> So it should be either or - and the test need to verify which one of them it 
> is.
> That is what this issue will have to sort out for the upcoming release - 
> assuming that something like #63 or #64 won't make it in, as they would avoid 
> this issue to a large extent.
> {quote}
> afs:
> {quote}
> Cloning on GraphImpl.add will require long term internal state. What if the 
> same blank node is added again later? Not even "merge" helps with this; a 
> sequence of adds is all one merge persumably. RDF 1.1 MT
> There are standard, cheap mechanisms for globally unique identifiers. They 
> provide the easiest and clearest solution for the generation of fresh blank 
> nodes that do not clash and will not in the future by accident. If "simple" 
> wishes to impose further contract restrictions, like no shared blank nodes, 
> then it can deal with system identifers per graph but no shared blank nodes 
> is a significant restriction.
> If it is the same "internal identifier", maybe that because they are supposed 
> to be the same. That can happen for example, reconstructing from data on disk 
> (not RDF systems, more database-like storage).
> The N-triples string, if the contract is that it can be used in output 
> unchanged, must be some long, unique form. Even if it could, the signficance 
> is lost on reading: when read in again, the short name can not be used on its 
> own (consider reading the same file twice). An internal identifier which is a 
> unique poer parser run together with the seen label can be used - the pair is 
> unique.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to