On 6 May 2015 at 05:58, Alexandre Bertails <[email protected]> wrote:
> I haven't followed the development in a long time, especially after
> the move to Apache. I just looked at it and I had a few remarks and
> questions.
Hi, thanks for joining us! Let's hope we haven't scared you away while
we try to get our first Apache release out and have the odd Blank Node
fight.. :)

> Just some background for those who don't already know me: I am part of
> the banana-rdf project [1]. The main approach there relies on defining
> RDF and its operations as a typeclass [2]. In that world, Jena and
> Sesame are just two instances of that typeclass (see for example [4]
> and [5]). So there is no wrapper involved. Still, commons-rdf is
> really a good step in the right direction as we could obsolete a lot
> of stuff.

I can definitely see that banana-rdf is relevant to commons-rdf - and
any requirements on commons-rdf coming from the Scala side are also
interesting.

> Right now, there is no support in commons-rdf for immutable
> operations. `Graph`s are mutable by default. Is there any plan to make
> an API for immutable graphs? Graphs in banana-rdf are immutable by
> default, and they are persistent in Plantain. We could always wrap an
> immutable graph in a structure with a `var`, but, well, there are
> better ways to do that.

There have been suggestions along those lines. A Graph is not currently
required to allow .add() etc. - but there is no method to ask whether a
graph is mutable or not. In the user guide
http://commonsrdf.incubator.apache.org/userguide.html#Adding_triples
we therefore say:

> Note: Some Graph implementations are immutable, in which case the below may
> throw an UnsupportedOperationException.

We could probably add this to the Javadoc of the mutating methods of
Graph with an explicit @throws. I raised this as
https://issues.apache.org/jira/browse/COMMONSRDF-23

https://issues.apache.org/jira/browse/COMMONSRDF-7 discusses how we
should define immutability for the non-Graph objects.

In Clerezza's Commons RDF Core (which is somewhat aligned with Commons
RDF) there is an additional marker interface, ImmutableGraph - perhaps
something along those lines would work here? (There is a tiny sketch of
what I mean further down in this mail.)
https://github.com/apache/clerezza-rdf-core/blob/master/api/src/main/java/org/apache/clerezza/commons/rdf/ImmutableGraph.java

> `RDFTermFactory` is stateful just to accommodate
> `createBlankNode(String)`. It's stateless otherwise. This is really an
> issue for banana-rdf as everything is defined as pure function (the
> output only depends on the input).

It does not need to be stateful. In the simple implementation we did
this using a final UUID "salt" that is created per instance of the
factory. Do you consider this state?
https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51

This is then used by
https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
as part of the hashing that generates the new uniqueReference(). Thus a
second call on the same factory with the same name will be hashed with
the same salt and produce the same uniqueReference(), which makes the
second BlankNode equal to the first.

But you can achieve the contract by other, non-stateful means: for
instance a random UUID that is static final (and hence no state at all
per factory instance), where the uniqueReference() is created by
concatenating that UUID, the System.identityHashCode() of the factory
and the provided name. (There is a rough sketch of this below.)

Also you are not required to implement createBlankNode(String) - you
can simply throw UnsupportedOperationException and only support
createBlankNode().
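To make the "no state at all" createBlankNode(String) idea a bit more
concrete, here is a rough, untested sketch - the class name, the
separator characters and the main() demo are just my own invention for
illustration:

import java.util.UUID;

/**
 * Untested sketch: deriving a BlankNode uniqueReference() without any
 * per-instance state. Two calls with the same name on the same factory
 * instance give the same reference (and hence equal BlankNodes), while
 * a different factory instance will (in practice) give a different
 * reference for the same name.
 */
public class StatelessBlankNodeSketch {

    // static final, so shared by every instance - no state per factory
    private static final UUID SALT = UUID.randomUUID();

    public String uniqueReference(String name) {
        // Concatenate the JVM-wide random salt, the identity hash code
        // of this factory instance (to scope the name to this factory),
        // and the caller-provided name itself.
        return SALT + ":" + System.identityHashCode(this) + ":" + name;
    }

    public static void main(String[] args) {
        StatelessBlankNodeSketch factory = new StatelessBlankNodeSketch();
        // Same factory + same name -> same reference, printed twice:
        System.out.println(factory.uniqueReference("b1"));
        System.out.println(factory.uniqueReference("b1"));
        // Another factory -> a different reference for the same name:
        System.out.println(new StatelessBlankNodeSketch().uniqueReference("b1"));
    }
}

Nothing needs to be remembered between calls; whether that scoping is
"unique enough" is of course up to the implementation.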
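And going back to the immutability question for a moment, the marker
interface I have in mind would hardly be more than this (purely
hypothetical - it is not part of the Commons RDF API today):

import org.apache.commons.rdf.api.Graph;

/**
 * Hypothetical marker interface, along the lines of Clerezza's
 * ImmutableGraph. Implementations exposing this type would promise that
 * the mutating methods inherited from Graph (add, remove, clear) always
 * throw UnsupportedOperationException.
 */
public interface ImmutableGraph extends Graph {
    // Deliberately empty - the type itself carries the guarantee.
}

Callers could then check "graph instanceof ImmutableGraph" before
attempting .add().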
This advice about implementing createBlankNode(String) should probably
be noted in the (yet so far just imagined) Implementors Guide on the
Commons RDF website.
https://issues.apache.org/jira/browse/COMMONSRDF-24

> Is `createBlankNode(String)` really needed? The internal map for
> bnodes could be maintained _outside_ of the factory. Or at least, we
> could pass it as an argument instead: `createBlankNode(Map<String,
> BlankNode>, String)`.

We did discuss whether it was needed. There are arguments against it,
as blank nodes "exist only as themselves" and therefore a single JVM
object per BlankNode should be enough. However, the flexibility for,
say, a streaming RDF parser to create identical blank node instances
without keeping lots of object references felt like a compelling
argument for supporting this through the factory - and with, for
example, the hashing method above, no state is required.

> # wrapped values
>
> There are a lot of unnecessary objects because of the class hierarchy.
> In banana-rdf, we can say that RDFTerm is a plain `String` while being
> 100% type-safe. That's already what's happening for the N3.js
> implementation. And in Plantain, Literals are just `java.lang.Object`
> [6] so that we can directly have String, Int, etc.

Well, this is tricky in Java, where you can't force a new interface
onto an existing type. How can you have a String as an RDFTerm? By
using its ntriplesString()? That would require new "instanceOf"-like
methods to check what the string really is, and would require various
classes like a LiteralInspector to dissect the string. This sounds to
me like building a different API..

While I can see this is a valid way to model RDF in a non-OO way, I
think it would be difficult to align with Commons RDF as a Java-focused
API, where most Java programmers would expect type hierarchies to be
represented as regular Java class hierarchies.

> That means that there is no way currently to provide a
> `RDFTermFactory` for Plantain. The only alternatives I see right now
> are:

What is the challenge of returning wrappers? I think this is the
approach Jena is also considering. Presumably, if you are providing an
RDFTermFactory, that is to allow JVM code written against Commons RDF
to create Plantain objects for RDF. Such code would expect to be able
to do, say:

factory.createLiteral("Fred").getDatatype()

which would not work on a returned String. (There is a very rough
sketch of such a wrapper in the PS at the end of this mail.)

> # getTriples vs iterate
>
> Not a big deal but I believe the naming could be improved. When I read
> getTriples, I expect to have all the triples in my hand, but this is
> not quite what Streams are about. On the other hand, when I read
> iterate, I kinda expect the opposite. Of course the types clarify
> everything but I believe it'd be easier to use getTriplesAsStream and
> getTriplesAsIterable.

I might disagree - but I think this is valuable to discuss. I have
taken the liberty of reporting this in your name as
https://issues.apache.org/jira/browse/COMMONSRDF-22
so we can discuss it further in the email thread that should have been
triggered.

Thanks for all your valuable suggestions - please do keep in touch and
let us know if you have a go at aligning banana-rdf with the upcoming
0.1.0 release, have further feedback on the documentation, or anything
else!

--
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718
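PS: here is a very rough, untested sketch of the wrapper idea for a
Plantain-style store that keeps plain Strings internally. The class
names are my own invention, the N-Triples escaping is deliberately
naive, and equals()/hashCode() per the Literal contract are left out
for brevity:

import java.util.Objects;
import java.util.Optional;

import org.apache.commons.rdf.api.IRI;
import org.apache.commons.rdf.api.Literal;

/**
 * Sketch only: a Literal that wraps the plain String a Plantain-like
 * store would hold internally, so that Commons RDF code can still call
 * getLexicalForm()/getDatatype() on it.
 */
public final class StringBackedLiteral implements Literal {

    /** Minimal IRI wrapper, just enough to answer getDatatype(). */
    private static final class SimpleIRI implements IRI {
        private final String iri;
        SimpleIRI(String iri) { this.iri = iri; }
        @Override public String getIRIString() { return iri; }
        @Override public String ntriplesString() { return "<" + iri + ">"; }
        @Override public boolean equals(Object o) {
            return o instanceof IRI && iri.equals(((IRI) o).getIRIString());
        }
        @Override public int hashCode() { return iri.hashCode(); }
    }

    private static final IRI XSD_STRING =
            new SimpleIRI("http://www.w3.org/2001/XMLSchema#string");

    /** The value exactly as the store would keep it. */
    private final String value;

    public StringBackedLiteral(String value) {
        this.value = Objects.requireNonNull(value);
    }

    @Override public String getLexicalForm() { return value; }
    @Override public IRI getDatatype() { return XSD_STRING; }
    @Override public Optional<String> getLanguageTag() { return Optional.empty(); }

    @Override public String ntriplesString() {
        // NOTE: a real implementation must escape quotes, newlines etc.
        return "\"" + value + "\"";
    }
}

A Plantain RDFTermFactory could then simply return
new StringBackedLiteral(lexicalForm) from createLiteral(String),
without changing how the store represents literals internally.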
