Hi Stian,

tl;dr: [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb
On Wed, May 6, 2015 at 4:24 PM, Stian Soiland-Reyes <[email protected]> wrote:
> On 6 May 2015 at 05:58, Alexandre Bertails <[email protected]> wrote:
>> I haven't followed the development in a long time, especially after
>> the move to Apache. I just looked at it and I had a few remarks and
>> questions.
>
> Hi, thanks for joining us! Let's hope we haven't scared you away while
> we try to get our first Apache release out and have the odd Blank Node
> fight.. :)

Wait, there was a Blank Node fight and I wasn't part of it?

>> Just some background for those who don't already know me: I am part of
>> the banana-rdf project [1]. The main approach there relies on defining
>> RDF and its operations as a typeclass [2]. In that world, Jena and
>> Sesame are just two instances of that typeclass (see for example [4]
>> and [5]). So there is no wrapper involved. Still, commons-rdf is
>> really a good step in the right direction as we could obsolete a lot
>> of stuff.
>
> I can definitely see that banana-rdf is relevant to commons-rdf - and
> also any requirements you might have for commons-rdf coming from Scala
> are interesting.

That's cool if we can accommodate commons-rdf to make it really useful
from Scala-land. I think it is possible [1].

>> Right now, there is no support in commons-rdf for immutable
>> operations. `Graph`s are mutable by default. Is there any plan to make
>> an API for immutable graphs? Graphs in banana-rdf are immutable by
>> default, and they are persistent in Plantain. We could always wrap an
>> immutable graph in a structure with a `var`, but, well, there are
>> better ways to do that.
>
> There have been suggestions along those lines. It is not a requirement
> of Graph now to allow .add() etc. - but there is no method to ask if a
> graph is mutable or not.
> In the user guide
> http://commonsrdf.incubator.apache.org/userguide.html#Adding_triples
> we therefore say:
>
>> Note: Some Graph implementations are immutable, in which case the below may
>> throw an UnsupportedOperationException.
>
> We could probably add this to the Javadoc of the mutability methods of
> Graph with an explicit @throws.
>
> I raised this as https://issues.apache.org/jira/browse/COMMONSRDF-23
>
> https://issues.apache.org/jira/browse/COMMONSRDF-7 discusses how we
> should define immutability on the non-Graph objects.
>
> In Clerezza's Commons RDF Core (which is somewhat aligned with Commons
> RDF) there is an additional marker interface ImmutableGraph -- perhaps
> something along those lines would work here?

If it doesn't exist in the type (i.e. statically), it's basically lost
knowledge. @throws, whether silent or explicit, is pretty much useless,
because client code now needs to check for that possibility at runtime.
It would be slightly better to make it an interface. And we could have
another interface for immutable graphs, where `add(Triple)` would
return another graph. See how it can be done in [1].

> https://github.com/apache/clerezza-rdf-core/blob/master/api/src/main/java/org/apache/clerezza/commons/rdf/ImmutableGraph.java
>
>> `RDFTermFactory` is stateful just to accommodate
>> `createBlankNode(String)`. It's stateless otherwise. This is really an
>> issue for banana-rdf as everything is defined as pure functions (the
>> output only depends on the input).
>
> It does not need to be stateful.
>
> In simple we implemented this using a final UUID "salt" that is
> created per instance of the factory. Do you consider this state?
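(An aside on the immutability point above: the kind of interface I have
in mind, where `add(Triple)` returns a new graph instead of mutating,
could look roughly like the following plain-Java sketch. All names here
are hypothetical, not actual Commons RDF types.)

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the real Triple interface.
interface Triple {}

// An immutable graph: add() returns a *new* graph, the receiver is untouched.
interface ImmutableGraph {
    ImmutableGraph add(Triple t);
    boolean contains(Triple t);
    long size();
}

final class SetBackedGraph implements ImmutableGraph {
    private final Set<Triple> triples;

    SetBackedGraph() { this.triples = Collections.emptySet(); }

    private SetBackedGraph(Set<Triple> triples) { this.triples = triples; }

    @Override
    public ImmutableGraph add(Triple t) {
        // Copy-on-write for simplicity; a persistent structure (as in
        // Plantain) would share structure instead of copying.
        Set<Triple> copy = new HashSet<>(triples);
        copy.add(t);
        return new SetBackedGraph(copy);
    }

    @Override
    public boolean contains(Triple t) { return triples.contains(t); }

    @Override
    public long size() { return triples.size(); }
}

public class ImmutableGraphDemo {
    public static void main(String[] args) {
        ImmutableGraph g0 = new SetBackedGraph();
        Triple t = new Triple() {};
        ImmutableGraph g1 = g0.add(t);
        System.out.println(g0.size());      // 0 - g0 is unchanged
        System.out.println(g1.size());      // 1
        System.out.println(g1.contains(t)); // true
    }
}
```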
> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51
>
> This is then used by
> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
> as part of a hashing to generate the new uniqueReference(). Thus a
> second call on the same factory with the same name will be hashed with
> the same salt, and produce the same uniqueReference(), which makes the
> second BlankNode equal to the first.
>
> But you can achieve the contract by other non-stateful means, for
> instance a random UUID that is static final (and hence no state at all
> per factory instance), and you can create a uniqueReference() by
> concatenating that UUID with the System.identityHashCode() of the
> factory and then concatenating the provided name.

This approach only gives you the illusion that there is no state, but
there *is* one (e.g. the UUID, and the atomic counter). Because of its
current contract, `createBlankNode(String)` cannot be referentially
transparent, and this is an issue if one wants to take a functional
approach.

> Also you are not required to implement createBlankNode(String) - you
> can simply throw UnsupportedOperationException and only support
> createBlankNode().

What is the point of doing/allowing that? As a user or implementor, I
want to know that I can rely on a method/function that is accessible.
And that is also why I dislike the default-implementation approach
taken in the current draft.

> This should probably be noted in the (so far just imagined)
> Implementors Guide on the Commons RDF website.
>
> https://issues.apache.org/jira/browse/COMMONSRDF-24
>
>> Is `createBlankNode(String)` really needed? The internal map for
>> bnodes could be maintained _outside_ of the factory. Or at least, we
>> could pass it as an argument instead: `createBlankNode(Map<String,
>> BlankNode>, String)`.
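(For reference, the salt-based scheme you describe could be sketched as
below - a per-factory UUID hashed together with the caller-supplied
name, so equal names on the same factory yield equal blank nodes, with
no mutable state. Class names are hypothetical, not the actual simple
implementation.)

```java
import java.util.UUID;

// A blank node identified only by a deterministic uniqueReference().
final class BlankNode {
    private final String uniqueReference;

    BlankNode(UUID salt, String name) {
        // Deterministic: the same (salt, name) pair always hashes to the
        // same reference, so no per-factory map of names is needed.
        this.uniqueReference =
            UUID.nameUUIDFromBytes((salt + ":" + name).getBytes()).toString();
    }

    String uniqueReference() { return uniqueReference; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BlankNode
            && uniqueReference.equals(((BlankNode) o).uniqueReference);
    }

    @Override
    public int hashCode() { return uniqueReference.hashCode(); }
}

final class SaltedFactory {
    // Fixed at construction and never mutated afterwards.
    private final UUID salt = UUID.randomUUID();

    BlankNode createBlankNode(String name) {
        return new BlankNode(salt, name);
    }
}

public class SaltDemo {
    public static void main(String[] args) {
        SaltedFactory f1 = new SaltedFactory();
        SaltedFactory f2 = new SaltedFactory();
        // Same factory + same name => equal blank nodes.
        System.out.println(f1.createBlankNode("b1").equals(f1.createBlankNode("b1")));
        // Different factories => different blank nodes despite the same name.
        System.out.println(f1.createBlankNode("b1").equals(f2.createBlankNode("b1")));
    }
}
```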
> We did discuss if it was needed - there are arguments against it due
> to blank nodes "existing only as themselves" and therefore a single
> JVM object per BlankNode should be enough - however having the
> flexibility for say a streaming RDF parser to create identical blank
> node instances without keeping lots of object references felt like a
> compelling argument to support this through the factory - with for
> example the hashing method above this means no state is required.

This makes a *very strong structural assumption*. Being referentially
transparent makes none, and will always accommodate all cases. That
being said, I understand the constraints. Note: [1] does not attempt to
fix that issue.

>> # wrapped values
>>
>> There are a lot of unnecessary objects because of the class hierarchy.
>> In banana-rdf, we can say that RDFTerm is a plain `String` while being
>> 100% type-safe. That's already what's happening for the N3.js
>> implementation. And in Plantain, Literals are just `java.lang.Object`
>> [6] so that we can directly have String, Int, etc.
>
> Well, this is tricky in Java where you can't force a new interface
> onto an existing type.
>
> How can you have a String as an RDFTerm? Because you use the
> ntriplestring? This would require new "instanceOf"-like methods to
> check what the string really is - and would require various classes
> like LiteralInspector to dissect the string. This sounds to me like
> building a different API..

The point is to abstract things away. By choosing to use actual
interfaces, you are forcing everything to be under a class hierarchy
for no good reason. I do not find the motivation for this approach in
the documentation. Please see [1] for a discussion on that subject.
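To give an idea of what I mean, here is a rough Java sketch of the
typeclass-style approach (all names are hypothetical, loosely modelled
on banana-rdf's RDF typeclass): the operations are defined against a
type parameter, so an implementation is free to represent every term as
a plain String - no class hierarchy required.

```java
// Operations over some node representation chosen by the implementation.
interface RdfOps<Node> {
    Node makeIri(String iriString);
    Node makeLiteral(String lexicalForm, String datatypeIri);
    boolean isIri(Node node);
    String lexicalForm(Node node); // only meaningful for literals here
}

// An implementation where every term is just a String in N-Triples syntax.
final class StringOps implements RdfOps<String> {
    @Override public String makeIri(String iriString) {
        return "<" + iriString + ">";
    }
    @Override public String makeLiteral(String lex, String dt) {
        return "\"" + lex + "\"^^<" + dt + ">";
    }
    @Override public boolean isIri(String node) {
        return node.startsWith("<");
    }
    @Override public String lexicalForm(String node) {
        // Extract the text between the surrounding quotes of a literal.
        int end = node.lastIndexOf("\"");
        return node.substring(1, end);
    }
}

public class TypeclassDemo {
    // Generic client code: works for *any* Node representation.
    static <Node> String describe(RdfOps<Node> ops, Node n) {
        return ops.isIri(n) ? "iri" : "literal";
    }

    public static void main(String[] args) {
        RdfOps<String> ops = new StringOps();
        String iri = ops.makeIri("http://example.org/s");
        String lit = ops.makeLiteral("Fred",
                "http://www.w3.org/2001/XMLSchema#string");
        System.out.println(describe(ops, iri)); // iri
        System.out.println(describe(ops, lit)); // literal
        System.out.println(ops.lexicalForm(lit)); // Fred
    }
}
```

In Scala the `RdfOps<Node>` instance is passed implicitly, so client
code stays as terse as with a concrete class hierarchy.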
> While I can see this can be a valid way to model RDF in a non-OO way,
> I think that would be difficult to align with Commons RDF as a
> Java-focused API, where most Java programmers would expect type
> hierarchies represented as regular Java class hierarchies.

I am not convinced. The RDF model is simple enough that another
approach is possible [1].

>> That means that there is no way currently to provide a
>> `RDFTermFactory` for Plantain. The only alternatives I see right now
>> are:
>
> What is the challenge of returning wrappers? I think this is the
> approach that Jena is also considering.
>
> Presumably if you are providing an RDFTermFactory then that is to
> allow JVM code that expects any Commons RDF code to create Plantain
> objects for RDF. They would expect to be able to do say:
>
> factory.createLiteral("Fred").getDatatype()
>
> which would not work on a returned String

You can (of course) do things like that in Scala, in a very type-safe
way, i.e. it's not monkey patching. And this is happening in banana-rdf
:-) It would be totally compatible with [1].

>> # getTriples vs iterate
>>
>> Not a big deal but I believe the naming could be improved. When I read
>> getTriples, I expect to have all the triples in my hand, but this is
>> not quite what Streams are about. On the other hand, when I read
>> iterate, I kinda expect the opposite. Of course the types clarify
>> everything but I believe it'd be easier to use getTriplesAsStream and
>> getTriplesAsIterable.
>
> I might disagree - but I think this is valuable to discuss.

As I said, not a big deal. Types are what really matter in the end.

> I have taken the liberty to report this in your name as:
> https://issues.apache.org/jira/browse/COMMONSRDF-22
>
> so we can discuss this further in the email thread it should have
> triggered.
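(To illustrate the naming suggestion, a tiny hypothetical sketch
exposing the same triples both as a single-use Stream and as a
re-traversable Iterable - the distinction the names would make
explicit:)

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical graph holding its triples as simple Strings for brevity.
final class TinyGraph {
    private final List<String> triples;

    TinyGraph(List<String> triples) { this.triples = triples; }

    // A Stream is single-use and potentially lazy...
    Stream<String> getTriplesAsStream() { return triples.stream(); }

    // ...while an Iterable can be traversed repeatedly.
    Iterable<String> getTriplesAsIterable() { return triples; }
}

public class NamingDemo {
    public static void main(String[] args) {
        TinyGraph g = new TinyGraph(Arrays.asList("t1", "t2"));
        System.out.println(g.getTriplesAsStream().count()); // 2
        long n = 0;
        for (String t : g.getTriplesAsIterable()) n++;
        System.out.println(n); // 2
    }
}
```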
> Thanks for all your valuable suggestions - please do keep in touch and
> let us know if you have a go at aligning banana-rdf with the upcoming
> 0.1.0 release, further feedback on documentation, and anything else!

I believe that the main issue is that the current approach targets both
library users _and_ library authors, but those really are two different
audiences. I agree that the classes for the RDF model can provide a
common framework for many Java developers. But libraries relying on
commons-rdf should not be tied to those classes. Please have a look at
this gist to see what I mean and tell me what you think [1].

Alexandre

[1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb

> --
> Stian Soiland-Reyes
> Apache Taverna (incubating), Apache Commons RDF (incubating)
> http://orcid.org/0000-0001-9842-9718
