On Thu, May 7, 2015 at 11:18 AM, Alexandre Bertails <[email protected]> wrote: > Hi Stian, > > tldr: [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb
I updated the gist with three examples at the end: CommonsRDF, StringRDF, PlantainRDF. Note how the types do not have to relate to the current interfaces, but they can if you want/need to. Alexandre > > On Wed, May 6, 2015 at 4:24 PM, Stian Soiland-Reyes <[email protected]> wrote: >> On 6 May 2015 at 05:58, Alexandre Bertails <[email protected]> wrote: >>> I haven't followed the development in a long time, especially after >>> the move to Apache. I just looked at it and I had a few remarks and >>> questions. >> >> Hi, thanks for joining us! Let's hope we haven't scared you away while >> we try to get our first Apache release out and have the odd Blank Node >> fight.. :) > > Wait, there was a Blank Node fight and I wasn't part of it? > >>> Just some background for those who don't already know me: I am part of >>> the banana-rdf project [1]. The main approach there relies on defining >>> RDF and its operations as a typeclass [2]. In that world, Jena and >>> Sesame are just two instances of that typeclass (see for example [4] >>> and [5]). So there is no wrapper involved. Still, commons-rdf is >>> really a good step in the right direction as we could obsolete a lot >>> of stuff. >> >> I can definitely see that banana-rdf is relevant to commons-rdf - and >> also any requirements you might have to commons-rdf coming from Scala >> is interesting. > > That's cool if if we can accommodate commons-rdf to make it really > useful from Scala-land. I think it is possible [1]. > >>> Right now, there is no support in commons-rdf for immutable >>> operations. `Graph`s are mutable by default. Is there any plan to make >>> an API for immutable graphs? Graphs in banana-rdf are immutable by >>> default, and they are persistent in Plantain. We could always wrap an >>> immutable graph in a structure with a `var`, but, well, there are >>> better ways to do that. >> >> There has been suggestions along those lines. It is not a requirement >> of Graph now to allow .add() etc. - but there is no method to ask if a >> graph is mutable or not. >> >> In the user guide >> http://commonsrdf.incubator.apache.org/userguide.html#Adding_triples >> we therefore say: >> >>> Note: Some Graph implementations are immutable, in which case the below may >>> throw an UnsupportedOperationException. >> >> >> We could probably add this to the Javadoc of the mutability methods of >> Graph with an explicit @throws. >> >> I raised this as https://issues.apache.org/jira/browse/COMMONSRDF-23 >> >> >> >> >> https://issues.apache.org/jira/browse/COMMONSRDF-7 discusses how we >> should define immutability on the non-Graph objects. >> >> >> In Clerezza's Commons RDF Core (which is somewhat aligned with Commons >> RDF) there is an additional marker interface ImmutableGraph -- perhaps >> something along those lines would work here? > > If it doesn't exist in the type (i.e. statically), it's basically lost > knowledge. @throws, being silent or explicit, is pretty much useless > because now client code needs to check for this possibility. It would > be a slightly better to make it an interface. And we could could have > another interface for immutable graphs, where `add(Triple)` would > return another graph. > > See how it can be done in [1]. > >> https://github.com/apache/clerezza-rdf-core/blob/master/api/src/main/java/org/apache/clerezza/commons/rdf/ImmutableGraph.java >> >> >>> `RDFTermFactory` is stateful just to accommodate >>> `createBlankNode(String)`. It's stateless otherwise. This is really an >>> issue for banana-rdf as everything is defined as pure function (the >>> output only depends on the input). >> >> It does not need to be stateful. >> >> In simple we implemented this using a final UUID "salt" that is >> created per instance of the factory. Do you consider this state? >> >> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51 >> >> >> This is then used by >> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41 >> as part of a hashing to generate the new uniqueReference(). Thus a >> second call on the same factory with the same name will be hashed with >> the same salt, and produce the same uniqueReference(), which makes the >> second BlankNode equal to the first. >> >> >> But you can achieve the contract by other non-stateful means, for >> instance a random UUID that is static final (and hence no state at all >> per factory instance), and you can create a uniqueReference() by >> concatenating that UUID with the System.identityHashCode() of the >> factory and concatenate the provided name. > > This approach only gives you the illusion that there is no state, but > there *is* one (e.g. with UUID, and the atomic counter). Because of > its current contract, `createBlankNode(String)` cannot be > referentially transparent, and this is an issue if one wants to take a > functional approach. > >> Also you are not required to implement createBlankNode(String) - you >> can simply throw UnsupportedOperationException and only support >> createBlankNode(). > > What is the point of doing/allowing that? As a users or implementors, > I want to know that I can rely on a method/function that is > accessible. And that is also why I dislike the default implementation > approach taken in the current draft. > >> This should probably be noted in the (yet so far just imagined) >> Implementors Guide on the Commons RDF website. >> >> https://issues.apache.org/jira/browse/COMMONSRDF-24 >> >> >>> Is `createBlankNode(String)` really needed? The internal map for >>> bnodes could be maintained _outside_ of the factory. Or at least, we >>> could pass it as an argument instead: `createBlankNode(Map<String, >>> BlankNode>, String)`. >> >> We did discuss if it was needed - there are arguments against it due >> to blank nodes "existing only as themselves" and therefore a single >> JVM object per BlankNode should be enough - however having the >> flexibility for say a streaming RDF parser to create identitcal blank >> node instances without keeping lots of object references felt like a >> compelling argument to support this through the factory - with for >> example the hashing method above this means no state is required. > > This is making *very structuring assumption*. Being referentially > transparent makes none, and will always accommodate all cases. That > being said, I understand the constraints. > > Note: [1] does not attempt to fix that issue. > >>> # wrapped values >>> >>> There are a lot of unnecessary objects because of the class hierarchy. >>> In banana-rdf, we can say that RDFTerm is a plain `String` while being >>> 100% type-safe. That's already what's happening for the N3.js >>> implementation. And in Plantain, Literals are just `java.lang.Object` >>> [6] so that we can directly have String, Int, etc. >> >> Well, this is tricky in Java where you can't force a new interface >> onto an existing type. >> >> How can you have a String as an RDFTerm? Because you use the >> ntriplestring? This would require new "instanceOf"-like methods to >> check what the string really is - and would require various classes >> like LiteralInspector to dissect the string. This sounds to me like >> building a different API.. > > The point is to abstract things away. By choosing to use actual > interfaces, you are forcing everything to be under a class hierarchy > for no good reason. I do not find the motivation for this approach in > the documentation. > > Please see [1] for a discussion on that subject. > >> While I can see this can be a valid way to model RDF in a non-OO way, >> I think that would be difficult to align with Commons RDF as a >> Java-focused API, where most Java programmers would expect type >> hierarchies represented as regular Java class hierarchies. > > I am not convinced. The RDF model is simple enough that another > approach is possible [1]. > >>> That means that there is no way currently to provide a >>> `RDFTermFactory` for Plantain. The only alternatives I see right now >>> are: >> >> What is the challenge of returning wrappers? I think this is the >> approach that Jena is also considering. >> >> Presumably if you are providing an RDFTermFactory then that is to >> allow JVM code that expects any Commons RDF code to create Plantain >> objects for RDF. They would expect to be able to do say: >> >> factory.createLiteral("Fred").getDatatype() >> >> which would not work on a returned String > > You can (of course) do things like that in Scala, in a very typesafe > way, i.e. it's not monkey patching. And this is happening in > banana-rdf :-) > > That would be totally compatible with [1]. > >>> # getTriples vs iterate >>> >>> Not a big deal but I believe the naming could be improved. When I read >>> getTriples, I expect to have all the triples in my hand, but this is >>> not quite what Streams are about. On the other hand, when I read >>> iterate, I kinda expect the opposite. Of course the types clarify >>> everything but I believe it'd be easier to use getTriplesAsStream and >>> getTriplesAsIterable. >> >> I might disagree - but I think this is a valuable to discuss. > > As I said, not a big deal. Types are what really matters in the end. > >> I have taken the liberty to report this in your name as: >> https://issues.apache.org/jira/browse/COMMONSRDF-22 >> >> so we can discuss this further in the email thread that should have >> triggered. >> >> >> >> Thanks for all your valuable suggestions - please do keep in touch and >> let us know if you have a go at aligning banana-rdf with the upcoming >> 0.1.0 release, further feedback on documentation, and anything else! > > I believe that the the main issue is that the current approach is both > for library users _and_ library authors, but there really are two > different targets here. I agree that the class for the RDF model can > provide a common framework for many Java people. But libraries relying > on commons-rdf should not be tied to the classes. > > Please have a look at this gist to see what I mean and tell me what > you think [1]. > > Alexandre > > [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb > >> >> -- >> Stian Soiland-Reyes >> Apache Taverna (incubating), Apache Commons RDF (incubating) >> http://orcid.org/0000-0001-9842-9718
