On Thu, May 7, 2015 at 11:18 AM, Alexandre Bertails
<[email protected]> wrote:
> Hi Stian,
>
> tldr: [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb

I updated the gist with three examples at the end: CommonsRDF,
StringRDF, PlantainRDF. Note how the types do not have to relate to
the current interfaces, but they can if you want/need to.

Alexandre

>
> On Wed, May 6, 2015 at 4:24 PM, Stian Soiland-Reyes <[email protected]> wrote:
>> On 6 May 2015 at 05:58, Alexandre Bertails <[email protected]> wrote:
>>> I haven't followed the development in a long time, especially after
>>> the move to Apache. I just looked at it and I had a few remarks and
>>> questions.
>>
>> Hi, thanks for joining us! Let's hope we haven't scared you away while
>> we try to get our first Apache release out and have the odd Blank Node
>> fight.. :)
>
> Wait, there was a Blank Node fight and I wasn't part of it?
>
>>> Just some background for those who don't already know me: I am part of
>>> the banana-rdf project [1]. The main approach there relies on defining
>>> RDF and its operations as a typeclass [2]. In that world, Jena and
>>> Sesame are just two instances of that typeclass (see for example [4]
>>> and [5]). So there is no wrapper involved. Still, commons-rdf is
>>> really a good step in the right direction as we could obsolete a lot
>>> of stuff.
>>
>> I can definitely see that banana-rdf is relevant to commons-rdf - and
>> also any requirements you might have to commons-rdf coming from Scala
>> is interesting.
>
> That's cool if if we can accommodate commons-rdf to make it really
> useful from Scala-land. I think it is possible [1].
>
>>> Right now, there is no support in commons-rdf for immutable
>>> operations. `Graph`s are mutable by default. Is there any plan to make
>>> an API for immutable graphs? Graphs in banana-rdf are immutable by
>>> default, and they are persistent in Plantain. We could always wrap an
>>> immutable graph in a structure with a `var`, but, well, there are
>>> better ways to do that.
>>
>> There has been suggestions along those lines. It is not a requirement
>> of Graph now to allow .add() etc. - but there is no method to ask if a
>> graph is mutable or not.
>>
>> In the user guide
>> http://commonsrdf.incubator.apache.org/userguide.html#Adding_triples
>> we therefore say:
>>
>>> Note: Some Graph implementations are immutable, in which case the below may 
>>> throw an UnsupportedOperationException.
>>
>>
>> We could probably add this to the Javadoc of the mutability methods of
>> Graph with an explicit @throws.
>>
>> I raised this as https://issues.apache.org/jira/browse/COMMONSRDF-23
>>
>>
>>
>>
>> https://issues.apache.org/jira/browse/COMMONSRDF-7 discusses how we
>> should define immutability on the non-Graph objects.
>>
>>
>> In Clerezza's Commons RDF Core (which is somewhat aligned with Commons
>> RDF) there is an additional marker interface ImmutableGraph -- perhaps
>> something along those lines would work here?
>
> If it doesn't exist in the type (i.e. statically), it's basically lost
> knowledge. @throws, being silent or explicit, is pretty much useless
> because now client code needs to check for this possibility. It would
> be a slightly better to make it an interface. And we could could have
> another interface for immutable graphs, where `add(Triple)` would
> return another graph.
>
> See how it can be done in [1].
>
>> https://github.com/apache/clerezza-rdf-core/blob/master/api/src/main/java/org/apache/clerezza/commons/rdf/ImmutableGraph.java
>>
>>
>>> `RDFTermFactory` is stateful just to accommodate
>>> `createBlankNode(String)`. It's stateless otherwise. This is really an
>>> issue for banana-rdf as everything is defined as pure function (the
>>> output only depends on the input).
>>
>> It does not need to be stateful.
>>
>> In simple we implemented this using a final UUID "salt" that is
>> created per instance of the factory. Do you consider this state?
>>
>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51
>>
>>
>> This is then used by
>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
>> as part of a hashing to generate the new uniqueReference().  Thus a
>> second call on the same factory with the same name will be hashed with
>> the same salt, and produce the same uniqueReference(), which makes the
>> second BlankNode equal to the first.
>>
>>
>> But you can achieve the contract by other non-stateful means, for
>> instance a random UUID that is static final (and hence no state at all
>> per factory instance), and you can create a uniqueReference() by
>> concatenating that UUID with the System.identityHashCode() of the
>> factory and concatenate the provided name.
>
> This approach only gives you the illusion that there is no state, but
> there *is* one (e.g. with UUID, and the atomic counter). Because of
> its current contract, `createBlankNode(String)` cannot be
> referentially transparent, and this is an issue if one wants to take a
> functional approach.
>
>> Also you are not required to implement createBlankNode(String) - you
>> can simply throw UnsupportedOperationException and only support
>> createBlankNode().
>
> What is the point of doing/allowing that? As a users or implementors,
> I want to know that I can rely on a method/function that is
> accessible. And that is also why I dislike the default implementation
> approach taken in the current draft.
>
>> This should probably be noted in the (yet so far just imagined)
>> Implementors Guide on the Commons RDF website.
>>
>> https://issues.apache.org/jira/browse/COMMONSRDF-24
>>
>>
>>> Is `createBlankNode(String)` really needed? The internal map for
>>> bnodes could be maintained _outside_ of the factory. Or at least, we
>>> could pass it as an argument instead: `createBlankNode(Map<String,
>>> BlankNode>, String)`.
>>
>> We did discuss if it was needed - there are arguments against it due
>> to blank nodes "existing only as themselves" and therefore a single
>> JVM object per BlankNode should be enough - however having the
>> flexibility for say a streaming RDF parser to create identitcal blank
>> node instances without keeping lots of object references felt like a
>> compelling argument to support this through the factory -  with for
>> example the hashing method above this means no state is required.
>
> This is making *very structuring assumption*. Being referentially
> transparent makes none, and will always accommodate all cases. That
> being said, I understand the constraints.
>
> Note: [1] does not attempt to fix that issue.
>
>>> # wrapped values
>>>
>>> There are a lot of unnecessary objects because of the class hierarchy.
>>> In banana-rdf, we can say that RDFTerm is a plain `String` while being
>>> 100% type-safe. That's already what's happening for the N3.js
>>> implementation. And in Plantain, Literals are just `java.lang.Object`
>>> [6] so that we can directly have String, Int, etc.
>>
>> Well, this is tricky in Java where you can't force a new interface
>> onto an existing type.
>>
>> How can you have a String as an RDFTerm? Because you use the
>> ntriplestring? This would require new "instanceOf"-like methods to
>> check what the string really is - and would require various classes
>> like LiteralInspector to dissect the string. This sounds to me like
>> building a different API..
>
> The point is to abstract things away. By choosing to use actual
> interfaces, you are forcing everything to be under a class hierarchy
> for no good reason. I do not find the motivation for this approach in
> the documentation.
>
> Please see [1] for a discussion on that subject.
>
>> While I can see this can be a valid way to model RDF in a non-OO way,
>> I  think that would be difficult to align with Commons RDF as a
>> Java-focused API, where most Java programmers would expect type
>> hierarchies represented as regular Java class hierarchies.
>
> I am not convinced. The RDF model is simple enough that another
> approach is possible [1].
>
>>> That means that there is no way currently to provide a
>>> `RDFTermFactory` for Plantain. The only alternatives I see right now
>>> are:
>>
>> What is the challenge of returning wrappers? I think this is the
>> approach that Jena is also considering.
>>
>> Presumably if you are providing an RDFTermFactory then that is to
>> allow JVM code that expects any Commons RDF code to create Plantain
>> objects for RDF. They would expect to be able to do say:
>>
>> factory.createLiteral("Fred").getDatatype()
>>
>> which would not work on a returned String
>
> You can (of course) do things like that in Scala, in a very typesafe
> way, i.e. it's not monkey patching. And this is happening in
> banana-rdf :-)
>
> That would be totally compatible with [1].
>
>>> # getTriples vs iterate
>>>
>>> Not a big deal but I believe the naming could be improved. When I read
>>> getTriples, I expect to have all the triples in my hand, but this is
>>> not quite what Streams are about. On the other hand, when I read
>>> iterate, I kinda expect the opposite. Of course the types clarify
>>> everything but I believe it'd be easier to use getTriplesAsStream and
>>> getTriplesAsIterable.
>>
>> I might disagree - but I think this is a valuable to discuss.
>
> As I said, not a big deal. Types are what really matters in the end.
>
>> I have taken the liberty to report this in your name as:
>> https://issues.apache.org/jira/browse/COMMONSRDF-22
>>
>> so we can discuss this further in the email thread that should have 
>> triggered.
>>
>>
>>
>> Thanks for all your valuable suggestions - please do keep in touch and
>> let us know if you have a go at aligning banana-rdf with the upcoming
>> 0.1.0 release, further feedback on documentation, and anything else!
>
> I believe that the the main issue is that the current approach is both
> for library users _and_ library authors, but there really are two
> different targets here. I agree that the class for the RDF model can
> provide a common framework for many Java people. But libraries relying
> on commons-rdf should not be tied to the classes.
>
> Please have a look at this gist to see what I mean and tell me what
> you think [1].
>
> Alexandre
>
> [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb
>
>>
>> --
>> Stian Soiland-Reyes
>> Apache Taverna (incubating), Apache Commons RDF (incubating)
>> http://orcid.org/0000-0001-9842-9718

Reply via email to