Hi Stian,

tl;dr: [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb
On Wed, May 6, 2015 at 4:24 PM, Stian Soiland-Reyes <[email protected]> wrote:
> On 6 May 2015 at 05:58, Alexandre Bertails <[email protected]> wrote:
>> I haven't followed the development in a long time, especially after
>> the move to Apache. I just looked at it and I had a few remarks and
>> questions.
>
> Hi, thanks for joining us! Let's hope we haven't scared you away while
> we try to get our first Apache release out and have the odd Blank Node
> fight.. :)

Wait, there was a Blank Node fight and I wasn't part of it?

>> Just some background for those who don't already know me: I am part of
>> the banana-rdf project [1]. The main approach there relies on defining
>> RDF and its operations as a typeclass [2]. In that world, Jena and
>> Sesame are just two instances of that typeclass (see for example [4]
>> and [5]). So there is no wrapper involved. Still, commons-rdf is
>> really a good step in the right direction as we could obsolete a lot
>> of stuff.
>
> I can definitely see that banana-rdf is relevant to commons-rdf - and
> also any requirements you might have for commons-rdf coming from Scala
> are interesting.

That's cool if we can accommodate commons-rdf to make it really useful
from Scala-land. I think it is possible [1].

>> Right now, there is no support in commons-rdf for immutable
>> operations. `Graph`s are mutable by default. Is there any plan to make
>> an API for immutable graphs? Graphs in banana-rdf are immutable by
>> default, and they are persistent in Plantain. We could always wrap an
>> immutable graph in a structure with a `var`, but, well, there are
>> better ways to do that.
>
> There have been suggestions along those lines. It is not a requirement
> of Graph now to allow .add() etc. - but there is no method to ask if a
> graph is mutable or not.
> In the user guide
> http://commonsrdf.incubator.apache.org/userguide.html#Adding_triples
> we therefore say:
>
>> Note: Some Graph implementations are immutable, in which case the below may
>> throw an UnsupportedOperationException.
>
> We could probably add this to the Javadoc of the mutability methods of
> Graph with an explicit @throws.
>
> I raised this as https://issues.apache.org/jira/browse/COMMONSRDF-23
>
> https://issues.apache.org/jira/browse/COMMONSRDF-7 discusses how we
> should define immutability on the non-Graph objects.
>
> In Clerezza's Commons RDF Core (which is somewhat aligned with Commons
> RDF) there is an additional marker interface ImmutableGraph -- perhaps
> something along those lines would work here?

If it doesn't exist in the type (i.e. statically), it's basically lost
knowledge. @throws, whether silent or explicit, is pretty much useless,
because client code now needs to check for that possibility at runtime.
It would be slightly better to make it an interface. And we could have
another interface for immutable graphs, where `add(Triple)` would
return another graph. See how it can be done in [1].

> https://github.com/apache/clerezza-rdf-core/blob/master/api/src/main/java/org/apache/clerezza/commons/rdf/ImmutableGraph.java
>
>> `RDFTermFactory` is stateful just to accommodate
>> `createBlankNode(String)`. It's stateless otherwise. This is really an
>> issue for banana-rdf as everything is defined as pure functions (the
>> output only depends on the input).
>
> It does not need to be stateful.
>
> In simple we implemented this using a final UUID "salt" that is
> created per instance of the factory. Do you consider this state?
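(An aside on the immutability point above: the kind of interface I have
in mind, where `add(Triple)` returns a new graph instead of mutating,
could look roughly like the following plain-Java sketch. All names here
are hypothetical, not actual Commons RDF types.)

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the real Triple interface.
interface Triple {}

// An immutable graph: add() returns a *new* graph, the receiver is untouched.
interface ImmutableGraph {
    ImmutableGraph add(Triple t);
    boolean contains(Triple t);
    long size();
}

final class SetBackedGraph implements ImmutableGraph {
    private final Set<Triple> triples;

    SetBackedGraph() { this.triples = Collections.emptySet(); }

    private SetBackedGraph(Set<Triple> triples) { this.triples = triples; }

    @Override
    public ImmutableGraph add(Triple t) {
        // Copy-on-write for simplicity; a persistent structure (as in
        // Plantain) would share structure instead of copying.
        Set<Triple> copy = new HashSet<>(triples);
        copy.add(t);
        return new SetBackedGraph(copy);
    }

    @Override
    public boolean contains(Triple t) { return triples.contains(t); }

    @Override
    public long size() { return triples.size(); }
}

public class ImmutableGraphDemo {
    public static void main(String[] args) {
        ImmutableGraph g0 = new SetBackedGraph();
        Triple t = new Triple() {};
        ImmutableGraph g1 = g0.add(t);
        System.out.println(g0.size());      // 0 - g0 is unchanged
        System.out.println(g1.size());      // 1
        System.out.println(g1.contains(t)); // true
    }
}
```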
> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51
>
> This is then used by
> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
> as part of a hashing to generate the new uniqueReference(). Thus a
> second call on the same factory with the same name will be hashed with
> the same salt, and produce the same uniqueReference(), which makes the
> second BlankNode equal to the first.
>
> But you can achieve the contract by other non-stateful means, for
> instance a random UUID that is static final (and hence no state at all
> per factory instance), and you can create a uniqueReference() by
> concatenating that UUID with the System.identityHashCode() of the
> factory and then concatenating the provided name.

This approach only gives you the illusion that there is no state, but
there *is* one (e.g. the UUID, and the atomic counter). Because of its
current contract, `createBlankNode(String)` cannot be referentially
transparent, and this is an issue if one wants to take a functional
approach.

> Also you are not required to implement createBlankNode(String) - you
> can simply throw UnsupportedOperationException and only support
> createBlankNode().

What is the point of doing/allowing that? As a user or implementor, I
want to know that I can rely on a method/function that is accessible.
And that is also why I dislike the default-implementation approach
taken in the current draft.

> This should probably be noted in the (so far just imagined)
> Implementors Guide on the Commons RDF website.
>
> https://issues.apache.org/jira/browse/COMMONSRDF-24
>
>> Is `createBlankNode(String)` really needed? The internal map for
>> bnodes could be maintained _outside_ of the factory. Or at least, we
>> could pass it as an argument instead: `createBlankNode(Map<String,
>> BlankNode>, String)`.
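(For reference, the salt-based scheme you describe could be sketched as
below - a per-factory UUID hashed together with the caller-supplied
name, so equal names on the same factory yield equal blank nodes, with
no mutable state. Class names are hypothetical, not the actual simple
implementation.)

```java
import java.util.UUID;

// A blank node identified only by a deterministic uniqueReference().
final class BlankNode {
    private final String uniqueReference;

    BlankNode(UUID salt, String name) {
        // Deterministic: the same (salt, name) pair always hashes to the
        // same reference, so no per-factory map of names is needed.
        this.uniqueReference =
            UUID.nameUUIDFromBytes((salt + ":" + name).getBytes()).toString();
    }

    String uniqueReference() { return uniqueReference; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BlankNode
            && uniqueReference.equals(((BlankNode) o).uniqueReference);
    }

    @Override
    public int hashCode() { return uniqueReference.hashCode(); }
}

final class SaltedFactory {
    // Fixed at construction and never mutated afterwards.
    private final UUID salt = UUID.randomUUID();

    BlankNode createBlankNode(String name) {
        return new BlankNode(salt, name);
    }
}

public class SaltDemo {
    public static void main(String[] args) {
        SaltedFactory f1 = new SaltedFactory();
        SaltedFactory f2 = new SaltedFactory();
        // Same factory + same name => equal blank nodes.
        System.out.println(f1.createBlankNode("b1").equals(f1.createBlankNode("b1")));
        // Different factories => different blank nodes despite the same name.
        System.out.println(f1.createBlankNode("b1").equals(f2.createBlankNode("b1")));
    }
}
```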
> We did discuss if it was needed - there are arguments against it due
> to blank nodes "existing only as themselves" and therefore a single
> JVM object per BlankNode should be enough - however having the
> flexibility for say a streaming RDF parser to create identical blank
> node instances without keeping lots of object references felt like a
> compelling argument to support this through the factory - with for
> example the hashing method above this means no state is required.

This makes a *very strong structural assumption*. Being referentially
transparent makes none, and will always accommodate all cases. That
being said, I understand the constraints. Note: [1] does not attempt to
fix that issue.

>> # wrapped values
>>
>> There are a lot of unnecessary objects because of the class hierarchy.
>> In banana-rdf, we can say that RDFTerm is a plain `String` while being
>> 100% type-safe. That's already what's happening for the N3.js
>> implementation. And in Plantain, Literals are just `java.lang.Object`
>> [6] so that we can directly have String, Int, etc.
>
> Well, this is tricky in Java where you can't force a new interface
> onto an existing type.
>
> How can you have a String as an RDFTerm? Because you use the
> ntriplestring? This would require new "instanceOf"-like methods to
> check what the string really is - and would require various classes
> like LiteralInspector to dissect the string. This sounds to me like
> building a different API..

The point is to abstract things away. By choosing to use actual
interfaces, you are forcing everything to be under a class hierarchy
for no good reason. I do not find the motivation for this approach in
the documentation. Please see [1] for a discussion on that subject.
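To give an idea of what I mean, here is a rough Java sketch of the
typeclass-style approach (all names are hypothetical, loosely modelled
on banana-rdf's RDF typeclass): the operations are defined against a
type parameter, so an implementation is free to represent every term as
a plain String - no class hierarchy required.

```java
// Operations over some node representation chosen by the implementation.
interface RdfOps<Node> {
    Node makeIri(String iriString);
    Node makeLiteral(String lexicalForm, String datatypeIri);
    boolean isIri(Node node);
    String lexicalForm(Node node); // only meaningful for literals here
}

// An implementation where every term is just a String in N-Triples syntax.
final class StringOps implements RdfOps<String> {
    @Override public String makeIri(String iriString) {
        return "<" + iriString + ">";
    }
    @Override public String makeLiteral(String lex, String dt) {
        return "\"" + lex + "\"^^<" + dt + ">";
    }
    @Override public boolean isIri(String node) {
        return node.startsWith("<");
    }
    @Override public String lexicalForm(String node) {
        // Extract the text between the surrounding quotes of a literal.
        int end = node.lastIndexOf("\"");
        return node.substring(1, end);
    }
}

public class TypeclassDemo {
    // Generic client code: works for *any* Node representation.
    static <Node> String describe(RdfOps<Node> ops, Node n) {
        return ops.isIri(n) ? "iri" : "literal";
    }

    public static void main(String[] args) {
        RdfOps<String> ops = new StringOps();
        String iri = ops.makeIri("http://example.org/s");
        String lit = ops.makeLiteral("Fred",
                "http://www.w3.org/2001/XMLSchema#string");
        System.out.println(describe(ops, iri)); // iri
        System.out.println(describe(ops, lit)); // literal
        System.out.println(ops.lexicalForm(lit)); // Fred
    }
}
```

In Scala the `RdfOps<Node>` instance is passed implicitly, so client
code stays as terse as with a concrete class hierarchy.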
> While I can see this can be a valid way to model RDF in a non-OO way,
> I think that would be difficult to align with Commons RDF as a
> Java-focused API, where most Java programmers would expect type
> hierarchies represented as regular Java class hierarchies.

I am not convinced. The RDF model is simple enough that another
approach is possible [1].

>> That means that there is no way currently to provide a
>> `RDFTermFactory` for Plantain. The only alternatives I see right now
>> are:
>
> What is the challenge of returning wrappers? I think this is the
> approach that Jena is also considering.
>
> Presumably if you are providing an RDFTermFactory then that is to
> allow JVM code that expects any Commons RDF code to create Plantain
> objects for RDF. They would expect to be able to do say:
>
> factory.createLiteral("Fred").getDatatype()
>
> which would not work on a returned String

You can (of course) do things like that in Scala, in a very type-safe
way, i.e. it's not monkey patching. And this is happening in banana-rdf
:-) It would be totally compatible with [1].

>> # getTriples vs iterate
>>
>> Not a big deal but I believe the naming could be improved. When I read
>> getTriples, I expect to have all the triples in my hand, but this is
>> not quite what Streams are about. On the other hand, when I read
>> iterate, I kinda expect the opposite. Of course the types clarify
>> everything but I believe it'd be easier to use getTriplesAsStream and
>> getTriplesAsIterable.
>
> I might disagree - but I think this is valuable to discuss.

As I said, not a big deal. Types are what really matter in the end.

> I have taken the liberty to report this in your name as:
> https://issues.apache.org/jira/browse/COMMONSRDF-22
>
> so we can discuss this further in the email thread it should have
> triggered.
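(To illustrate the naming suggestion, a tiny hypothetical sketch
exposing the same triples both as a single-use Stream and as a
re-traversable Iterable - the distinction the names would make
explicit:)

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical graph holding its triples as simple Strings for brevity.
final class TinyGraph {
    private final List<String> triples;

    TinyGraph(List<String> triples) { this.triples = triples; }

    // A Stream is single-use and potentially lazy...
    Stream<String> getTriplesAsStream() { return triples.stream(); }

    // ...while an Iterable can be traversed repeatedly.
    Iterable<String> getTriplesAsIterable() { return triples; }
}

public class NamingDemo {
    public static void main(String[] args) {
        TinyGraph g = new TinyGraph(Arrays.asList("t1", "t2"));
        System.out.println(g.getTriplesAsStream().count()); // 2
        long n = 0;
        for (String t : g.getTriplesAsIterable()) n++;
        System.out.println(n); // 2
    }
}
```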
> Thanks for all your valuable suggestions - please do keep in touch and
> let us know if you have a go at aligning banana-rdf with the upcoming
> 0.1.0 release, further feedback on documentation, and anything else!

I believe that the main issue is that the current approach targets both
library users _and_ library authors, but those really are two different
audiences. I agree that the classes for the RDF model can provide a
common framework for many Java developers. But libraries relying on
commons-rdf should not be tied to those classes. Please have a look at
this gist to see what I mean and tell me what you think [1].

Alexandre

[1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb

> --
> Stian Soiland-Reyes
> Apache Taverna (incubating), Apache Commons RDF (incubating)
> http://orcid.org/0000-0001-9842-9718
