Re: General questions

Andy Seaborne Tue, 14 Jul 2015 03:25:25 -0700

On 08/07/15 04:42, Alexandre Bertails wrote:

Hi Andy,


On Sun, Jul 5, 2015 at 8:23 AM, Andy Seaborne <[email protected]> wrote:

Alexandre,

Interesting technical peice of work.  So this approach is that different
choices of the generic parameters are made for different target systems?


Yes. Note that the types alone are useless: you need an `RDF<...>`
implementation to be available along with those types.

  To
retarget a different base system means a different choice so it's
source-level compatible, not binary.


No, this is still binary compatible as you never need to recompile.
It's just that you keep much more static information about the types
being carried:

   static binding + abstract types  VS  dynamic dispatch using interfaces.

  Code written using RDF<...> needs to recompiled for different systems?


No. It is very much like the `java.util.Collections.sort` method [1]:
it was written/compiled only once.

That wasn't quite what I had in mind - binary compatibility allowsmoving objects between systems without copy if the receiving system doesnot wish/need to copy. It's a choice of the receiver.


Injecting RDF<...> makes an algorithm independent of one base provider.

If you want to work with two base providers, for example, code that doessystem A to system B copy, you need RDF<A> and RDF<B> injected. This isexactly your first point - common choices enables neutrality ofcontainers. The "simple" commonsRDF, working on the interface objectscan work with a mixture of origins.

RDF<A> and RDF<B> are not compatible in the sense that a triple from onecan't be put into a graph of the other; it needs unpacking from RDF<A>to the fundamentals (string for IRI etc) and repacking as RDF<B>.

(You can at least have a single converter lib, because there arefundamental base units (Strings in various uses) but (Java-ism) theconverter needs to be called, there being no implicit definitions.)


Both styles have their uses.

        Andy


[1] 
http://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#sort-java.util.List-java.util.Comparator-

In practice, three things would likely happen:

1. Jena, Sesame, banana-rdf, etc. would have to provide an
implementation of `RDF<...>` so that their implementations can be used
with any system accepting using the `RDF<...>` approach
2. libraries that want to be abstract in the underlying RDF system
they work with (e.g. a Turtle parser/writer, a SPARQL client, etc.)
would have to be parametrized by `Graph`, `Triple`, etc.
3. but libraries from 2. would likely offer modules of their APIs
already instantiated for the main RDF libraries (Jena, Sesame, etc.)
so that they are ready to use with such systems

Best,
Alexandre


         Andy


On 07/05/15 20:16, Alexandre Bertails wrote:


On Thu, May 7, 2015 at 11:18 AM, Alexandre Bertails
<[email protected]> wrote:


Hi Stian,

tldr: [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb



I updated the gist with three examples at the end: CommonsRDF,
StringRDF, PlantainRDF. Note how the types do not have to relate to
the current interfaces, but they can if you want/need to.

Alexandre


On Wed, May 6, 2015 at 4:24 PM, Stian Soiland-Reyes <[email protected]>
wrote:


On 6 May 2015 at 05:58, Alexandre Bertails <[email protected]> wrote:


I haven't followed the development in a long time, especially after
the move to Apache. I just looked at it and I had a few remarks and
questions.



Hi, thanks for joining us! Let's hope we haven't scared you away while
we try to get our first Apache release out and have the odd Blank Node
fight.. :)



Wait, there was a Blank Node fight and I wasn't part of it?

Just some background for those who don't already know me: I am part of
the banana-rdf project [1]. The main approach there relies on defining
RDF and its operations as a typeclass [2]. In that world, Jena and
Sesame are just two instances of that typeclass (see for example [4]
and [5]). So there is no wrapper involved. Still, commons-rdf is
really a good step in the right direction as we could obsolete a lot
of stuff.



I can definitely see that banana-rdf is relevant to commons-rdf - and
also any requirements you might have to commons-rdf coming from Scala
is interesting.



That's cool if if we can accommodate commons-rdf to make it really
useful from Scala-land. I think it is possible [1].

Right now, there is no support in commons-rdf for immutable
operations. `Graph`s are mutable by default. Is there any plan to make
an API for immutable graphs? Graphs in banana-rdf are immutable by
default, and they are persistent in Plantain. We could always wrap an
immutable graph in a structure with a `var`, but, well, there are
better ways to do that.



There has been suggestions along those lines. It is not a requirement
of Graph now to allow .add() etc. - but there is no method to ask if a
graph is mutable or not.

In the user guide
http://commonsrdf.incubator.apache.org/userguide.html#Adding_triples
we therefore say:

Note: Some Graph implementations are immutable, in which case the below
may throw an UnsupportedOperationException.




We could probably add this to the Javadoc of the mutability methods of
Graph with an explicit @throws.

I raised this as https://issues.apache.org/jira/browse/COMMONSRDF-23




https://issues.apache.org/jira/browse/COMMONSRDF-7 discusses how we
should define immutability on the non-Graph objects.


In Clerezza's Commons RDF Core (which is somewhat aligned with Commons
RDF) there is an additional marker interface ImmutableGraph -- perhaps
something along those lines would work here?



If it doesn't exist in the type (i.e. statically), it's basically lost
knowledge. @throws, being silent or explicit, is pretty much useless
because now client code needs to check for this possibility. It would
be a slightly better to make it an interface. And we could could have
another interface for immutable graphs, where `add(Triple)` would
return another graph.

See how it can be done in [1].


https://github.com/apache/clerezza-rdf-core/blob/master/api/src/main/java/org/apache/clerezza/commons/rdf/ImmutableGraph.java

`RDFTermFactory` is stateful just to accommodate
`createBlankNode(String)`. It's stateless otherwise. This is really an
issue for banana-rdf as everything is defined as pure function (the
output only depends on the input).



It does not need to be stateful.

In simple we implemented this using a final UUID "salt" that is
created per instance of the factory. Do you consider this state?


https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51


This is then used by

https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
as part of a hashing to generate the new uniqueReference().  Thus a
second call on the same factory with the same name will be hashed with
the same salt, and produce the same uniqueReference(), which makes the
second BlankNode equal to the first.


But you can achieve the contract by other non-stateful means, for
instance a random UUID that is static final (and hence no state at all
per factory instance), and you can create a uniqueReference() by
concatenating that UUID with the System.identityHashCode() of the
factory and concatenate the provided name.



This approach only gives you the illusion that there is no state, but
there *is* one (e.g. with UUID, and the atomic counter). Because of
its current contract, `createBlankNode(String)` cannot be
referentially transparent, and this is an issue if one wants to take a
functional approach.

Also you are not required to implement createBlankNode(String) - you
can simply throw UnsupportedOperationException and only support
createBlankNode().



What is the point of doing/allowing that? As a users or implementors,
I want to know that I can rely on a method/function that is
accessible. And that is also why I dislike the default implementation
approach taken in the current draft.

This should probably be noted in the (yet so far just imagined)
Implementors Guide on the Commons RDF website.

https://issues.apache.org/jira/browse/COMMONSRDF-24

Is `createBlankNode(String)` really needed? The internal map for
bnodes could be maintained _outside_ of the factory. Or at least, we
could pass it as an argument instead: `createBlankNode(Map<String,
BlankNode>, String)`.



We did discuss if it was needed - there are arguments against it due
to blank nodes "existing only as themselves" and therefore a single
JVM object per BlankNode should be enough - however having the
flexibility for say a streaming RDF parser to create identitcal blank
node instances without keeping lots of object references felt like a
compelling argument to support this through the factory -  with for
example the hashing method above this means no state is required.



This is making *very structuring assumption*. Being referentially
transparent makes none, and will always accommodate all cases. That
being said, I understand the constraints.

Note: [1] does not attempt to fix that issue.

# wrapped values

There are a lot of unnecessary objects because of the class hierarchy.
In banana-rdf, we can say that RDFTerm is a plain `String` while being
100% type-safe. That's already what's happening for the N3.js
implementation. And in Plantain, Literals are just `java.lang.Object`
[6] so that we can directly have String, Int, etc.



Well, this is tricky in Java where you can't force a new interface
onto an existing type.

How can you have a String as an RDFTerm? Because you use the
ntriplestring? This would require new "instanceOf"-like methods to
check what the string really is - and would require various classes
like LiteralInspector to dissect the string. This sounds to me like
building a different API..



The point is to abstract things away. By choosing to use actual
interfaces, you are forcing everything to be under a class hierarchy
for no good reason. I do not find the motivation for this approach in
the documentation.

Please see [1] for a discussion on that subject.

While I can see this can be a valid way to model RDF in a non-OO way,
I  think that would be difficult to align with Commons RDF as a
Java-focused API, where most Java programmers would expect type
hierarchies represented as regular Java class hierarchies.



I am not convinced. The RDF model is simple enough that another
approach is possible [1].

That means that there is no way currently to provide a
`RDFTermFactory` for Plantain. The only alternatives I see right now
are:



What is the challenge of returning wrappers? I think this is the
approach that Jena is also considering.

Presumably if you are providing an RDFTermFactory then that is to
allow JVM code that expects any Commons RDF code to create Plantain
objects for RDF. They would expect to be able to do say:

factory.createLiteral("Fred").getDatatype()

which would not work on a returned String



You can (of course) do things like that in Scala, in a very typesafe
way, i.e. it's not monkey patching. And this is happening in
banana-rdf :-)

That would be totally compatible with [1].

# getTriples vs iterate

Not a big deal but I believe the naming could be improved. When I read
getTriples, I expect to have all the triples in my hand, but this is
not quite what Streams are about. On the other hand, when I read
iterate, I kinda expect the opposite. Of course the types clarify
everything but I believe it'd be easier to use getTriplesAsStream and
getTriplesAsIterable.



I might disagree - but I think this is a valuable to discuss.



As I said, not a big deal. Types are what really matters in the end.

I have taken the liberty to report this in your name as:
https://issues.apache.org/jira/browse/COMMONSRDF-22

so we can discuss this further in the email thread that should have
triggered.



Thanks for all your valuable suggestions - please do keep in touch and
let us know if you have a go at aligning banana-rdf with the upcoming
0.1.0 release, further feedback on documentation, and anything else!



I believe that the the main issue is that the current approach is both
for library users _and_ library authors, but there really are two
different targets here. I agree that the class for the RDF model can
provide a common framework for many Java people. But libraries relying
on commons-rdf should not be tied to the classes.

Please have a look at this gist to see what I mean and tell me what
you think [1].

Alexandre

[1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb


--
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718

Re: General questions

Reply via email to