[snip]

>> No. It is very much like the `java.util.Collections.sort` method [1]:
>> it was written/compiled only once.
>
>
> That wasn't quite what I had in mind - binary compatibility allows moving
> objects between systems without copy if the receiving system does not
> wish/need to copy.  It's a choice of the receiver.

Oh, I thought "binary compatibility" just meant "no need to recompile" ;-)

> Injecting RDF<...> makes an algorithm independent of one base provider.

Exactly. (But I am not sure about the use of the term "injecting" here.)

> If you want to work with two base providers, for example, code that does
> system A to system B copy, you need RDF<A> and RDF<B> injected.  This is
> exactly your first point - common choices enable neutrality of containers.
> The "simple" commonsRDF, working on the interface objects, can work with a
> mixture of origins.

Why would you want to work with two implementations at the same time?
Unless you explicitly want to go from one to another, of course.

> RDF<A> and RDF<B> are not compatible in the sense that a triple from one
> can't be put into a graph of the other; it needs unpacking from RDF<A> to
> the fundamentals (string for IRI etc) and repacking as RDF<B>.

Yes, it is the same thing as working with
`com.hp.hpl.jena.graph.Triple` and `org.openrdf.model.Statement`.

Does one need to go from one to another?
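The unpack/repack step mentioned above can be sketched with two hypothetical triple types standing in for `com.hp.hpl.jena.graph.Triple` and `org.openrdf.model.Statement` (all class and field names below are illustrative, not the real Jena or Sesame API):

```java
// Two hypothetical, incompatible triple representations. Neither shares
// an interface with the other; only the fundamental strings are common.
final class TripleA {
    final String subject, predicate, object;
    TripleA(String s, String p, String o) { subject = s; predicate = p; object = o; }
}

final class TripleB {
    final String s, p, o;
    TripleB(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
}

final class Converter {
    // Unpack TripleA down to its fundamental strings, repack as TripleB.
    // As Andy notes, in Java this conversion must be called explicitly.
    static TripleB aToB(TripleA t) {
        return new TripleB(t.subject, t.predicate, t.object);
    }
}
```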

> (You can at least have a single converter lib,

Among other things, I want to avoid converters.

> because there are fundamental
> base units (Strings in various uses) but (Java-ism) the converter needs to
> be called, there being no implicit definitions.)

Sorry I didn't get that.

> Both styles have their uses.

Yes, I know.

banana-rdf has been providing an RDF abstraction for 4 years now, one
that accommodates 5 implementations (Jena, Sesame, banana-plantain,
jsonld.js, N3.js), and we have never felt the need for interfaces à la Java.

So I am genuinely trying to understand where and why people need those
interfaces, and how they will use them _in practice_ with respect to the
underlying implementations.
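For comparison, here is a minimal Java sketch of the `Collections.sort`-style approach referenced earlier in the thread: the algorithm is written and compiled once, parametrized by an `RDF<...>`-like operations object, so the node and triple types of a provider never need to implement a shared interface (all types and names here are hypothetical, not the banana-rdf or Commons RDF API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical typeclass-style operations object: Node and Triple are
// opaque type parameters, constrained only by the operations below.
interface RDFOps<Node, Triple> {
    Triple makeTriple(Node s, Node p, Node o);
    Node subject(Triple t);
    String iriString(Node n);
}

// Written/compiled once, like Collections.sort: works with any provider
// for which an RDFOps instance exists.
final class Algorithms {
    static <N, T> List<String> subjectIris(RDFOps<N, T> rdf, List<T> triples) {
        List<String> out = new ArrayList<>();
        for (T t : triples) out.add(rdf.iriString(rdf.subject(t)));
        return out;
    }
}

// One possible instance: nodes are plain Strings, triples are String arrays,
// with no wrapper objects involved.
final class StringRDF implements RDFOps<String, String[]> {
    public String[] makeTriple(String s, String p, String o) { return new String[]{s, p, o}; }
    public String subject(String[] t) { return t[0]; }
    public String iriString(String n) { return n; }
}
```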

Alexandre

>
>         Andy
>
>
>
>>
>> [1]
>> http://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#sort-java.util.List-java.util.Comparator-
>>
>> In practice, three things would likely happen:
>>
>> 1. Jena, Sesame, banana-rdf, etc. would have to provide an
>> implementation of `RDF<...>` so that their implementations can be used
>> with any system adopting the `RDF<...>` approach
>> 2. libraries that want to be abstract over the underlying RDF system
>> they work with (e.g. a Turtle parser/writer, a SPARQL client, etc.)
>> would have to be parametrized by `Graph`, `Triple`, etc.
>> 3. but libraries from 2. would likely offer modules of their APIs
>> already instantiated for the main RDF libraries (Jena, Sesame, etc.)
>> so that they are ready to use with such systems
>>
>> Best,
>> Alexandre
>>
>>>
>>>          Andy
>>>
>>>
>>> On 07/05/15 20:16, Alexandre Bertails wrote:
>>>>
>>>>
>>>> On Thu, May 7, 2015 at 11:18 AM, Alexandre Bertails
>>>> <[email protected]> wrote:
>>>>>
>>>>>
>>>>> Hi Stian,
>>>>>
>>>>> tldr: [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb
>>>>
>>>>
>>>>
>>>> I updated the gist with three examples at the end: CommonsRDF,
>>>> StringRDF, PlantainRDF. Note how the types do not have to relate to
>>>> the current interfaces, but they can if you want/need to.
>>>>
>>>> Alexandre
>>>>
>>>>>
>>>>> On Wed, May 6, 2015 at 4:24 PM, Stian Soiland-Reyes <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 6 May 2015 at 05:58, Alexandre Bertails <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> I haven't followed the development in a long time, especially after
>>>>>>> the move to Apache. I just looked at it and I had a few remarks and
>>>>>>> questions.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi, thanks for joining us! Let's hope we haven't scared you away while
>>>>>> we try to get our first Apache release out and have the odd Blank Node
>>>>>> fight.. :)
>>>>>
>>>>>
>>>>>
>>>>> Wait, there was a Blank Node fight and I wasn't part of it?
>>>>>
>>>>>>> Just some background for those who don't already know me: I am part
>>>>>>> of
>>>>>>> the banana-rdf project [1]. The main approach there relies on
>>>>>>> defining
>>>>>>> RDF and its operations as a typeclass [2]. In that world, Jena and
>>>>>>> Sesame are just two instances of that typeclass (see for example [4]
>>>>>>> and [5]). So there is no wrapper involved. Still, commons-rdf is
>>>>>>> really a good step in the right direction as we could obsolete a lot
>>>>>>> of stuff.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I can definitely see that banana-rdf is relevant to commons-rdf - and
>>>>>> also any requirements you might have to commons-rdf coming from Scala
>>>>>> is interesting.
>>>>>
>>>>>
>>>>>
>>>>> That's cool if we can accommodate commons-rdf to make it really
>>>>> useful from Scala-land. I think it is possible [1].
>>>>>
>>>>>>> Right now, there is no support in commons-rdf for immutable
>>>>>>> operations. `Graph`s are mutable by default. Is there any plan to
>>>>>>> make
>>>>>>> an API for immutable graphs? Graphs in banana-rdf are immutable by
>>>>>>> default, and they are persistent in Plantain. We could always wrap an
>>>>>>> immutable graph in a structure with a `var`, but, well, there are
>>>>>>> better ways to do that.
>>>>>>
>>>>>>
>>>>>>
>>>>>> There have been suggestions along those lines. It is not a requirement
>>>>>> of Graph now to allow .add() etc. - but there is no method to ask whether
>>>>>> a graph is mutable or not.
>>>>>>
>>>>>> In the user guide
>>>>>> http://commonsrdf.incubator.apache.org/userguide.html#Adding_triples
>>>>>> we therefore say:
>>>>>>
>>>>>>> Note: Some Graph implementations are immutable, in which case the
>>>>>>> below
>>>>>>> may throw an UnsupportedOperationException.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> We could probably add this to the Javadoc of the mutability methods of
>>>>>> Graph with an explicit @throws.
>>>>>>
>>>>>> I raised this as https://issues.apache.org/jira/browse/COMMONSRDF-23
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/COMMONSRDF-7 discusses how we
>>>>>> should define immutability on the non-Graph objects.
>>>>>>
>>>>>>
>>>>>> In Clerezza's Commons RDF Core (which is somewhat aligned with Commons
>>>>>> RDF) there is an additional marker interface ImmutableGraph -- perhaps
>>>>>> something along those lines would work here?
>>>>>
>>>>>
>>>>>
>>>>> If it doesn't exist in the type (i.e. statically), it's basically lost
>>>>> knowledge. @throws, whether silent or explicit, is pretty much useless
>>>>> because client code now needs to check for this possibility. It would
>>>>> be slightly better to make it an interface. And we could have
>>>>> another interface for immutable graphs, where `add(Triple)` would
>>>>> return another graph.
>>>>>
>>>>> See how it can be done in [1].
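The persistent-style interface described above, where `add` returns a new graph instead of mutating the receiver, could be sketched like this (a hypothetical type, not the actual Commons RDF or Clerezza `ImmutableGraph` API; triples are abbreviated to strings for brevity):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical immutable graph: add returns a fresh graph and never
// modifies the receiver, so immutability is visible in the type itself
// rather than signalled by a runtime UnsupportedOperationException.
final class ImmutableGraph {
    private final Set<String> triples; // triples as N-Triples-ish strings

    ImmutableGraph() { this.triples = Collections.emptySet(); }
    private ImmutableGraph(Set<String> ts) { this.triples = ts; }

    ImmutableGraph add(String triple) {
        Set<String> ts = new HashSet<>(triples);
        ts.add(triple);
        return new ImmutableGraph(Collections.unmodifiableSet(ts));
    }

    boolean contains(String triple) { return triples.contains(triple); }
    int size() { return triples.size(); }
}
```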
>>>>>
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/clerezza-rdf-core/blob/master/api/src/main/java/org/apache/clerezza/commons/rdf/ImmutableGraph.java
>>>>>>
>>>>>>
>>>>>>> `RDFTermFactory` is stateful just to accommodate
>>>>>>> `createBlankNode(String)`. It's stateless otherwise. This is really
>>>>>>> an
>>>>>>> issue for banana-rdf as everything is defined as pure function (the
>>>>>>> output only depends on the input).
>>>>>>
>>>>>>
>>>>>>
>>>>>> It does not need to be stateful.
>>>>>>
>>>>>> In simple we implemented this using a final UUID "salt" that is
>>>>>> created per instance of the factory. Do you consider this state?
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51
>>>>>>
>>>>>>
>>>>>> This is then used by
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
>>>>>> as part of a hashing to generate the new uniqueReference().  Thus a
>>>>>> second call on the same factory with the same name will be hashed with
>>>>>> the same salt, and produce the same uniqueReference(), which makes the
>>>>>> second BlankNode equal to the first.
>>>>>>
>>>>>>
>>>>>> But you can achieve the contract by other non-stateful means, for
>>>>>> instance a random UUID that is static final (and hence no state at all
>>>>>> per factory instance), and you can create a uniqueReference() by
>>>>>> concatenating that UUID with the System.identityHashCode() of the
>>>>>> factory and concatenate the provided name.
>>>>>
>>>>>
>>>>>
>>>>> This approach only gives you the illusion that there is no state, but
>>>>> there *is* one (e.g. the UUID, and the atomic counter). Because of
>>>>> its current contract, `createBlankNode(String)` cannot be
>>>>> referentially transparent, and this is an issue if one wants to take a
>>>>> functional approach.
>>>>>
>>>>>> Also you are not required to implement createBlankNode(String) - you
>>>>>> can simply throw UnsupportedOperationException and only support
>>>>>> createBlankNode().
>>>>>
>>>>>
>>>>>
>>>>> What is the point of doing/allowing that? As a user or implementor,
>>>>> I want to know that I can rely on a method/function that is
>>>>> accessible. And that is also why I dislike the default implementation
>>>>> approach taken in the current draft.
>>>>>
>>>>>> This should probably be noted in the (yet so far just imagined)
>>>>>> Implementors Guide on the Commons RDF website.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/COMMONSRDF-24
>>>>>>
>>>>>>
>>>>>>> Is `createBlankNode(String)` really needed? The internal map for
>>>>>>> bnodes could be maintained _outside_ of the factory. Or at least, we
>>>>>>> could pass it as an argument instead: `createBlankNode(Map<String,
>>>>>>> BlankNode>, String)`.
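The stateless alternative suggested here, passing the bnode map in explicitly so the scope lives with the caller rather than inside the factory, would look something like this (hypothetical types and signature, sketched for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical blank node type.
final class BlankNode {
    final String name;
    BlankNode(String name) { this.name = name; }
}

final class StatelessFactory {
    // The name-to-node map is an argument: the factory itself holds no
    // state, yet the "same name, same node" contract still holds within
    // whatever scope the caller chooses to keep the map for.
    static BlankNode createBlankNode(Map<String, BlankNode> scope, String name) {
        return scope.computeIfAbsent(name, BlankNode::new);
    }
}
```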
>>>>>>
>>>>>>
>>>>>>
>>>>>> We did discuss if it was needed - there are arguments against it due
>>>>>> to blank nodes "existing only as themselves" and therefore a single
>>>>>> JVM object per BlankNode should be enough - however having the
>>>>>> flexibility for say a streaming RDF parser to create identical blank
>>>>>> node instances without keeping lots of object references felt like a
>>>>>> compelling argument to support this through the factory - with, for
>>>>>> example, the hashing method above, no state is required.
>>>>>
>>>>>
>>>>>
>>>>> This is making a *very structuring assumption*. Being referentially
>>>>> transparent makes none, and will always accommodate all cases. That
>>>>> being said, I understand the constraints.
>>>>>
>>>>> Note: [1] does not attempt to fix that issue.
>>>>>
>>>>>>> # wrapped values
>>>>>>>
>>>>>>> There are a lot of unnecessary objects because of the class
>>>>>>> hierarchy.
>>>>>>> In banana-rdf, we can say that RDFTerm is a plain `String` while
>>>>>>> being
>>>>>>> 100% type-safe. That's already what's happening for the N3.js
>>>>>>> implementation. And in Plantain, Literals are just `java.lang.Object`
>>>>>>> [6] so that we can directly have String, Int, etc.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Well, this is tricky in Java where you can't force a new interface
>>>>>> onto an existing type.
>>>>>>
>>>>>> How can you have a String as an RDFTerm? Because you use the
>>>>>> ntriplestring? This would require new "instanceOf"-like methods to
>>>>>> check what the string really is - and would require various classes
>>>>>> like LiteralInspector to dissect the string. This sounds to me like
>>>>>> building a different API..
>>>>>
>>>>>
>>>>>
>>>>> The point is to abstract things away. By choosing to use actual
>>>>> interfaces, you are forcing everything to be under a class hierarchy
>>>>> for no good reason. I do not find the motivation for this approach in
>>>>> the documentation.
>>>>>
>>>>> Please see [1] for a discussion on that subject.
>>>>>
>>>>>> While I can see this can be a valid way to model RDF in a non-OO way,
>>>>>> I think that would be difficult to align with Commons RDF as a
>>>>>> Java-focused API, where most Java programmers would expect type
>>>>>> hierarchies represented as regular Java class hierarchies.
>>>>>
>>>>>
>>>>>
>>>>> I am not convinced. The RDF model is simple enough that another
>>>>> approach is possible [1].
>>>>>
>>>>>>> That means that there is no way currently to provide a
>>>>>>> `RDFTermFactory` for Plantain. The only alternatives I see right now
>>>>>>> are:
>>>>>>
>>>>>>
>>>>>>
>>>>>> What is the challenge of returning wrappers? I think this is the
>>>>>> approach that Jena is also considering.
>>>>>>
>>>>>> Presumably if you are providing an RDFTermFactory then that is to
>>>>>> allow JVM code that expects any Commons RDF code to create Plantain
>>>>>> objects for RDF. They would expect to be able to do say:
>>>>>>
>>>>>> factory.createLiteral("Fred").getDatatype()
>>>>>>
>>>>>> which would not work on a returned String
>>>>>
>>>>>
>>>>>
>>>>> You can (of course) do things like that in Scala, in a very typesafe
>>>>> way, i.e. it's not monkey patching. And this is happening in
>>>>> banana-rdf :-)
>>>>>
>>>>> That would be totally compatible with [1].
>>>>>
>>>>>>> # getTriples vs iterate
>>>>>>>
>>>>>>> Not a big deal but I believe the naming could be improved. When I
>>>>>>> read
>>>>>>> getTriples, I expect to have all the triples in my hand, but this is
>>>>>>> not quite what Streams are about. On the other hand, when I read
>>>>>>> iterate, I kinda expect the opposite. Of course the types clarify
>>>>>>> everything but I believe it'd be easier to use getTriplesAsStream and
>>>>>>> getTriplesAsIterable.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I might disagree - but I think this is valuable to discuss.
>>>>>
>>>>>
>>>>>
>>>>> As I said, not a big deal. Types are what really matters in the end.
>>>>>
>>>>>> I have taken the liberty to report this in your name as:
>>>>>> https://issues.apache.org/jira/browse/COMMONSRDF-22
>>>>>>
>>>>>> so we can discuss this further in the email thread that it should have
>>>>>> triggered.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks for all your valuable suggestions - please do keep in touch and
>>>>>> let us know if you have a go at aligning banana-rdf with the upcoming
>>>>>> 0.1.0 release, further feedback on documentation, and anything else!
>>>>>
>>>>>
>>>>>
>>>>> I believe that the main issue is that the current approach targets both
>>>>> library users _and_ library authors, but these really are two
>>>>> different audiences. I agree that the classes for the RDF model can
>>>>> provide a common framework for many Java people. But libraries relying
>>>>> on commons-rdf should not be tied to the classes.
>>>>>
>>>>> Please have a look at this gist to see what I mean and tell me what
>>>>> you think [1].
>>>>>
>>>>> Alexandre
>>>>>
>>>>> [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb
>>>>>
>>>>>>
>>>>>> --
>>>>>> Stian Soiland-Reyes
>>>>>> Apache Taverna (incubating), Apache Commons RDF (incubating)
>>>>>> http://orcid.org/0000-0001-9842-9718
>>>
>>>
>>>
>
