Hey guys, I have done some work on a fork of the commons-rdf API which is
best understood, I think, by looking at this source file:

https://github.com/paulhoule/incubator-commonsrdf/blob/master/api/src/main/java/org/apache/commons/rdf/api/RDFContext.java

I've held off on talking about it because it isn't really "done" yet -- in
particular I haven't fully tested the dynamic methods (note that RDFContext
used to be called RDFFactory), and there are certainly more methods to
implement in order to get consistent behavior everywhere.

The library is idiomatic modern Java, and is particularly designed to
take advantage of static types where compile-time polymorphism applies,
but it also has a "dynamic" method that will automatically handle many
common types the way somebody would expect them to be handled.
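To make that concrete, here is a minimal sketch of the kind of dynamic
dispatch I mean. The class and method names are purely illustrative, not
the actual RDFContext API:

```java
// Hypothetical sketch of a "dynamic" literal factory; names here are
// illustrative and not part of the real RDFContext interface.
public final class DynamicLiteralSketch {

    /** Map a Java value to the XSD datatype IRI most users would expect. */
    public static String xsdTypeOf(Object value) {
        if (value instanceof String)  return "http://www.w3.org/2001/XMLSchema#string";
        if (value instanceof Integer) return "http://www.w3.org/2001/XMLSchema#int";
        if (value instanceof Long)    return "http://www.w3.org/2001/XMLSchema#long";
        if (value instanceof Double)  return "http://www.w3.org/2001/XMLSchema#double";
        if (value instanceof Boolean) return "http://www.w3.org/2001/XMLSchema#boolean";
        throw new IllegalArgumentException("No default mapping for " + value.getClass());
    }
}
```

The point is that a caller can hand over a plain Java object and get the
"obvious" datatype back, while the statically typed overloads remain the
preferred path when the type is known at compile time.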

Now there are many, many details to be concerned about.  Representation
of time is a pet peeve of mine, and I am not even talking about the Allen
interval algebra or getting anywhere close to the "real" ISO 8601, but
rather handling the garden-variety cases where you have an xsd:date or
xsd:time.

My decision here has been to use the JDK 8 java.time library, but there is
still the issue that there are date-time types that exist in XSD but don't
exist in java.time.  I think you could add these as new types that fit into
that system, but I haven't done it yet.
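As a sketch of the garden-variety cases, here is how java.time values
could be turned into XSD lexical forms (illustrative names, not the
actual RDFContext API). Even here a detail bites: xsd:time requires the
seconds field, while java.time's default ISO formatter omits ":00":

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Sketch (illustrative names, not the actual RDFContext API) of mapping
// java.time values onto XSD lexical forms.
public final class XsdDateSketch {

    // xsd:time requires seconds, but DateTimeFormatter.ISO_LOCAL_TIME
    // omits ":00" when seconds are zero -- one of those garden-variety
    // details -- so an explicit pattern is used instead.
    private static final DateTimeFormatter XSD_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss");

    /** Lexical form for an xsd:date literal, e.g. "2015-07-05". */
    public static String toXsdDate(LocalDate date) {
        return DateTimeFormatter.ISO_LOCAL_DATE.format(date);
    }

    /** Lexical form for an xsd:time literal, e.g. "11:23:00". */
    public static String toXsdTime(LocalTime time) {
        return XSD_TIME.format(time);
    }
}
```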

All of the convenience methods are implemented as default methods, so
implementations don't need to provide them.  If you do implement them,
however, there is the possibility of using primitive types and other
implementation tricks that avoid serialization and object creation.
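The pattern looks roughly like this (a hypothetical interface, not the
real RDFContext): the convenience overloads delegate to a single
abstract method, so an implementation only has to supply that one, but
remains free to override the defaults with something cheaper:

```java
// Hypothetical interface showing the default-method pattern; the names
// and the String return type are for illustration only.
public interface LiteralFactorySketch {

    /** The one method an implementation must provide. */
    String createLiteral(String lexicalForm, String datatypeIri);

    // Convenience overloads as default methods: implementations get them
    // for free, but may override them to avoid boxing/serialization.
    default String createLiteral(int value) {
        return createLiteral(Integer.toString(value),
                "http://www.w3.org/2001/XMLSchema#int");
    }

    default String createLiteral(boolean value) {
        return createLiteral(Boolean.toString(value),
                "http://www.w3.org/2001/XMLSchema#boolean");
    }
}
```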

One area where I am not so sure I am doing the right thing is with numeric
types.  For instance, I am getting in the habit of using longs in a lot of
places where I used to use ints, because sometimes I run a job and find out
that it failed because the ints rolled over.  On one hand I like the idea
that you could say

createLiteral((byte) 3)

and get an xsd:byte; on the other hand, frankly, I wonder how good
interoperability with all the RDF tools out there is if you use unusual
types.  Also I am personally a big fan of BigIntegers and BigDecimals
(e.g. for money).
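For discussion's sake, the "narrowest matching type" behavior would look
something like this (again a hypothetical sketch, not the actual
RDFContext behavior):

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical mapping from Java numeric types to XSD datatypes; this is
// one possible policy, not what RDFContext actually does.
public final class NumericXsdSketch {

    public static String xsdTypeOf(Number n) {
        if (n instanceof Byte)       return "http://www.w3.org/2001/XMLSchema#byte";
        if (n instanceof Short)      return "http://www.w3.org/2001/XMLSchema#short";
        if (n instanceof Integer)    return "http://www.w3.org/2001/XMLSchema#int";
        if (n instanceof Long)       return "http://www.w3.org/2001/XMLSchema#long";
        if (n instanceof BigInteger) return "http://www.w3.org/2001/XMLSchema#integer";
        if (n instanceof BigDecimal) return "http://www.w3.org/2001/XMLSchema#decimal";
        throw new IllegalArgumentException("Unmapped numeric type: " + n.getClass());
    }
}
```

The interoperability worry is exactly that the xsd:byte and xsd:short
branches exist at all; a more conservative policy might collapse
everything integral to xsd:integer.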

One idea I have is that different RDFContexts could implement different
behaviors as to what types they use, but that certainly complicates things
and might drive people crazy when things don't work.

Another feature of this implementation is that every "RDF object" contains
a pointer back to the RDFContext that created it.  If you really want to
save bytes, the RDFContext can be made a singleton and the reference kept
in a static field.
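A sketch of that singleton trick (illustrative names, not the actual
RDFContext classes): the term still exposes its creating context, but the
back-pointer costs zero bytes per object because it is resolved through a
static field:

```java
// Hypothetical singleton context; the back-pointer in each term resolves
// through a static field instead of an instance field, saving one
// reference per RDF object. Names are illustrative only.
public final class SingletonContextSketch {

    public static final SingletonContextSketch INSTANCE = new SingletonContextSketch();

    private SingletonContextSketch() {}

    public Iri createIri(String iriString) {
        return new Iri(iriString);
    }

    public static final class Iri {
        private final String iriString;

        Iri(String iriString) { this.iriString = iriString; }

        /** Back-pointer to the creating context, via the singleton. */
        public SingletonContextSketch getContext() { return INSTANCE; }

        public String getIriString() { return iriString; }
    }
}
```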





On Sun, Jul 5, 2015 at 11:23 AM, Andy Seaborne <[email protected]> wrote:

> Alexandre,
>
> Interesting technical piece of work.  So this approach is that different
> choices of the generic parameters are made for different target systems?
> To retarget a different base system means a different choice so it's
> source-level compatible, not binary.  Code written using RDF<...> needs to
> be recompiled for different systems?
>
>         Andy
>
> On 07/05/15 20:16, Alexandre Bertails wrote:
>
>> On Thu, May 7, 2015 at 11:18 AM, Alexandre Bertails
>> <[email protected]> wrote:
>>
>>> Hi Stian,
>>>
>>> tldr: [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb
>>>
>>
>> I updated the gist with three examples at the end: CommonsRDF,
>> StringRDF, PlantainRDF. Note how the types do not have to relate to
>> the current interfaces, but they can if you want/need to.
>>
>> Alexandre
>>
>>
>>> On Wed, May 6, 2015 at 4:24 PM, Stian Soiland-Reyes <[email protected]>
>>> wrote:
>>>
>>>> On 6 May 2015 at 05:58, Alexandre Bertails <[email protected]> wrote:
>>>>
>>>>> I haven't followed the development in a long time, especially after
>>>>> the move to Apache. I just looked at it and I had a few remarks and
>>>>> questions.
>>>>>
>>>>
>>>> Hi, thanks for joining us! Let's hope we haven't scared you away while
>>>> we try to get our first Apache release out and have the odd Blank Node
>>>> fight.. :)
>>>>
>>>
>>> Wait, there was a Blank Node fight and I wasn't part of it?
>>>
>>>  Just some background for those who don't already know me: I am part of
>>>>> the banana-rdf project [1]. The main approach there relies on defining
>>>>> RDF and its operations as a typeclass [2]. In that world, Jena and
>>>>> Sesame are just two instances of that typeclass (see for example [4]
>>>>> and [5]). So there is no wrapper involved. Still, commons-rdf is
>>>>> really a good step in the right direction as we could obsolete a lot
>>>>> of stuff.
>>>>>
>>>>
>>>> I can definitely see that banana-rdf is relevant to commons-rdf - and
>>>> also any requirements you might have to commons-rdf coming from Scala
>>>> is interesting.
>>>>
>>>
>>> That's cool if we can accommodate commons-rdf to make it really
>>> useful from Scala-land. I think it is possible [1].
>>>
>>>  Right now, there is no support in commons-rdf for immutable
>>>>> operations. `Graph`s are mutable by default. Is there any plan to make
>>>>> an API for immutable graphs? Graphs in banana-rdf are immutable by
>>>>> default, and they are persistent in Plantain. We could always wrap an
>>>>> immutable graph in a structure with a `var`, but, well, there are
>>>>> better ways to do that.
>>>>>
>>>>
>>>> There have been suggestions along those lines. It is not a requirement
>>>> of Graph now to allow .add() etc. - but there is no method to ask if a
>>>> graph is mutable or not.
>>>>
>>>> In the user guide
>>>> http://commonsrdf.incubator.apache.org/userguide.html#Adding_triples
>>>> we therefore say:
>>>>
>>>>  Note: Some Graph implementations are immutable, in which case the
>>>>> below may throw an UnsupportedOperationException.
>>>>>
>>>>
>>>>
>>>> We could probably add this to the Javadoc of the mutability methods of
>>>> Graph with an explicit @throws.
>>>>
>>>> I raised this as https://issues.apache.org/jira/browse/COMMONSRDF-23
>>>>
>>>>
>>>>
>>>>
>>>> https://issues.apache.org/jira/browse/COMMONSRDF-7 discusses how we
>>>> should define immutability on the non-Graph objects.
>>>>
>>>>
>>>> In Clerezza's Commons RDF Core (which is somewhat aligned with Commons
>>>> RDF) there is an additional marker interface ImmutableGraph -- perhaps
>>>> something along those lines would work here?
>>>>
>>>
>>> If it doesn't exist in the type (i.e. statically), it's basically lost
>>> knowledge. @throws, being silent or explicit, is pretty much useless
>>> because now client code needs to check for this possibility. It would
>>> be slightly better to make it an interface. And we could have
>>> another interface for immutable graphs, where `add(Triple)` would
>>> return another graph.
>>>
>>> See how it can be done in [1].
>>>
>>>
>>>> https://github.com/apache/clerezza-rdf-core/blob/master/api/src/main/java/org/apache/clerezza/commons/rdf/ImmutableGraph.java
>>>>
>>>>
>>>>  `RDFTermFactory` is stateful just to accommodate
>>>>> `createBlankNode(String)`. It's stateless otherwise. This is really an
>>>>> issue for banana-rdf as everything is defined as pure function (the
>>>>> output only depends on the input).
>>>>>
>>>>
>>>> It does not need to be stateful.
>>>>
>>>> In simple we implemented this using a final UUID "salt" that is
>>>> created per instance of the factory. Do you consider this state?
>>>>
>>>>
>>>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51
>>>>
>>>>
>>>> This is then used by
>>>>
>>>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
>>>> as part of a hashing to generate the new uniqueReference().  Thus a
>>>> second call on the same factory with the same name will be hashed with
>>>> the same salt, and produce the same uniqueReference(), which makes the
>>>> second BlankNode equal to the first.
>>>>
>>>>
>>>> But you can achieve the contract by other non-stateful means, for
>>>> instance a random UUID that is static final (and hence no state at all
>>>> per factory instance), and you can create a uniqueReference() by
>>>> concatenating that UUID with the System.identityHashCode() of the
>>>> factory and concatenate the provided name.
>>>>
>>>
>>> This approach only gives you the illusion that there is no state, but
>>> there *is* one (e.g. with UUID, and the atomic counter). Because of
>>> its current contract, `createBlankNode(String)` cannot be
>>> referentially transparent, and this is an issue if one wants to take a
>>> functional approach.
>>>
>>>  Also you are not required to implement createBlankNode(String) - you
>>>> can simply throw UnsupportedOperationException and only support
>>>> createBlankNode().
>>>>
>>>
>>> What is the point of doing/allowing that? As a user or implementor,
>>> I want to know that I can rely on a method/function that is
>>> accessible. And that is also why I dislike the default implementation
>>> approach taken in the current draft.
>>>
>>>  This should probably be noted in the (yet so far just imagined)
>>>> Implementors Guide on the Commons RDF website.
>>>>
>>>> https://issues.apache.org/jira/browse/COMMONSRDF-24
>>>>
>>>>
>>>>  Is `createBlankNode(String)` really needed? The internal map for
>>>>> bnodes could be maintained _outside_ of the factory. Or at least, we
>>>>> could pass it as an argument instead: `createBlankNode(Map<String,
>>>>> BlankNode>, String)`.
>>>>>
>>>>
>>>> We did discuss if it was needed - there are arguments against it due
>>>> to blank nodes "existing only as themselves" and therefore a single
>>>> JVM object per BlankNode should be enough - however having the
>>>> flexibility for say a streaming RDF parser to create identical blank
>>>> node instances without keeping lots of object references felt like a
>>>> compelling argument to support this through the factory -  with for
>>>> example the hashing method above this means no state is required.
>>>>
>>>
>>> This is making a *very structuring assumption*. Being referentially
>>> transparent makes none, and will always accommodate all cases. That
>>> being said, I understand the constraints.
>>>
>>> Note: [1] does not attempt to fix that issue.
>>>
>>>  # wrapped values
>>>>>
>>>>> There are a lot of unnecessary objects because of the class hierarchy.
>>>>> In banana-rdf, we can say that RDFTerm is a plain `String` while being
>>>>> 100% type-safe. That's already what's happening for the N3.js
>>>>> implementation. And in Plantain, Literals are just `java.lang.Object`
>>>>> [6] so that we can directly have String, Int, etc.
>>>>>
>>>>
>>>> Well, this is tricky in Java where you can't force a new interface
>>>> onto an existing type.
>>>>
>>>> How can you have a String as an RDFTerm? Because you use the
>>>> ntriplestring? This would require new "instanceOf"-like methods to
>>>> check what the string really is - and would require various classes
>>>> like LiteralInspector to dissect the string. This sounds to me like
>>>> building a different API..
>>>>
>>>
>>> The point is to abstract things away. By choosing to use actual
>>> interfaces, you are forcing everything to be under a class hierarchy
>>> for no good reason. I do not find the motivation for this approach in
>>> the documentation.
>>>
>>> Please see [1] for a discussion on that subject.
>>>
>>>  While I can see this can be a valid way to model RDF in a non-OO way,
>>>> I  think that would be difficult to align with Commons RDF as a
>>>> Java-focused API, where most Java programmers would expect type
>>>> hierarchies represented as regular Java class hierarchies.
>>>>
>>>
>>> I am not convinced. The RDF model is simple enough that another
>>> approach is possible [1].
>>>
>>>  That means that there is no way currently to provide a
>>>>> `RDFTermFactory` for Plantain. The only alternatives I see right now
>>>>> are:
>>>>>
>>>>
>>>> What is the challenge of returning wrappers? I think this is the
>>>> approach that Jena is also considering.
>>>>
>>>> Presumably if you are providing an RDFTermFactory then that is to
>>>> allow JVM code that expects any Commons RDF code to create Plantain
>>>> objects for RDF. They would expect to be able to do say:
>>>>
>>>> factory.createLiteral("Fred").getDatatype()
>>>>
>>>> which would not work on a returned String
>>>>
>>>
>>> You can (of course) do things like that in Scala, in a very typesafe
>>> way, i.e. it's not monkey patching. And this is happening in
>>> banana-rdf :-)
>>>
>>> That would be totally compatible with [1].
>>>
>>>  # getTriples vs iterate
>>>>>
>>>>> Not a big deal but I believe the naming could be improved. When I read
>>>>> getTriples, I expect to have all the triples in my hand, but this is
>>>>> not quite what Streams are about. On the other hand, when I read
>>>>> iterate, I kinda expect the opposite. Of course the types clarify
>>>>> everything but I believe it'd be easier to use getTriplesAsStream and
>>>>> getTriplesAsIterable.
>>>>>
>>>>
>>>> I might disagree - but I think this is valuable to discuss.
>>>>
>>>
>>> As I said, not a big deal. Types are what really matters in the end.
>>>
>>>  I have taken the liberty to report this in your name as:
>>>> https://issues.apache.org/jira/browse/COMMONSRDF-22
>>>>
>>>> so we can discuss this further in the email thread that should have
>>>> triggered.
>>>>
>>>>
>>>>
>>>> Thanks for all your valuable suggestions - please do keep in touch and
>>>> let us know if you have a go at aligning banana-rdf with the upcoming
>>>> 0.1.0 release, further feedback on documentation, and anything else!
>>>>
>>>
>>> I believe that the main issue is that the current approach is both
>>> for library users _and_ library authors, but there really are two
>>> different targets here. I agree that the class for the RDF model can
>>> provide a common framework for many Java people. But libraries relying
>>> on commons-rdf should not be tied to the classes.
>>>
>>> Please have a look at this gist to see what I mean and tell me what
>>> you think [1].
>>>
>>> Alexandre
>>>
>>> [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb
>>>
>>>
>>>> --
>>>> Stian Soiland-Reyes
>>>> Apache Taverna (incubating), Apache Commons RDF (incubating)
>>>> http://orcid.org/0000-0001-9842-9718
>>>>
>>>
>


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   [email protected]
https://legalentityidentifier.info/lei/lookup/
