Alexandre's proposal would take the project in a different direction.

The goals for me have been switching underlying systems (being able to produce portable algorithms that can be applied without recompiling the world, using ServiceLoader to bind to an implementation) and interoperation across systems.
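
Roughly the kind of ServiceLoader binding I have in mind, as a sketch only; it assumes the RDFTermFactory interface from the api package and an implementation registered under META-INF/services:

```
import java.util.ServiceLoader;
import org.apache.commons.rdf.api.RDFTermFactory;

public class FactoryLookup {
    // Picks up whichever RDFTermFactory implementation is on the classpath,
    // so the same portable algorithm runs against any system without recompiling.
    public static RDFTermFactory loadFactory() {
        ServiceLoader<RDFTermFactory> loader = ServiceLoader.load(RDFTermFactory.class);
        for (RDFTermFactory factory : loader) {
            return factory; // first registered implementation wins in this sketch
        }
        throw new IllegalStateException("No RDFTermFactory implementation on the classpath");
    }
}
```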

Being able to switch implementation choices by choosing different types is interesting, but different. It would be nice to see both exist, though combining them into one "thing" seems to overload the focus.

Do we want to "host" the generics approach as well (whether used for the system abstraction work or to go alongside it)?

The other difference is a theory-practice one. The current work is not reworking the general style of Jena, Sesame, or common programming ways of doing things. I'd like to see the generics approach validated by external usage: not its technical design, but whether it creates sufficient demand and sufficient acceptance.

        Andy

On 13/05/15 07:32, Alexandre Bertails wrote:
Sergio,

The approach is different. A "patch" against the current codebase
would remove most of the interfaces.

I suggest that you try to understand what's going on in the code,
after you read the other messages in that thread.

Then if there is interest, I can work on a real patch.

Alexandre

On Tue, May 12, 2015 at 11:25 PM, Sergio Fernández <[email protected]> wrote:
I'd say it would be much more valuable to see a patch for your proposal
than a quick hack from scratch.
You can fork our github mirror:
https://github.com/apache/incubator-commonsrdf

On Wed, May 13, 2015 at 8:01 AM, Alexandre Bertails <[email protected]>
wrote:

On Tue, May 12, 2015 at 10:21 PM, Sergio Fernández <[email protected]>
wrote:
Alexandre,

git clone
https://[email protected]/repos/asf/incubator-commonsrdf.git
commonsrdf

The incubator prefix in the name is to keep clear we're still not fully
endorsed by the ASF. I know it's a bit inconvenient, especially in later
phases when we'll get rid of that, but it is part of the incubator process.

Thanks!

I have hacked something quick-and-dirty and made it available at [1].

Quick overview of the sub-packages:
* `api`: just the RDF interface, and the interfaces from commons-rdf are moved under `concrete`
* `concrete`: shows how to implement RDF with the interfaces approach
* `simple`: a complete example adapted from commons-rdf
* `classless`: an (almost) complete example which does not rely on shared interfaces
* `turtle`: an example of how to rely on the RDF interface

Feel free to ask questions.

Alexandre

[1] https://github.com/betehess/free-rdf




On Tue, May 12, 2015 at 6:45 PM, Alexandre Bertails <
[email protected]>
wrote:

Stian,

It sounds stupid but I do not understand where the code actually lives.

I have tried

```
git clone https://git-wip-us.apache.org/repos/asf/commons-rdf.git
```

and

```
git clone git://git.apache.org/commons-rdf.git
```

but both tell me that I "appear to have cloned an empty repository."
The github repo is empty as well.

Can somebody please give me the right URI? Sorry if I missed that in the
documentation, but I did look there and couldn't find the answer :-/

Alexandre


On Tue, May 12, 2015 at 8:41 AM, Alexandre Bertails
<[email protected]> wrote:
Hi Stian,

On Tue, May 12, 2015 at 7:35 AM, Stian Soiland-Reyes <
[email protected]>
wrote:
On 12 May 2015 at 06:20, Alexandre Bertails <[email protected]>
wrote:

I actually didn't understand that we were discussing a
`createBlankNode(UUID)`. I think we just need to be able to create a
fresh blank node.

That is what createBlankNode() does.

Is your proposal to simply remove createBlankNode(String)?

As it is today, yes. Because its contract implies some kind of shared
state.

But we have identified a use-case where the blank node can remember in
which context it was generated e.g. the blank node label at parsing
time.

Requiring the caller to provide an explicit UUID
means that the freshness is happening *outside* of the factory, so I
don't see the point.

Well, you wanted to pass in the uniqueness..? You can pass it as a
String (as of today), or, loosely suggested, by restricting this to a
UUID (which would require clients to think about this very common
mapping/hashing).

No, the uniqueness must happen in `createBlankNode()`. That's how you
can enforce the invariant.
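
To make that concrete, a minimal sketch of what I mean (the random UUID here is just one possible source of freshness, and I am assuming a BlankNodeImpl that simply stores the given reference):

```
public BlankNode createBlankNode() {
    // the factory, not the caller, produces the freshness;
    // UUID.randomUUID() is only one possible strategy
    return new BlankNodeImpl(UUID.randomUUID().toString());
}
```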

Also, it's forcing the strategy (UUID), which
might not be the best one for everybody, e.g. UUID is known to be
slow, at least for some notion of slow, and that could become a

There are several variations of UUID; you are free to use a
timestamp-based one, which is rather fast to generate. SHA-1 is not
known to be slow either, so version 5 hashes are also fast.

commons-rdf should leave that choice open.

But we agreed that UUID only might be a bit strict for some implementations,
which meant that uniqueReference() can return any unique string.. so if it
considered

   app=97975c0b-62c1-42c9-b2a9-e87948e4a46e ip=84.92.48.26 uid=1000 pid=292 name=fred

to be a unique string (with hard-coded 97975c0b-62c1-42c9-b2a9-e87948e4a46e
in case someone else comes up with a similar scheme), and didn't mind leaking
all that vulnerability data, then that would be a compliant uniqueReference().



I am not arguing for stateless vs stateful. I am just pointing at some
design issues which do not allow it. Currently, there is just no way
for an immutable implementation to be used with such a factory.

I am not sure what is the extent of "immutable" here. I'll assume it
just means that all fields are final, not
that the object is not allowed to have any field at all.

Being final just means that the reference won't be updated, but its
state can still be updated. So to be immutable, you also need the
final references to be immutable themselves.
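
A trivial illustration of the difference, if that helps:

```
import java.util.ArrayList;
import java.util.List;

class FinalIsNotImmutable {
    // the reference is final...
    private final List<String> labels = new ArrayList<>();

    void mutate() {
        // ...but the state behind it can still change
        labels.add("still mutable");
    }
}
```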

You are free to
create RDFTermFactory as you please, so you can simply do it like
this:

import java.time.Clock;
import java.time.Instant;
import java.util.UUID;

public class ImmutableRDFTermFactory implements RDFTermFactory {
    private final UUID salt;
    public ImmutableRDFTermFactory(UUID salt) {
        this.salt = salt;
    }
    public BlankNode createBlankNode() {
        return new BlankNodeImpl(salt);
    }
    public BlankNode createBlankNode(String name) {
        return new BlankNodeImpl(salt, name);
    }
    // ...
}

public class BlankNodeImpl implements BlankNode {

    private final String uniqueReference;

    public BlankNodeImpl(UUID salt, String name) {
        // name-based reference: the same (salt, name) pair always maps to the same node
        uniqueReference = salt.toString() + name;
    }

    public BlankNodeImpl(UUID salt) {
        // fresh reference: combine the salt with the identity hash,
        // the current instant and the thread id
        Instant now = Clock.systemUTC().instant();
        uniqueReference = salt.toString() + System.identityHashCode(this)
            + now.getEpochSecond() + now.getNano()
            + Thread.currentThread().getId();
    }
}

This is not immutable because of the shared state.

Here there is no hidden mutability in AtomicLong or within
java.util.UUID's SecureRandom implementation's internal state. I guess
you would not be happy with those either?

The clock is obviously mutable - but as a device rather than a memory
state.

There is no "but" in the immutable world :-)

Having `add` returning a `Graph` does not mean that `Graph` is
immutable. It just means that it *enables* `Graph` to be immutable.
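
For instance, here is a rough (and deliberately naive) sketch of what an immutable Graph could look like when `add` returns a Graph; the types and the copy-on-write are mine, not the current Commons RDF API:

```
import java.util.HashSet;
import java.util.Set;

final class ImmutableGraph {
    private final Set<Object> triples; // Object stands in for a Triple type in this sketch

    ImmutableGraph() {
        this(new HashSet<Object>());
    }

    private ImmutableGraph(Set<Object> triples) {
        this.triples = triples;
    }

    ImmutableGraph add(Object triple) {
        // copy-on-write: the original graph is never touched
        Set<Object> copy = new HashSet<>(triples);
        copy.add(triple);
        return new ImmutableGraph(copy);
    }

    boolean contains(Object triple) {
        return triples.contains(triple);
    }
}
```

(A real persistent implementation would of course share structure instead of copying.)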

There is nothing stopping an immutable Graph from having an additional
method that does this.

Now I am the one asking for some code, because I don't see how that'd
work :-p

As I said in a previous message, you can wrap an immutable Graph in a new
object with a mutable reference to that graph, but, well, please let's
avoid having to do that...

For some methods, like builders, returning the mutated state is good
practice.

When using persistent data structures, a builder is not an option.

There are areas where you do not want to go back to the mutable
version. It happens everywhere in banana-rdf, e.g. the RDF DSL, the
RDF/class mapper, etc., simply because we need to compose graphs
without risking modifying an existing one.

It has been suggested earlier to return bool on add() to be compatible
with Collection, but we were not all too happy with that as it might
be difficult/expensive to know if the graph was actually mutated or
not (e.g. you insert the same triple twice, but the store doesn't
bother checking if the triple existed).

Returning `bool` has very little value from my perspective.


See
https://issues.apache.org/jira/browse/COMMONSRDF-17
https://github.com/commons-rdf/commons-rdf/issues/27
https://github.com/commons-rdf/commons-rdf/issues/46


So your suggestion is for the mutability methods to return the mutated
object (which may or may not be the original instance). I think this
could be an interesting take for discussions - could you raise this
as a separate Jira issue?

Yes, that'd be the way to go.

But I would prefer to see how much interest in the general approach
there is before opening too many issues.



Well, Scala is just a language. Immutability and referential
transparency are just principles, but they are becoming more and more
important in many areas (Spark, concurrency, etc.).

Agreed, also for distributed areas like Hadoop.

There are *many* areas where accommodating immutable graphs has become
important.

There is no shortcut at all. The RDF model only revolves around some
types (Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode,
Literal) which can be left abstract, as opposed to being concrete when
using Java's interfaces. (It's "concrete" in the sense that it's using
nominal subtyping.)

Well, I still don't see how a java.lang.String will work with Java
code that expects to be able to call .getIRIString(). Would
Scala generate proxies on the fly? Or would it need to call
.getIRIString() "elsewhere"?

It's like monkey patching, just in a controlled and type safe way:

```
val rdf: RDF = ???

implicit class IRIWrapper(val iri: IRI) extends AnyVal {
   def getIRIString(): String = rdf.getIRIString(iri)
}

val iri: IRI = rdf.createIRI("http://example.com")
assert(rdf.getIRIString(iri) == iri.getIRIString())
```

Scala would find that there is an implicit conversion from IRI to
something with a getIRIString method, and would do the `new
IRIWrapper`. But because this is also a value class (`AnyVal`), no
object would actually be allocated. It's basically free.



If you look at what I did, you have a *direct* translation of the
existing interfaces+methods+factory into simple functions.

Yes, but done in Scala. Can I see a suggestion for the changes to the
current CommonsRDF Java interfaces - in Java?

No, the gist is in Java and uses the same function names.



* the Java interfaces become abstract types

Java interfaces are abstract types.

Java interfaces provide some abstraction (subtype polymorphism). Types
are compile-time information. At runtime, you see a reified version of
the type, as an interface or as a class (modulo type erasure).
That is why Java interfaces are not really abstract types.

Do you mean generics?

Yes.

  Generics of which class/interface?

Of the RDF interface in the gist [1].

[1]
https://gist.github.com/betehess/8983dbff2c3e89f9dadb#file-rdf-java-L10

Not all Commons RDF clients are expected to interface via
RDFTermFactory. In fact, many use cases don't need it at all.


* the methods on those interfaces become functions on the abstract types
* the methods on the interfaces in the factory become simple functions on the abstract types
* operating on a node happens with a visitor (as in the visitor pattern) implemented as the `visit` function, taking 3 functions for the 3 possible cases (I believe the current API asks for checking the class at runtime...)

This is too much at an abstract (!) level for me to visualize as we're
clashing programming languages here.. could you detail how this would
look in a set of *.java files? Feel free to raise it as a pull request
or similar, even if it's very draft-like. :)

I can transform my gist into a real project. I will need a couple of
days to find the time.



Now, let's say I am implementing a Turtle parser. The only thing I
care about is how I can [use case 1] create/inject elements into some
existing RDF model. If I am writing a Turtle serializer, I only care
about how to [use case 2] traverse that type hierarchy. In neither of
those cases do I care about having the types defined in the
class/interface hierarchy, and I want anybody to be able to use their
own RDF model.

Yes. And with the current take of Commons RDF, the Turtle parser is free
to return its own instances of RDFTerm interfaces, which any Commons RDF
consuming client will be able to use as-is, e.g. pass to their own
Graph implementation.

And here is what people will end up doing:

```
Graph graph = JenaTurtleParser.parse(input);
com.hp.hpl.jena.graph.Graph jenaGraph = (com.hp.hpl.jena.graph.Graph) graph;
```

Many will not want to see the common interface but the actual subtype.




class TurtleParser<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> {
   RDF<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> rdf;
   Graph parse(String input) { /* can call rdf.createLiteral("foo"), or anything in rdf.* */ }
}

I think the <brackets> speak for themselves here :-(



"Small" remark: I still don't think that `createBlankNode(String)`
belongs to the RDF model. I would really like to see a use case that
shows why it has to be present.

This is a valid point of view which I think you should raise
as a new Jira issue. We did argue that it is not part of the
RDF model, but it is still a practically very useful feature,

"useful feature" --> this is where I would like to see a motivating
use case. Then we can discus how useful a feature it is, or how much
of a problem it can be.

however it has generated many contention points in the past
as it touches on state and uniqueness.


See also this discussion about the need (or not) for
exposing .uniqueReference()

I am all in favor of `uniqueReference`. That is how the invariants on
the blank node can be achieved.
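
For instance, on the BlankNodeImpl sketched earlier in this thread, the invariant could look roughly like this (only a sketch, assuming a uniqueReference() accessor on BlankNode, which is exactly what the COMMONSRDF-13 discussion is about):

```
@Override
public boolean equals(Object other) {
    // equality is entirely determined by the unique reference
    return other instanceof BlankNode
        && uniqueReference.equals(((BlankNode) other).uniqueReference());
}

@Override
public int hashCode() {
    return uniqueReference.hashCode();
}
```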


https://issues.apache.org/jira/browse/COMMONSRDF-13



Finally, I will admit that writing all those type parameters can be a
bit cumbersome, even if it happens only in a very few places (as a
user: only once, when you build what you need, e.g. a Turtle parser).
But please let's not sacrifice correctness and functionality to (a
little) convenience...

Well, if those were exposed to any client of the Commons RDF API I
fear we would see very little uptake..

How so?

If they are hidden inside some upper/inner interface that is not
exposed otherwise, it is not so bad.

Yes, you can always do that.
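
Something along these lines, to sketch it (the names are made up; the TurtleParser is the one with all the brackets from earlier in the thread):

```
// a non-generic facade that clients can program against
interface SimpleTurtleParser {
    Object parse(String input); // the concrete Graph type stays hidden here
}

final class TurtleParsers {
    // only this factory method ever spells out the type parameters
    static <Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal>
    SimpleTurtleParser hide(TurtleParser<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> parser) {
        return input -> parser.parse(input);
    }
}
```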

Alexandre



--
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718




--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: [email protected]
w: http://redlink.co




--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: [email protected]
w: http://redlink.co
