Re: General questions

Alexandre Bertails Tue, 12 May 2015 09:47:04 -0700

Stian,

It sounds stupid but I do not understand where the code actually lives.


I have tried

```
git clone https://git-wip-us.apache.org/repos/asf/commons-rdf.git
```

and

```
git clone git://git.apache.org/commons-rdf.git
```

but both tell me that I "appear to have cloned an empty repository."
The github repo is empty as well.

Can somebody please give me the right URI? Sorry if I miss that in the
documentation, but I did look there and couldn't find the answer :-/

Alexandre


On Tue, May 12, 2015 at 8:41 AM, Alexandre Bertails
<[email protected]> wrote:
> Hi Stian,
>
> On Tue, May 12, 2015 at 7:35 AM, Stian Soiland-Reyes <[email protected]> wrote:
>> On 12 May 2015 at 06:20, Alexandre Bertails <[email protected]> wrote:
>>
>>> I actually didn't understand that we were discussing a
>>> `createBlankNode(UUID)`. I think we just need to be able to create a
>>> fresh blank node.
>>
>> That is what createBlankNode() does.
>>
>> Is your proposal to simply remove createBlankNode(String)?
>
> As it is today, yes. Because its contract implies some kind of shared state.
>
> But we have identified a use-case where the blank node can remember in
> which context it was generated e.g. the blank node label at parsing
> time.
>
>>> Requiring the caller to provide an explicit UUID
>>> means that the freshness is happening *outside* of the factory, so I
>>> don't see the point.
>>
>> Well, you wanted to pass in the uniqueness..? You can pass it as a
>> String (as of today), or, loosely suggested, by restricting this to a
>> UUID (which would require clients to think about this very common
>> mapping/hashing).
>
> No, the uniqueness must happen in `createBlankNode()`. That's how you
> can enforce the invariant.
>
>>> Also, it's forcing the strategy (UUID), which
>>> might not be the best one for everybody, e.g. UUID is known to be
>>> slow, at least for some notion of slow, and that could become a
>>
>> There are several variations of UUID, you are free to use a
>> timestamp one that is rather fast to make, SHA-1 is not known to be slow
>> either, so version 5 hashes are also fast.
>
> commons-rdf should leave that choice open.
>
>> But we agreed that UUID only might be a bit strict for some implementations,
>> which meant that uniqueReference() can return any unique string.. so if it
>> considered
>>
>>   app=97975c0b-62c1-42c9-b2a9-e87948e4a46e ip=84.92.48.26 uid=1000
>> pid=292 name=fred
>>
>> to be a unique string (with hard-coded 97975c0b-62c1-42c9-b2a9-e87948e4a46e
>> in case someone else comes up with a similar scheme),
>> and didn't mind leaking all that vulnerability data, then that would be a
>> compliant uniqueReference().
>>
>>
>>
>>> I am not arguing for stateless vs stateful. I am just pointing at some
>>> design issues which do not allow it. Currently, there is just no way
>>> for an immutable implementation to be used with such a factory.
>>
>> I am not sure what is the extent of "immutable" here. I'll assume it
>> just means that all fields are final, not
>> that the object is not allowed to have any field at all.
>
> Being final just means that the reference won't be updated, but its
> state can still be updated. So to be immutable, you also need the
> final references to be immutable themselves.
>
>> You are free to
>> create RDFTermFactory as you please, so you can simply do it like this:
>>
>> public class ImmutableRDFTermFactory implements RDFTermFactory {
>>     private final UUID salt;
>>     public ImmutableRDFTermFactory(UUID salt) {
>>         this.salt = salt;
>>     }
>>     public BlankNode createBlankNode() {
>>       return new BlankNodeImpl(salt);
>>     }
>>     public BlankNode createBlankNode(String name) {
>>       return new BlankNodeImpl(salt, name);
>>     }
>>     / ..
>> }
>>
>> public class BlankNodeImpl implements BlankNode {
>>
>>   private static void unique(UUID salt) {
>>      Instant now = Clock.systemUTC().instant();
>>      return salt.toString()  + System.identityHashCode(this) +
>> now.getEpochSecond() + now.getNano() + Thread.currentThread().getId();
>>   }
>>
>>   private final String uniqueReference;
>>   public BlankNodeImpl(UUID salt, String name) {
>>     uniqueReference = salt.toString() + name;
>>   }
>>   public BlankNodeImpl(UUID salt) {
>>     uniqueReference = salt.toString()  + System.identityHashCode(this)
>> + new Date().;
>>   }
>> }
>
> This is not immutable because of the shared state.
>
>> Here there is no hidden mutability in AtomicLong or within
>> java.util.UUID's SecureRandom implementation's internal state. I guess
>> you would not be happy with those either?
>>
>> The clock is obviously mutable - but as a device rather than a memory state.
>
> There is no "but" in the immutable world :-)
>
>>> Having `add` returning a `Graph` does not mean that `Graph` is
>>> immutable. It just means that it *enables* `Graph` to be immutable.
>>
>> There is nothing stopping an immutable Graph from having an additional
>> method that does this.
>
> Now I am the one asking for some code, because I don't see how that'd work :-p
>
> As I said in a previous, you can wrap an immutable Graph in a new
> object with a mutable reference to that graph, but, well, please let's
> avoid having to do that...
>
>> For some methods, like builders, returning the mutated state is good 
>> practice.
>
> When using persistent datastructures, a builder is not an option.
>
> There are areas where you do not want to go back to the mutable
> version. It happens everywhere in banana-rdf e.g. the RDF DSL, the
> RDF/class mapper, etc. Just because we need to compose graphs without
> risking to modify an existing one.
>
>> It has been suggested earlier to return bool on add() to be compatible
>> with Collection, but we were not all too happy with that as it might
>> be difficult/expensive to know if the graph was actually mutated or
>> not (e.g. you insert the same triple twice, but the store doesn't
>> bother checking if the triple existed).
>
> Returning `bool` has very little value from my perspective.
>
>>
>> See
>> https://issues.apache.org/jira/browse/COMMONSRDF-17
>> https://github.com/commons-rdf/commons-rdf/issues/27
>> https://github.com/commons-rdf/commons-rdf/issues/46
>>
>>
>> So your suggestion is for the mutability methods to return the mutated
>> object (which may or may not be the original instance). I think this
>> could be an interesting take for discussions - could you raise this as
>> a separate Jira issue?
>
> Yes, that'd be the way to go.
>
> But I would prefer to see how much interest in the general approach
> there is before opening too many issues.
>
>>
>>
>>> Well, Scala is just a language. Immutability and referential
>>> transparency, are just principles, but they are becoming more and more
>>> important in many areas (Spark, concurrency, etc.).
>>
>> Agreed, also for distributed areas like Hadoop.
>
> There are *many* areas where accommodating immutable graphs has become
> important.
>
>>> There is no shortcut at all. The RDF model only resolves around some
>>> types (Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode,
>>> Literal) which can be left abstract, as opposed to being concrete when
>>> using Java's interfaces. (it's "concrete" in the sense it's using
>>> nominal subtyping)
>>
>> Well, I still don't see how a java.util.String will work with Java
>> code that expects to be able to call .getIRIString(). Would
>> Scala generate proxies on the fly?  Or would it need to call
>> .getIRIString() "elsewhere"?
>
> It's like monkey patching, just in a controlled and type safe way:
>
> ```
> val rdf: RDF = ???
>
> implicit class IRIWrapper(val iri: IRI) extends AnyVal {
>   def getIRIString(): String = rdf.getIRIString(iri)
> }
>
> val iri: IRI = rdf.createIRI("http://example.com";)
> assert(rdf.getIRIString(iri) == iri.getIRIString())
> ```
>
> Scala would find that there is an implicit conversion from IRI to
> something with a getIRIString method, and would do the `new
> IRIWrapper`. But because this is also a value class (`AnyVal`) then no
> object would actually be allocated. It's basically free.
>
>>
>>
>>> If you look at what I did, you have a *direct* translation of the
>>> existing interfaces+methods+factory into simple functions.
>>
>> Yes, but done in Scala. Can I see a suggestion to the changes of the
>> current CommonsRDF Java interfaces - in Java?
>
> No the gist is in Java and uses the same function names.
>
>>
>>
>>> * the Java interfaces becomes abstract types
>>
>> Java interfaces are abstract types.
>
> Java interfaces provide some abstraction (subtype polymorphism). Types
> are compile-time information. At runtime, you see a reified version of
> the type, as an interface or as a class (and module type erasure).
> That is why Java interfaces are not really abstract types.
>
>> Do you mean generics?
>
> Yes.
>
>>  Generics of which class/interface?
>
> Of the RDF interface in the gist [1].
>
> [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb#file-rdf-java-L10
>
>> Not all Commons RDF clients are expected to interface via
>> RDFTermFactory. In fact many use-cases don't need it at all.
>>
>>
>>> * the methods on those interfaces become functions on the abstract types
>>> * the methods on the interfaces in the factory becomes simple
>>> functions on the abstract types
>>> * operating on a node happens with a visitor (as in visitor pattern)
>>> implemented as the `visit` function, taking 3 functions for the 3
>>> possible cases (I believe the current API asks for checking the class
>>> at runtime...)
>>
>> This is too much at an abstract (!) level for me to visualize as we're
>> clashing programming languages here.. could you detail how this would
>> look in a set of *.java files? Feel free to raise it as a pull request
>> or similar, even if it's very draft-like. :)
>
> I can transform my gist into a real project. I will need a couple of
> days to find the time.
>
>>
>>
>>> Now, let's say I am implementing a Turtle parser. The only thing I
>>> care about is how I can [use case 1] create/inject elements into some
>>> existing RDF model. If I am writing a Turtle serializer, I only care
>>> about how to [use case 2] traverse that type hierarchy. In none of
>>> those cases did I care about having the types defined in the
>>> class/interface hierarchy and I want anybody to use their own RDF
>>> model.
>>
>> Yes. And with the current take of Commons RDF, the Turtle parser is free
>> to return its own instances of RDFTerm interfaces, which any Commons RDF
>> consuming client will be able to use as-is, e.g. pass to their own
>> Graph implementation.
>
> And here is what people will end up doing:
>
> ```
> Graph graph = JenaTurtleParser.parse(input);
> com.hp.hpl.jena.graph.Graph jenaGraph = (com.hp.hpl.jena.graph.Graph)graph;
> ```
>
> Many will not want to see the common interface but the actual subtype.
>
>>
>>
>>
>>> class TurtleParser<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI,
>>> BlankNode, Literal> {
>>>   RDF<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> rdf
>>>   Graph parse(String input) { /* can call rdf.createLiteral("foo"), or
>>> anything in rdf.* */ }
>>> }
>>
>> I think the <brackets> speak for themselves here :-(
>>
>>
>>
>>> "Small" remark: I still don't think that `createBlankNode(String)`
>>> belongs to the RDF model. I would really like to see a use case that
>>> shows why it has to be present.
>>
>> This is a valid point of view which I think you should raise
>> as a new Jira issue. We did argue that it is not part of the
>> RDF model, but it is still a practically very useful feature,
>
> "useful feature" --> this is where I would like to see a motivating
> use case. Then we can discus how useful a feature it is, or how much
> of a problem it can be.
>
>> however it has generated many contention points in the past
>> as it touches on state and uniqueness.
>>
>>
>> See also this discussion about the need (or not) for
>> exposing .uniqueReference()
>
> I am all in favor or `uniqueReference`. That is how the invariants on
> the blank node can be achieved.
>
>>
>> https://issues.apache.org/jira/browse/COMMONSRDF-13
>>
>>
>>
>>> Finally, I will admit that writing all those types parameters can be a
>>> bit cumbersome, even if it happens only in a very few places (as a
>>> user: only once when you build what you need e.g. a Turtle parser).
>>> But please let's not sacrifice correctness and functionality to (a
>>> little) convenience...
>>
>> Well, if those would be exposed to any client of the Commons RDF API I
>> fear we would see very little uptake..
>
> How so?
>
>> If they are hidden inside some upper/inner interface that is not
>> exposed otherwise, it is not so bad.
>
> Yes, you can always do that.
>
> Alexandre
>
>>
>>
>> --
>> Stian Soiland-Reyes
>> Apache Taverna (incubating), Apache Commons RDF (incubating)
>> http://orcid.org/0000-0001-9842-9718

Re: General questions

Reply via email to