Re: General questions

Alexandre Bertails Tue, 12 May 2015 23:34:15 -0700

Sergio,

The approach is different. A "patch" against the current codebase
would remove most of the interfaces.


I suggest that you try to understand what's going on in the code,
after you read the other messages in that thread.

Then if there is interest, I can work on a real patch.

Alexandre

On Tue, May 12, 2015 at 11:25 PM, Sergio Fernández <[email protected]> wrote:
> I'd say it would be much more valuable to see a patch for your proposal
> than a quick hack from scratch.
> You can fork our github mirror:
> https://github.com/apache/incubator-commonsrdf
>
> On Wed, May 13, 2015 at 8:01 AM, Alexandre Bertails <[email protected]>
> wrote:
>
>> On Tue, May 12, 2015 at 10:21 PM, Sergio Fernández <[email protected]>
>> wrote:
>> > Alexandre,
>> >
>> > git clone
>> > https://[email protected]/repos/asf/incubator-commonsrdf.git
>> > commonsrdf
>> >
>> > The incubator prefix in the name is there to make clear that we're still
>> > not fully endorsed by the ASF. I know it's a bit inconvenient, especially
>> > in later phases when we'd get rid of it, but it is part of the incubator
>> > process.
>>
>> Thanks!
>>
>> I have hacked something quick-and-dirty and made it available at [1].
>>
>> Quick overview of the sub-packages:
>> * `api`: just the RDF interface, and the interfaces from commons-rdf
>> are moved under `concrete`
>> * `concrete`: shows how to implement RDF with the interfaces approach
>> * `simple`: a complete example adapted from commons-rdf
>> * `classless`: an (almost) complete example which does not rely on
>> shared interfaces
>> * `turtle`: an example of how to rely on the RDF interface
>>
>> Feel free to ask questions.
>>
>> Alexandre
>>
>> [1] https://github.com/betehess/free-rdf
>>
>> >
>> >
>> >
>> > On Tue, May 12, 2015 at 6:45 PM, Alexandre Bertails <
>> [email protected]>
>> > wrote:
>> >
>> >> Stian,
>> >>
>> >> It sounds stupid but I do not understand where the code actually lives.
>> >>
>> >> I have tried
>> >>
>> >> ```
>> >> git clone https://git-wip-us.apache.org/repos/asf/commons-rdf.git
>> >> ```
>> >>
>> >> and
>> >>
>> >> ```
>> >> git clone git://git.apache.org/commons-rdf.git
>> >> ```
>> >>
>> >> but both tell me that I "appear to have cloned an empty repository."
>> >> The github repo is empty as well.
>> >>
>> >> Can somebody please give me the right URI? Sorry if I missed it in the
>> >> documentation, but I did look there and couldn't find the answer :-/
>> >>
>> >> Alexandre
>> >>
>> >>
>> >> On Tue, May 12, 2015 at 8:41 AM, Alexandre Bertails
>> >> <[email protected]> wrote:
>> >> > Hi Stian,
>> >> >
>> >> > On Tue, May 12, 2015 at 7:35 AM, Stian Soiland-Reyes <
>> [email protected]>
>> >> wrote:
>> >> >> On 12 May 2015 at 06:20, Alexandre Bertails <[email protected]>
>> >> wrote:
>> >> >>
>> >> >>> I actually didn't understand that we were discussing a
>> >> >>> `createBlankNode(UUID)`. I think we just need to be able to create a
>> >> >>> fresh blank node.
>> >> >>
>> >> >> That is what createBlankNode() does.
>> >> >>
>> >> >> Is your proposal to simply remove createBlankNode(String)?
>> >> >
>> >> > As it is today, yes. Because its contract implies some kind of
>> >> > shared state.
>> >> >
>> >> > But we have identified a use-case where the blank node can remember in
>> >> > which context it was generated e.g. the blank node label at parsing
>> >> > time.
>> >> >
>> >> >>> Requiring the caller to provide an explicit UUID
>> >> >>> means that the freshness is happening *outside* of the factory, so I
>> >> >>> don't see the point.
>> >> >>
>> >> >> Well, you wanted to pass in the uniqueness..? You can pass it as a
>> >> >> String (as of today), or, loosely suggested, by restricting this to a
>> >> >> UUID (which would require clients to think about this very common
>> >> >> mapping/hashing).
>> >> >
>> >> > No, the uniqueness must happen in `createBlankNode()`. That's how you
>> >> > can enforce the invariant.
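
A minimal Java sketch of that point, under stated assumptions: the names below (`FreshBlankNodes`, `BlankNode`, `createBlankNode`, `uniqueReference`) only mirror the vocabulary of this thread and are not the actual Commons RDF code, and `UUID.randomUUID()` merely stands in for whatever freshness strategy an implementation prefers. The caller never supplies the identity, so the factory alone is responsible for two calls never yielding equal nodes.

```java
import java.util.UUID;

public class FreshBlankNodes {

    /** Hypothetical blank node: its identity is fixed at construction time. */
    public static final class BlankNode {
        private final String uniqueReference;

        private BlankNode(String uniqueReference) {
            this.uniqueReference = uniqueReference;
        }

        public String uniqueReference() {
            return uniqueReference;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BlankNode
                    && ((BlankNode) other).uniqueReference.equals(uniqueReference);
        }

        @Override
        public int hashCode() {
            return uniqueReference.hashCode();
        }
    }

    /** Freshness happens here, inside the factory; UUID is only one possible strategy. */
    public static BlankNode createBlankNode() {
        return new BlankNode(UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        BlankNode a = createBlankNode();
        BlankNode b = createBlankNode();
        // Two independently created nodes should never compare equal.
        System.out.println(a.equals(b));
    }
}
```

Because the constructor is private, no caller can build a node with a chosen reference, which is what keeps the invariant inside the factory.
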
>> >> >
>> >> >>> Also, it's forcing the strategy (UUID), which
>> >> >>> might not be the best one for everybody, e.g. UUID is known to be
>> >> >>> slow, at least for some notion of slow, and that could become a
>> >> >>
>> >> >> There are several variations of UUID, you are free to use a
>> >> >> timestamp one that is rather fast to make, SHA-1 is not known to be
>> >> >> slow either, so version 5 hashes are also fast.
>> >> >
>> >> > commons-rdf should leave that choice open.
>> >> >
>> >> >> But we agreed that UUID only might be a bit strict for some
>> >> >> implementations, which meant that uniqueReference() can return any
>> >> >> unique string.. so if it considered
>> >> >>
>> >> >>   app=97975c0b-62c1-42c9-b2a9-e87948e4a46e ip=84.92.48.26 uid=1000 pid=292 name=fred
>> >> >>
>> >> >> to be a unique string (with hard-coded
>> >> >> 97975c0b-62c1-42c9-b2a9-e87948e4a46e in case someone else comes up
>> >> >> with a similar scheme), and didn't mind leaking all that
>> >> >> vulnerability data, then that would be a compliant uniqueReference().
>> >> >>
>> >> >>
>> >> >>
>> >> >>> I am not arguing for stateless vs stateful. I am just pointing at
>> >> >>> some design issues which do not allow it. Currently, there is just
>> >> >>> no way for an immutable implementation to be used with such a
>> >> >>> factory.
>> >> >>
>> >> >> I am not sure what is the extent of "immutable" here. I'll assume it
>> >> >> just means that all fields are final, not
>> >> >> that the object is not allowed to have any field at all.
>> >> >
>> >> > Being final just means that the reference won't be updated, but its
>> >> > state can still be updated. So to be immutable, you also need the
>> >> > final references to be immutable themselves.
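
The distinction can be illustrated with a small Java sketch. Both class names are hypothetical and exist only for this example; the first class has nothing but `final` fields and is still mutable, while the second also freezes what the field points to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FinalVsImmutable {

    /** All fields are final, yet the object is mutable: the list itself can change. */
    public static final class FinalButMutable {
        private final List<String> labels = new ArrayList<>();

        public void remember(String label) {
            labels.add(label); // the reference is unchanged, the state is not
        }

        public int size() {
            return labels.size();
        }
    }

    /** Final reference to an unmodifiable private copy: no state change after construction. */
    public static final class TrulyImmutable {
        private final List<String> labels;

        public TrulyImmutable(List<String> labels) {
            this.labels = Collections.unmodifiableList(new ArrayList<>(labels));
        }

        public List<String> labels() {
            return labels;
        }
    }

    public static void main(String[] args) {
        FinalButMutable leaky = new FinalButMutable();
        leaky.remember("b1");
        System.out.println(leaky.size());

        List<String> source = new ArrayList<>();
        source.add("b1");
        TrulyImmutable frozen = new TrulyImmutable(source);
        source.add("b2"); // does not reach the private copy
        System.out.println(frozen.labels().size());
    }
}
```

The defensive copy matters as much as the unmodifiable wrapper: without it, whoever still holds `source` could change the contents behind the object's back.
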
>> >> >
>> >> >> You are free to
>> >> >> create RDFTermFactory as you please, so you can simply do it like this:
>> >> >>
>> >> >> import java.time.Clock;
>> >> >> import java.time.Instant;
>> >> >> import java.util.UUID;
>> >> >>
>> >> >> public class ImmutableRDFTermFactory implements RDFTermFactory {
>> >> >>     private final UUID salt;
>> >> >>     public ImmutableRDFTermFactory(UUID salt) {
>> >> >>         this.salt = salt;
>> >> >>     }
>> >> >>     public BlankNode createBlankNode() {
>> >> >>       return new BlankNodeImpl(salt);
>> >> >>     }
>> >> >>     public BlankNode createBlankNode(String name) {
>> >> >>       return new BlankNodeImpl(salt, name);
>> >> >>     }
>> >> >>     // ..
>> >> >> }
>> >> >>
>> >> >> public class BlankNodeImpl implements BlankNode {
>> >> >>
>> >> >>   // Derives a reference from the salt, this object's identity hash,
>> >> >>   // the current time and the calling thread.
>> >> >>   private String unique(UUID salt) {
>> >> >>      Instant now = Clock.systemUTC().instant();
>> >> >>      return salt.toString() + System.identityHashCode(this) +
>> >> >>          now.getEpochSecond() + now.getNano() +
>> >> >>          Thread.currentThread().getId();
>> >> >>   }
>> >> >>
>> >> >>   private final String uniqueReference;
>> >> >>   public BlankNodeImpl(UUID salt, String name) {
>> >> >>     uniqueReference = salt.toString() + name;
>> >> >>   }
>> >> >>   public BlankNodeImpl(UUID salt) {
>> >> >>     uniqueReference = unique(salt);
>> >> >>   }
>> >> >> }
>> >> >
>> >> > This is not immutable because of the shared state.
>> >> >
>> >> >> Here there is no hidden mutability in AtomicLong or within
>> >> >> java.util.UUID's SecureRandom implementation's internal state. I
>> >> >> guess you would not be happy with those either?
>> >> >>
>> >> >> The clock is obviously mutable - but as a device rather than a
>> >> >> memory state.
>> >> >
>> >> > There is no "but" in the immutable world :-)
>> >> >
>> >> >>> Having `add` returning a `Graph` does not mean that `Graph` is
>> >> >>> immutable. It just means that it *enables* `Graph` to be immutable.
>> >> >>
>> >> >> There is nothing stopping an immutable Graph from having an
>> >> >> additional method that does this.
>> >> >
>> >> > Now I am the one asking for some code, because I don't see how
>> >> > that'd work :-p
>> >> >
>> >> > As I said in a previous message, you can wrap an immutable Graph in a
>> >> > new object with a mutable reference to that graph, but, well, please
>> >> > let's avoid having to do that...
>> >> >
>> >> >> For some methods, like builders, returning the mutated state is
>> >> >> good practice.
>> >> >
>> >> > When using persistent datastructures, a builder is not an option.
>> >> >
>> >> > There are areas where you do not want to go back to the mutable
>> >> > version. It happens everywhere in banana-rdf, e.g. the RDF DSL, the
>> >> > RDF/class mapper, etc., simply because we need to compose graphs
>> >> > without the risk of modifying an existing one.
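
A minimal sketch of what an `add` that returns a graph makes possible, under loud assumptions: `ImmutableGraph` is a hypothetical class, triples are modelled as plain strings to keep the example short, and copying the whole set on every `add` is a simplification rather than a real persistent data structure with structural sharing.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public final class ImmutableGraph {
    // Never modified after construction; every "change" builds a new graph.
    private final Set<String> triples;

    private ImmutableGraph(Set<String> triples) {
        this.triples = Collections.unmodifiableSet(triples);
    }

    public static ImmutableGraph empty() {
        return new ImmutableGraph(new LinkedHashSet<>());
    }

    /** Returns a new graph containing the extra triple; this graph is left untouched. */
    public ImmutableGraph add(String triple) {
        Set<String> copy = new LinkedHashSet<>(triples);
        copy.add(triple);
        return new ImmutableGraph(copy);
    }

    /** Composes two graphs without modifying either of them. */
    public ImmutableGraph union(ImmutableGraph other) {
        Set<String> copy = new LinkedHashSet<>(triples);
        copy.addAll(other.triples);
        return new ImmutableGraph(copy);
    }

    public boolean contains(String triple) {
        return triples.contains(triple);
    }

    public int size() {
        return triples.size();
    }
}
```

With a `void add(...)` signature this class could not exist: the only way to honour the contract would be to mutate the receiver, or to wrap the graph in an object holding a mutable reference.
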
>> >> >
>> >> >> It has been suggested earlier to return bool on add() to be
>> >> >> compatible with Collection, but we were not all too happy with that
>> >> >> as it might be difficult/expensive to know if the graph was actually
>> >> >> mutated or not (e.g. you insert the same triple twice, but the store
>> >> >> doesn't bother checking if the triple existed).
>> >> >
>> >> > Returning `bool` has very little value from my perspective.
>> >> >
>> >> >>
>> >> >> See
>> >> >> https://issues.apache.org/jira/browse/COMMONSRDF-17
>> >> >> https://github.com/commons-rdf/commons-rdf/issues/27
>> >> >> https://github.com/commons-rdf/commons-rdf/issues/46
>> >> >>
>> >> >>
>> >> >> So your suggestion is for the mutability methods to return the
>> >> >> mutated object (which may or may not be the original instance). I
>> >> >> think this could be an interesting take for discussions - could you
>> >> >> raise this as a separate Jira issue?
>> >> >
>> >> > Yes, that'd be the way to go.
>> >> >
>> >> > But I would prefer to see how much interest in the general approach
>> >> > there is before opening too many issues.
>> >> >
>> >> >>
>> >> >>
>> >> >>> Well, Scala is just a language. Immutability and referential
>> >> >>> transparency are just principles, but they are becoming more and
>> >> >>> more important in many areas (Spark, concurrency, etc.).
>> >> >>
>> >> >> Agreed, also for distributed areas like Hadoop.
>> >> >
>> >> > There are *many* areas where accommodating immutable graphs has become
>> >> > important.
>> >> >
>> >> >>> There is no shortcut at all. The RDF model only revolves around
>> >> >>> some types (Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode,
>> >> >>> Literal) which can be left abstract, as opposed to being concrete
>> >> >>> when using Java's interfaces. (It's "concrete" in the sense that
>> >> >>> it's using nominal subtyping.)
>> >> >>
>> >> >> Well, I still don't see how a java.util.String will work with Java
>> >> >> code that expects to be able to call .getIRIString(). Would
>> >> >> Scala generate proxies on the fly?  Or would it need to call
>> >> >> .getIRIString() "elsewhere"?
>> >> >
>> >> > It's like monkey patching, just in a controlled and type safe way:
>> >> >
>> >> > ```
>> >> > val rdf: RDF = ???
>> >> >
>> >> > implicit class IRIWrapper(val iri: IRI) extends AnyVal {
>> >> >   def getIRIString(): String = rdf.getIRIString(iri)
>> >> > }
>> >> >
>> >> > val iri: IRI = rdf.createIRI("http://example.com")
>> >> > assert(rdf.getIRIString(iri) == iri.getIRIString())
>> >> > ```
>> >> >
>> >> > Scala would find that there is an implicit conversion from IRI to
>> >> > something with a getIRIString method, and would do the `new
>> >> > IRIWrapper`. But because this is also a value class (`AnyVal`) then no
>> >> > object would actually be allocated. It's basically free.
>> >> >
>> >> >>
>> >> >>
>> >> >>> If you look at what I did, you have a *direct* translation of the
>> >> >>> existing interfaces+methods+factory into simple functions.
>> >> >>
>> >> >> Yes, but done in Scala. Can I see a suggestion to the changes of the
>> >> >> current CommonsRDF Java interfaces - in Java?
>> >> >
>> >> > No the gist is in Java and uses the same function names.
>> >> >
>> >> >>
>> >> >>
>> >> >>> * the Java interfaces become abstract types
>> >> >>
>> >> >> Java interfaces are abstract types.
>> >> >
>> >> > Java interfaces provide some abstraction (subtype polymorphism). Types
>> >> > are compile-time information. At runtime, you see a reified version of
>> >> > the type, as an interface or as a class (modulo type erasure).
>> >> > That is why Java interfaces are not really abstract types.
>> >> >
>> >> >> Do you mean generics?
>> >> >
>> >> > Yes.
>> >> >
>> >> >>  Generics of which class/interface?
>> >> >
>> >> > Of the RDF interface in the gist [1].
>> >> >
>> >> > [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb#file-rdf-java-L10
>> >> >
>> >> >> Not all Commons RDF clients are expected to interface via
>> >> >> RDFTermFactory. In fact many use-cases don't need it at all.
>> >> >>
>> >> >>
>> >> >>> * the methods on those interfaces become functions on the abstract
>> >> >>> types
>> >> >>> * the methods on the interfaces in the factory become simple
>> >> >>> functions on the abstract types
>> >> >>> * operating on a node happens with a visitor (as in visitor pattern)
>> >> >>> implemented as the `visit` function, taking 3 functions for the 3
>> >> >>> possible cases (I believe the current API asks for checking the
>> >> >>> class at runtime...)
>> >> >>
>> >> >> This is too much at an abstract (!) level for me to visualize as
>> >> >> we're clashing programming languages here.. could you detail how
>> >> >> this would look in a set of *.java files? Feel free to raise it as a
>> >> >> pull request or similar, even if it's very draft-like. :)
>> >> >
>> >> > I can transform my gist into a real project. I will need a couple of
>> >> > days to find the time.
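
A rough Java sketch of the `visit` idea from the list above, with heavy caveats: this is not the gist and not the free-rdf code. The trimmed-down `RDF` interface, the `PlainRDF` implementation and the `describe` helper are all hypothetical, and only the node types are modelled (no `Graph` or `Triple`). `PlainRDF` deliberately uses existing JDK classes (`URI`, `UUID`, `String`) as its node types to show that no shared RDF interface has to be implemented.

```java
import java.net.URI;
import java.util.UUID;
import java.util.function.Function;

public class VisitSketch {

    /** Hypothetical, trimmed-down RDF module: node types are type parameters. */
    public interface RDF<Node, IRI extends Node, BlankNode extends Node, Literal extends Node> {
        IRI createIRI(String iri);
        BlankNode createBlankNode();
        Literal createLiteral(String lexicalForm);
        String getIRIString(IRI iri);

        /** Dispatches on the kind of node: one function per possible case. */
        <T> T visit(Node node,
                    Function<IRI, T> onIRI,
                    Function<BlankNode, T> onBlankNode,
                    Function<Literal, T> onLiteral);
    }

    /** An implementation whose node types are plain JDK classes. */
    public static final class PlainRDF implements RDF<Object, URI, UUID, String> {
        public URI createIRI(String iri) { return URI.create(iri); }
        public UUID createBlankNode() { return UUID.randomUUID(); }
        public String createLiteral(String lexicalForm) { return lexicalForm; }
        public String getIRIString(URI iri) { return iri.toString(); }

        public <T> T visit(Object node,
                           Function<URI, T> onIRI,
                           Function<UUID, T> onBlankNode,
                           Function<String, T> onLiteral) {
            // The runtime check lives in the implementation, not in client code.
            if (node instanceof URI) return onIRI.apply((URI) node);
            if (node instanceof UUID) return onBlankNode.apply((UUID) node);
            if (node instanceof String) return onLiteral.apply((String) node);
            throw new IllegalArgumentException("not an RDF node: " + node);
        }
    }

    /** Client code written once against RDF; it works for any implementation. */
    public static <Node, IRI extends Node, BlankNode extends Node, Literal extends Node>
    String describe(RDF<Node, IRI, BlankNode, Literal> rdf, Node node) {
        return rdf.visit(node,
                iri -> "<" + rdf.getIRIString(iri) + ">",
                bnode -> "_:" + bnode,
                literal -> "\"" + literal + "\"");
    }
}
```

A serializer written like `describe` never casts and never checks a class itself; the long type-parameter list appears once, in the signature.
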
>> >> >
>> >> >>
>> >> >>
>> >> >>> Now, let's say I am implementing a Turtle parser. The only thing I
>> >> >>> care about is how I can [use case 1] create/inject elements into
>> >> >>> some existing RDF model. If I am writing a Turtle serializer, I only
>> >> >>> care about how to [use case 2] traverse that type hierarchy. In none
>> >> >>> of those cases did I care about having the types defined in the
>> >> >>> class/interface hierarchy and I want anybody to use their own RDF
>> >> >>> model.
>> >> >>
>> >> >> Yes. And with the current take of Commons RDF, the Turtle parser is
>> >> >> free to return its own instances of RDFTerm interfaces, which any
>> >> >> Commons RDF consuming client will be able to use as-is, e.g. pass to
>> >> >> their own Graph implementation.
>> >> >
>> >> > And here is what people will end up doing:
>> >> >
>> >> > ```
>> >> > Graph graph = JenaTurtleParser.parse(input);
>> >> > com.hp.hpl.jena.graph.Graph jenaGraph =
>> >> >     (com.hp.hpl.jena.graph.Graph) graph;
>> >> > ```
>> >> >
>> >> > Many will not want to see the common interface but the actual subtype.
>> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >>> class TurtleParser<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI,
>> >> >>>                    BlankNode, Literal> {
>> >> >>>   RDF<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> rdf;
>> >> >>>   Graph parse(String input) {
>> >> >>>     /* can call rdf.createLiteral("foo"), or anything in rdf.* */
>> >> >>>   }
>> >> >>> }
>> >> >>
>> >> >> I think the <brackets> speak for themselves here :-(
>> >> >>
>> >> >>
>> >> >>
>> >> >>> "Small" remark: I still don't think that `createBlankNode(String)`
>> >> >>> belongs to the RDF model. I would really like to see a use case that
>> >> >>> shows why it has to be present.
>> >> >>
>> >> >> This is a valid point of view which I think you should raise
>> >> >> as a new Jira issue. We did argue that it is not part of the
>> >> >> RDF model, but it is still a practically very useful feature,
>> >> >
>> >> > "useful feature" --> this is where I would like to see a motivating
>> >> > use case. Then we can discuss how useful a feature it is, or how much
>> >> > of a problem it can be.
>> >> >
>> >> >> however it has generated many contention points in the past
>> >> >> as it touches on state and uniqueness.
>> >> >>
>> >> >>
>> >> >> See also this discussion about the need (or not) for
>> >> >> exposing .uniqueReference()
>> >> >
>> >> > I am all in favor of `uniqueReference`. That is how the invariants on
>> >> > the blank node can be achieved.
>> >> >
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/COMMONSRDF-13
>> >> >>
>> >> >>
>> >> >>
>> >> >>> Finally, I will admit that writing all those type parameters can
>> >> >>> be a bit cumbersome, even if it happens only in a very few places
>> >> >>> (as a user: only once when you build what you need e.g. a Turtle
>> >> >>> parser). But please let's not sacrifice correctness and
>> >> >>> functionality to (a little) convenience...
>> >> >>
>> >> >> Well, if those would be exposed to any client of the Commons RDF API
>> >> >> I fear we would see very little uptake..
>> >> >
>> >> > How so?
>> >> >
>> >> >> If they are hidden inside some upper/inner interface that is not
>> >> >> exposed otherwise, it is not so bad.
>> >> >
>> >> > Yes, you can always do that.
>> >> >
>> >> > Alexandre
>> >> >
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Stian Soiland-Reyes
>> >> >> Apache Taverna (incubating), Apache Commons RDF (incubating)
>> >> >> http://orcid.org/0000-0001-9842-9718
>> >>
>> >
>> >
>> >
>> > --
>> > Sergio Fernández
>> > Partner Technology Manager
>> > Redlink GmbH
>> > m: +43 6602747925
>> > e: [email protected]
>> > w: http://redlink.co
>>
>
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: [email protected]
> w: http://redlink.co
