Alexandre,

git clone https://[email protected]/repos/asf/incubator-commonsrdf.git commonsrdf

The incubator prefix in the name is there to make clear that we're not yet fully endorsed by the ASF. I know it's a bit inconvenient, especially in later phases when we'd get rid of it, but it's part of the incubator process.

On Tue, May 12, 2015 at 6:45 PM, Alexandre Bertails <[email protected]> wrote:

> Stian,
>
> It sounds stupid but I do not understand where the code actually lives.
>
> I have tried
>
> ```
> git clone https://git-wip-us.apache.org/repos/asf/commons-rdf.git
> ```
>
> and
>
> ```
> git clone git://git.apache.org/commons-rdf.git
> ```
>
> but both tell me that I "appear to have cloned an empty repository."
> The github repo is empty as well.
>
> Can somebody please give me the right URI? Sorry if I miss that in the
> documentation, but I did look there and couldn't find the answer :-/
>
> Alexandre
>
>
> On Tue, May 12, 2015 at 8:41 AM, Alexandre Bertails
> <[email protected]> wrote:
> > Hi Stian,
> >
> > On Tue, May 12, 2015 at 7:35 AM, Stian Soiland-Reyes <[email protected]> wrote:
> >> On 12 May 2015 at 06:20, Alexandre Bertails <[email protected]> wrote:
> >>
> >>> I actually didn't understand that we were discussing a
> >>> `createBlankNode(UUID)`. I think we just need to be able to create a
> >>> fresh blank node.
> >>
> >> That is what createBlankNode() does.
> >>
> >> Is your proposal to simply remove createBlankNode(String)?
> >
> > As it is today, yes. Because its contract implies some kind of shared state.
> >
> > But we have identified a use-case where the blank node can remember in
> > which context it was generated e.g. the blank node label at parsing
> > time.
> >
> >>> Requiring the caller to provide an explicit UUID
> >>> means that the freshness is happening *outside* of the factory, so I
> >>> don't see the point.
> >>
> >> Well, you wanted to pass in the uniqueness..?
> >> You can pass it as a
> >> String (as of today), or, loosely suggested, by restricting this to a
> >> UUID (which would require clients to think about this very common
> >> mapping/hashing).
> >
> > No, the uniqueness must happen in `createBlankNode()`. That's how you
> > can enforce the invariant.
> >
> >>> Also, it's forcing the strategy (UUID), which
> >>> might not be the best one for everybody, e.g. UUID is known to be
> >>> slow, at least for some notion of slow, and that could become a
> >>
> >> There are several variations of UUID, you are free to use a
> >> timestamp one that is rather fast to make, SHA-1 is not known to be slow
> >> either, so version 5 hashes are also fast.
> >
> > commons-rdf should leave that choice open.
> >
> >> But we agreed that UUID only might be a bit strict for some implementations,
> >> which meant that uniqueReference() can return any unique string.. so if it
> >> considered
> >>
> >> app=97975c0b-62c1-42c9-b2a9-e87948e4a46e ip=84.92.48.26 uid=1000
> >> pid=292 name=fred
> >>
> >> to be a unique string (with hard-coded 97975c0b-62c1-42c9-b2a9-e87948e4a46e
> >> in case someone else comes up with a similar scheme),
> >> and didn't mind leaking all that vulnerability data, then that would be a
> >> compliant uniqueReference().
> >>
> >>
> >>
> >>> I am not arguing for stateless vs stateful. I am just pointing at some
> >>> design issues which do not allow it. Currently, there is just no way
> >>> for an immutable implementation to be used with such a factory.
> >>
> >> I am not sure what is the extent of "immutable" here. I'll assume it
> >> just means that all fields are final, not
> >> that the object is not allowed to have any field at all.
> >
> > Being final just means that the reference won't be updated, but its
> > state can still be updated. So to be immutable, you also need the
> > final references to be immutable themselves.
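
To illustrate the distinction between `final` and immutable drawn above, here is a minimal Java sketch. The class names `FinalButMutable` and `TrulyImmutable` are made up for this example and are not part of Commons RDF; it only shows the general Java behaviour, not anything specific to the API under discussion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class: every field is final, yet the object is still mutable.
final class FinalButMutable {
    private final List<String> labels = new ArrayList<>();

    void remember(String label) {
        labels.add(label); // the reference never changes, but the list it points to does
    }

    int size() {
        return labels.size();
    }
}

// Hypothetical class: final fields that only reference immutable values.
final class TrulyImmutable {
    private final List<String> labels;

    TrulyImmutable(List<String> labels) {
        // Defensive copy behind an unmodifiable view, so no caller can change it later.
        this.labels = Collections.unmodifiableList(new ArrayList<>(labels));
    }

    // "Modification" returns a new instance and leaves this one untouched.
    TrulyImmutable with(String label) {
        List<String> copy = new ArrayList<>(labels);
        copy.add(label);
        return new TrulyImmutable(copy);
    }

    int size() {
        return labels.size();
    }
}
```

Calling `remember` changes what a `FinalButMutable` instance contains even though its only field is `final`, whereas `with` leaves the original `TrulyImmutable` as it was.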
> >
> >> You are free to
> >> create RDFTermFactory as you please, so you can simply do it like this:
> >>
> >> public class ImmutableRDFTermFactory implements RDFTermFactory {
> >>   private final UUID salt;
> >>   public ImmutableRDFTermFactory(UUID salt) {
> >>     this.salt = salt;
> >>   }
> >>   public BlankNode createBlankNode() {
> >>     return new BlankNodeImpl(salt);
> >>   }
> >>   public BlankNode createBlankNode(String name) {
> >>     return new BlankNodeImpl(salt, name);
> >>   }
> >>   / ..
> >> }
> >>
> >> public class BlankNodeImpl implements BlankNode {
> >>
> >>   private static void unique(UUID salt) {
> >>     Instant now = Clock.systemUTC().instant();
> >>     return salt.toString() + System.identityHashCode(this) +
> >>       now.getEpochSecond() + now.getNano() + Thread.currentThread().getId();
> >>   }
> >>
> >>   private final String uniqueReference;
> >>   public BlankNodeImpl(UUID salt, String name) {
> >>     uniqueReference = salt.toString() + name;
> >>   }
> >>   public BlankNodeImpl(UUID salt) {
> >>     uniqueReference = salt.toString() + System.identityHashCode(this)
> >>       + new Date().;
> >>   }
> >> }
> >
> > This is not immutable because of the shared state.
> >
> >> Here there is no hidden mutability in AtomicLong or within
> >> java.util.UUID's SecureRandom implementation's internal state. I guess
> >> you would not be happy with those either?
> >>
> >> The clock is obviously mutable - but as a device rather than a memory state.
> >
> > There is no "but" in the immutable world :-)
> >
> >>> Having `add` returning a `Graph` does not mean that `Graph` is
> >>> immutable. It just means that it *enables* `Graph` to be immutable.
> >>
> >> There is nothing stopping an immutable Graph from having an additional
> >> method that does this.
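
As a side note, the factory quoted above does not compile as written: `unique` is declared `static void` yet returns a value and refers to `this`, and the `new Date().` expression is cut off. Below is a rough sketch of a variant that does compile. The `BlankNode` and `RDFTermFactory` interfaces here are minimal stand-ins, not the real Commons RDF interfaces, and using `UUID.randomUUID()` for freshness is my assumption in place of the clock and identity-hash mix. It does not settle the immutability argument, since `randomUUID()` still draws on a shared random source inside the JVM.

```java
import java.util.UUID;

// Minimal stand-ins for the Commons RDF interfaces, reduced to what this sketch needs.
interface BlankNode {
    String uniqueReference();
}

interface RDFTermFactory {
    BlankNode createBlankNode();
    BlankNode createBlankNode(String name);
}

final class BlankNodeImpl implements BlankNode {
    private final String uniqueReference;

    // Named blank node: the same salt and name always map to the same reference.
    BlankNodeImpl(UUID salt, String name) {
        this.uniqueReference = salt + ":" + name;
    }

    // Fresh blank node. Assumption: a random UUID replaces the clock and
    // identity-hash combination of the quoted code.
    BlankNodeImpl(UUID salt) {
        this.uniqueReference = salt + ":" + UUID.randomUUID();
    }

    @Override
    public String uniqueReference() {
        return uniqueReference;
    }
}

final class ImmutableRDFTermFactory implements RDFTermFactory {
    private final UUID salt;

    ImmutableRDFTermFactory(UUID salt) {
        this.salt = salt;
    }

    @Override
    public BlankNode createBlankNode() {
        return new BlankNodeImpl(salt);
    }

    @Override
    public BlankNode createBlankNode(String name) {
        return new BlankNodeImpl(salt, name);
    }
}
```

With this variant, two calls to `createBlankNode("b1")` on the same factory give the same reference, while factories built with different salts give different references for the same name.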
> >
> > Now I am the one asking for some code, because I don't see how that'd work :-p
> >
> > As I said in a previous message, you can wrap an immutable Graph in a new
> > object with a mutable reference to that graph, but, well, please let's
> > avoid having to do that...
> >
> >> For some methods, like builders, returning the mutated state is good practice.
> >
> > When using persistent datastructures, a builder is not an option.
> >
> > There are areas where you do not want to go back to the mutable
> > version. It happens everywhere in banana-rdf e.g. the RDF DSL, the
> > RDF/class mapper, etc. Just because we need to compose graphs without
> > risking to modify an existing one.
> >
> >> It has been suggested earlier to return bool on add() to be compatible
> >> with Collection, but we were not all too happy with that as it might
> >> be difficult/expensive to know if the graph was actually mutated or
> >> not (e.g. you insert the same triple twice, but the store doesn't
> >> bother checking if the triple existed).
> >
> > Returning `bool` has very little value from my perspective.
> >
> >>
> >> See
> >> https://issues.apache.org/jira/browse/COMMONSRDF-17
> >> https://github.com/commons-rdf/commons-rdf/issues/27
> >> https://github.com/commons-rdf/commons-rdf/issues/46
> >>
> >>
> >> So your suggestion is for the mutability methods to return the mutated
> >> object (which may or may not be the original instance). I think this
> >> could be an interesting take for discussions - could you raise this as
> >> a separate Jira issue?
> >
> > Yes, that'd be the way to go.
> >
> > But I would prefer to see how much interest in the general approach
> > there is before opening too many issues.
> >
> >>
> >>
> >>> Well, Scala is just a language. Immutability and referential
> >>> transparency, are just principles, but they are becoming more and more
> >>> important in many areas (Spark, concurrency, etc.).
> >>
> >> Agreed, also for distributed areas like Hadoop.
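
Since the question of what `add` should return keeps coming up, here is a small sketch of the idea that the method returns the resulting graph, which may or may not be the original instance. Everything in it is hypothetical: the `Graph` interface is a stand-in rather than the Commons RDF one, triples are plain strings to keep it short, and the immutable variant copies a `HashSet` on every `add`, which a real persistent data structure would avoid.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Stand-in interface: add() returns the graph that contains the triple.
interface Graph {
    Graph add(String triple);
    boolean contains(String triple);
    int size();
}

// Immutable implementation: add() never modifies the receiver.
final class ImmutableGraph implements Graph {
    private final Set<String> triples;

    private ImmutableGraph(Set<String> triples) {
        this.triples = Collections.unmodifiableSet(triples);
    }

    static ImmutableGraph empty() {
        return new ImmutableGraph(new HashSet<String>());
    }

    @Override
    public Graph add(String triple) {
        if (triples.contains(triple)) {
            return this; // nothing changes, so the original instance can be returned
        }
        Set<String> copy = new HashSet<>(triples); // naive copy, not a persistent structure
        copy.add(triple);
        return new ImmutableGraph(copy);
    }

    @Override
    public boolean contains(String triple) {
        return triples.contains(triple);
    }

    @Override
    public int size() {
        return triples.size();
    }
}

// Mutable implementation: satisfies the same contract by returning itself.
final class MutableGraph implements Graph {
    private final Set<String> triples = new HashSet<>();

    @Override
    public Graph add(String triple) {
        triples.add(triple);
        return this;
    }

    @Override
    public boolean contains(String triple) {
        return triples.contains(triple);
    }

    @Override
    public int size() {
        return triples.size();
    }
}
```

Client code written as `graph = graph.add(triple)` works against both implementations, which is the sense in which the return type enables an immutable `Graph` without requiring one.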
> >
> > There are *many* areas where accommodating immutable graphs has become
> > important.
> >
> >>> There is no shortcut at all. The RDF model only revolves around some
> >>> types (Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode,
> >>> Literal) which can be left abstract, as opposed to being concrete when
> >>> using Java's interfaces. (it's "concrete" in the sense it's using
> >>> nominal subtyping)
> >>
> >> Well, I still don't see how a java.util.String will work with Java
> >> code that expects to be able to call .getIRIString(). Would
> >> Scala generate proxies on the fly? Or would it need to call
> >> .getIRIString() "elsewhere"?
> >
> > It's like monkey patching, just in a controlled and type safe way:
> >
> > ```
> > val rdf: RDF = ???
> >
> > implicit class IRIWrapper(val iri: IRI) extends AnyVal {
> >   def getIRIString(): String = rdf.getIRIString(iri)
> > }
> >
> > val iri: IRI = rdf.createIRI("http://example.com")
> > assert(rdf.getIRIString(iri) == iri.getIRIString())
> > ```
> >
> > Scala would find that there is an implicit conversion from IRI to
> > something with a getIRIString method, and would do the `new IRIWrapper`.
> > But because this is also a value class (`AnyVal`) then no
> > object would actually be allocated. It's basically free.
> >
> >>
> >>
> >>> If you look at what I did, you have a *direct* translation of the
> >>> existing interfaces+methods+factory into simple functions.
> >>
> >> Yes, but done in Scala. Can I see a suggestion to the changes of the
> >> current CommonsRDF Java interfaces - in Java?
> >
> > No the gist is in Java and uses the same function names.
> >
> >>
> >>
> >>> * the Java interfaces becomes abstract types
> >>
> >> Java interfaces are abstract types.
> >
> > Java interfaces provide some abstraction (subtype polymorphism). Types
> > are compile-time information. At runtime, you see a reified version of
> > the type, as an interface or as a class (and modulo type erasure).
> > That is why Java interfaces are not really abstract types.
> >
> >> Do you mean generics?
> >
> > Yes.
> >
> >> Generics of which class/interface?
> >
> > Of the RDF interface in the gist [1].
> >
> > [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb#file-rdf-java-L10
> >
> >> Not all Commons RDF clients are expected to interface via
> >> RDFTermFactory. In fact many use-cases don't need it at all.
> >>
> >>
> >>> * the methods on those interfaces become functions on the abstract types
> >>> * the methods on the interfaces in the factory becomes simple
> >>> functions on the abstract types
> >>> * operating on a node happens with a visitor (as in visitor pattern)
> >>> implemented as the `visit` function, taking 3 functions for the 3
> >>> possible cases (I believe the current API asks for checking the class
> >>> at runtime...)
> >>
> >> This is too much at an abstract (!) level for me to visualize as we're
> >> clashing programming languages here.. could you detail how this would
> >> look in a set of *.java files? Feel free to raise it as a pull request
> >> or similar, even if it's very draft-like. :)
> >
> > I can transform my gist into a real project. I will need a couple of
> > days to find the time.
> >
> >>
> >>
> >>> Now, let's say I am implementing a Turtle parser. The only thing I
> >>> care about is how I can [use case 1] create/inject elements into some
> >>> existing RDF model. If I am writing a Turtle serializer, I only care
> >>> about how to [use case 2] traverse that type hierarchy. In none of
> >>> those cases did I care about having the types defined in the
> >>> class/interface hierarchy and I want anybody to use their own RDF
> >>> model.
> >>
> >> Yes. And with the current take of Commons RDF, the Turtle parser is free
> >> to return its own instances of RDFTerm interfaces, which any Commons RDF
> >> consuming client will be able to use as-is, e.g. pass to their own
> >> Graph implementation.
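
To make the "abstract types plus a `visit` function" idea above a bit easier to picture in Java, here is a rough sketch. It is not the code from the gist: `RdfModule`, `StringRdf` and `NodeKinds` are names I made up, the type parameters are reduced to the node types, and encoding every node as a plain `String` is only one possible choice, picked to show that a term type does not have to implement any RDF interface.

```java
import java.util.function.Function;

// Hypothetical module: node types are type parameters, operations are functions.
interface RdfModule<Node, Iri extends Node, BNode extends Node, Lit extends Node> {
    Iri createIRI(String iriString);
    BNode createBlankNode(String label);
    Lit createLiteral(String lexicalForm);
    String getIRIString(Iri iri);

    // Visitor as three functions: exactly one of them is applied to the node.
    <T> T visit(Node node,
                Function<Iri, T> onIri,
                Function<BNode, T> onBlankNode,
                Function<Lit, T> onLiteral);
}

// One possible implementation where every node is an N-Triples-like String.
final class StringRdf implements RdfModule<String, String, String, String> {
    @Override
    public String createIRI(String iriString) { return "<" + iriString + ">"; }

    @Override
    public String createBlankNode(String label) { return "_:" + label; }

    @Override
    public String createLiteral(String lexicalForm) { return "\"" + lexicalForm + "\""; }

    @Override
    public String getIRIString(String iri) { return iri.substring(1, iri.length() - 1); }

    @Override
    public <T> T visit(String node,
                       Function<String, T> onIri,
                       Function<String, T> onBlankNode,
                       Function<String, T> onLiteral) {
        if (node.startsWith("<")) return onIri.apply(node);
        if (node.startsWith("_:")) return onBlankNode.apply(node);
        return onLiteral.apply(node);
    }
}

// Client code written only against the module, for any choice of node types.
final class NodeKinds {
    static <Node, Iri extends Node, BNode extends Node, Lit extends Node>
    String describe(RdfModule<Node, Iri, BNode, Lit> rdf, Node node) {
        return rdf.visit(node,
                iri -> "IRI " + rdf.getIRIString(iri),
                bnode -> "blank node",
                literal -> "literal");
    }
}
```

`NodeKinds.describe` never checks a class at runtime and never names a concrete node type; the cost, as noted elsewhere in the thread, is the list of type parameters on every generic signature.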
> >
> > And here is what people will end up doing:
> >
> > ```
> > Graph graph = JenaTurtleParser.parse(input);
> > com.hp.hpl.jena.graph.Graph jenaGraph = (com.hp.hpl.jena.graph.Graph)graph;
> > ```
> >
> > Many will not want to see the common interface but the actual subtype.
> >
> >>
> >>
> >>
> >>> class TurtleParser<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI,
> >>> BlankNode, Literal> {
> >>>   RDF<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> rdf
> >>>   Graph parse(String input) { /* can call rdf.createLiteral("foo"), or
> >>> anything in rdf.* */ }
> >>> }
> >>
> >> I think the <brackets> speak for themselves here :-(
> >>
> >>
> >>
> >>> "Small" remark: I still don't think that `createBlankNode(String)`
> >>> belongs to the RDF model. I would really like to see a use case that
> >>> shows why it has to be present.
> >>
> >> This is a valid point of view which I think you should raise
> >> as a new Jira issue. We did argue that it is not part of the
> >> RDF model, but it is still a practically very useful feature,
> >
> > "useful feature" --> this is where I would like to see a motivating
> > use case. Then we can discuss how useful a feature it is, or how much
> > of a problem it can be.
> >
> >> however it has generated many contention points in the past
> >> as it touches on state and uniqueness.
> >>
> >>
> >> See also this discussion about the need (or not) for
> >> exposing .uniqueReference()
> >
> > I am all in favor of `uniqueReference`. That is how the invariants on
> > the blank node can be achieved.
> >
> >>
> >> https://issues.apache.org/jira/browse/COMMONSRDF-13
> >>
> >>
> >>
> >>> Finally, I will admit that writing all those types parameters can be a
> >>> bit cumbersome, even if it happens only in a very few places (as a
> >>> user: only once when you build what you need e.g. a Turtle parser).
> >>> But please let's not sacrifice correctness and functionality to (a
> >>> little) convenience...
> >>
> >> Well, if those would be exposed to any client of the Commons RDF API I
> >> fear we would see very little uptake..
> >
> > How so?
> >
> >> If they are hidden inside some upper/inner interface that is not
> >> exposed otherwise, it is not so bad.
> >
> > Yes, you can always do that.
> >
> > Alexandre
> >
> >>
> >>
> >> --
> >> Stian Soiland-Reyes
> >> Apache Taverna (incubating), Apache Commons RDF (incubating)
> >> http://orcid.org/0000-0001-9842-9718
>

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: [email protected]
w: http://redlink.co
