Sergio,

The approach is different. A "patch" against the current codebase would remove most of the interfaces.
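To make "different" a bit more concrete, here is a minimal sketch of the idea. It is not the code from the repository: the names `PlainRDF` and `Describe`, the reduced set of type parameters, and the choice of `URI`, `UUID` and `String` as term types are all assumptions made for this illustration. The RDF types become type parameters of a single `RDF` interface, so an implementation can plug in classes it does not own.

```java
import java.net.URI;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical, trimmed-down module interface: the RDF types are type
// parameters instead of nominal Java interfaces.
interface RDF<Term, IRI extends Term, BlankNode extends Term, Literal extends Term> {
    IRI createIRI(String iri);
    BlankNode createBlankNode();
    Literal createLiteral(String lexicalForm);
    String getIRIString(IRI iri);
    String getLexicalForm(Literal literal);

    // Visitor as a plain function: one callback per kind of term.
    <T> T visit(Term term,
                Function<IRI, T> onIRI,
                Function<BlankNode, T> onBlankNode,
                Function<Literal, T> onLiteral);
}

// Illustrative implementation that reuses existing JDK classes as RDF terms.
// None of them implements a shared RDF interface.
final class PlainRDF implements RDF<Object, URI, UUID, String> {
    public URI createIRI(String iri) { return URI.create(iri); }
    // Freshness is decided here, inside the factory function.
    public UUID createBlankNode() { return UUID.randomUUID(); }
    public String createLiteral(String lexicalForm) { return lexicalForm; }
    public String getIRIString(URI iri) { return iri.toString(); }
    public String getLexicalForm(String literal) { return literal; }

    public <T> T visit(Object term,
                       Function<URI, T> onIRI,
                       Function<UUID, T> onBlankNode,
                       Function<String, T> onLiteral) {
        // The runtime check lives in one place, inside the implementation.
        if (term instanceof URI) return onIRI.apply((URI) term);
        if (term instanceof UUID) return onBlankNode.apply((UUID) term);
        if (term instanceof String) return onLiteral.apply((String) term);
        throw new IllegalArgumentException("not an RDF term: " + term);
    }
}

// Client code written only against RDF<...>; it never names a concrete type.
final class Describe {
    static <Term, IRI extends Term, BlankNode extends Term, Literal extends Term>
    String describe(RDF<Term, IRI, BlankNode, Literal> rdf, Term term) {
        return rdf.visit(term,
            iri -> "<" + rdf.getIRIString(iri) + ">",
            bnode -> "_:" + bnode,
            literal -> "\"" + rdf.getLexicalForm(literal) + "\"");
    }
}
```

With this shape, `Describe.describe(new PlainRDF(), URI.create("http://example.com"))` goes through `visit` and never casts in client code. Swapping in another implementation only changes the type arguments.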
I suggest that you try to understand what's going on in the code, after you read the other messages in that thread. Then if there is interest, I can work on a real patch.

Alexandre

On Tue, May 12, 2015 at 11:25 PM, Sergio Fernández <[email protected]> wrote:
> I'd say if you'd be much more valuable to see a patch about your proposal
> that a quick hack from scratch.
> You can fork our github mirror:
> https://github.com/apache/incubator-commonsrdf
>
> On Wed, May 13, 2015 at 8:01 AM, Alexandre Bertails <[email protected]>
> wrote:
>
>> On Tue, May 12, 2015 at 10:21 PM, Sergio Fernández <[email protected]>
>> wrote:
>> > Alexandre,
>> >
>> > git clone
>> > https://[email protected]/repos/asf/incubator-commonsrdf.git
>> > commonsrdf
>> >
>> > The incubator prefix in the name is to keep clear we're still not fully
>> > endorsed by the ASF. I know it's a bit inconvenient, specially in later
>> > phases when we'd get rid of that, but is part of the incubator process.
>>
>> Thanks!
>>
>> I have hacked something quick-and-dirty and made it available at [1].
>>
>> Quick overview of the sub-packages:
>> * `api`: just the RDF interface, and the interfaces from commons-rdf
>> are moved under `concrete`
>> * `concrete`: shows how to implement RDF with the interfaces approach
>> * `simple`: a complete example adapted from commons-rdf
>> * `classless`: a (almost) complete example which does not rely on
>> shared interfaces
>> * `turtle`: a example of how to rely on the RDF interface
>>
>> Feel free to ask questions.
>>
>> Alexandre
>>
>> [1] https://github.com/betehess/free-rdf
>>
>> >
>> >
>> >
>> > On Tue, May 12, 2015 at 6:45 PM, Alexandre Bertails <[email protected]>
>> > wrote:
>> >
>> >> Stian,
>> >>
>> >> It sounds stupid but I do not understand where the code actually lives.
>> >>
>> >> I have tried
>> >>
>> >> ```
>> >> git clone https://git-wip-us.apache.org/repos/asf/commons-rdf.git
>> >> ```
>> >>
>> >> and
>> >>
>> >> ```
>> >> git clone git://git.apache.org/commons-rdf.git
>> >> ```
>> >>
>> >> but both tell me that I "appear to have cloned an empty repository."
>> >> The github repo is empty as well.
>> >>
>> >> Can somebody please give me the right URI? Sorry if I miss that in the
>> >> documentation, but I did look there and couldn't find the answer :-/
>> >>
>> >> Alexandre
>> >>
>> >>
>> >> On Tue, May 12, 2015 at 8:41 AM, Alexandre Bertails
>> >> <[email protected]> wrote:
>> >> > Hi Stian,
>> >> >
>> >> > On Tue, May 12, 2015 at 7:35 AM, Stian Soiland-Reyes <[email protected]> wrote:
>> >> >> On 12 May 2015 at 06:20, Alexandre Bertails <[email protected]> wrote:
>> >> >>
>> >> >>> I actually didn't understand that we were discussing a
>> >> >>> `createBlankNode(UUID)`. I think we just need to be able to create a
>> >> >>> fresh blank node.
>> >> >>
>> >> >> That is what createBlankNode() does.
>> >> >>
>> >> >> Is your proposal to simply remove createBlankNode(String)?
>> >> >
>> >> > As it is today, yes. Because its contract implies some kind of shared state.
>> >> >
>> >> > But we have identified a use-case where the blank node can remember in
>> >> > which context it was generated e.g. the blank node label at parsing
>> >> > time.
>> >> >
>> >> >>> Requiring the caller to provide an explicit UUID
>> >> >>> means that the freshness is happening *outside* of the factory, so I
>> >> >>> don't see the point.
>> >> >>
>> >> >> Well, you wanted to pass in the uniqueness..? You can pass it as a
>> >> >> String (as of today), or, loosely suggested, by restricting this to a
>> >> >> UUID (which would require clients to think about this very common
>> >> >> mapping/hashing).
>> >> >
>> >> > No, the uniqueness must happen in `createBlankNode()`. That's how you
>> >> > can enforce the invariant.
>> >> >
>> >> >>> Also, it's forcing the strategy (UUID), which
>> >> >>> might not be the best one for everybody, e.g. UUID is known to be
>> >> >>> slow, at least for some notion of slow, and that could become a
>> >> >>
>> >> >> There are several variations of UUID, you are free to use a
>> >> >> timestamp one that is rather fast to make, SHA-1 is not known to be slow
>> >> >> either, so version 5 hashes are also fast.
>> >> >
>> >> > commons-rdf should leave that choice open.
>> >> >
>> >> >> But we agreed that UUID only might be a bit strict for some implementations,
>> >> >> which meant that uniqueReference() can return any unique string.. so if it
>> >> >> considered
>> >> >>
>> >> >> app=97975c0b-62c1-42c9-b2a9-e87948e4a46e ip=84.92.48.26 uid=1000
>> >> >> pid=292 name=fred
>> >> >>
>> >> >> to be a unique string (with hard-coded 97975c0b-62c1-42c9-b2a9-e87948e4a46e
>> >> >> in case someone else comes up with a similar scheme),
>> >> >> and didn't mind leaking all that vulnerability data, then that would be a
>> >> >> compliant uniqueReference().
>> >> >>
>> >> >>
>> >> >>
>> >> >>> I am not arguing for stateless vs stateful. I am just pointing at some
>> >> >>> design issues which do not allow it. Currently, there is just no way
>> >> >>> for an immutable implementation to be used with such a factory.
>> >> >>
>> >> >> I am not sure what is the extent of "immutable" here. I'll assume it
>> >> >> just means that all fields are final, not
>> >> >> that the object is not allowed to have any field at all.
>> >> >
>> >> > Being final just means that the reference won't be updated, but its
>> >> > state can still be updated. So to be immutable, you also need the
>> >> > final references to be immutable themselves.
>> >> >
>> >> >> You are free to
>> >> >> create RDFTermFactory as you please, so you can simply do it like this:
>> >> >>
>> >> >> import java.time.Clock;
>> >> >> import java.time.Instant;
>> >> >> import java.util.UUID;
>> >> >>
>> >> >> public class ImmutableRDFTermFactory implements RDFTermFactory {
>> >> >>     private final UUID salt;
>> >> >>     public ImmutableRDFTermFactory(UUID salt) {
>> >> >>         this.salt = salt;
>> >> >>     }
>> >> >>     public BlankNode createBlankNode() {
>> >> >>         return new BlankNodeImpl(salt);
>> >> >>     }
>> >> >>     public BlankNode createBlankNode(String name) {
>> >> >>         return new BlankNodeImpl(salt, name);
>> >> >>     }
>> >> >>     // ..
>> >> >> }
>> >> >>
>> >> >> public class BlankNodeImpl implements BlankNode {
>> >> >>
>> >> >>     // Builds a reference from the salt, this object's identity hash,
>> >> >>     // the current time and the calling thread.
>> >> >>     private String unique(UUID salt) {
>> >> >>         Instant now = Clock.systemUTC().instant();
>> >> >>         return salt.toString() + System.identityHashCode(this) +
>> >> >>             now.getEpochSecond() + now.getNano() + Thread.currentThread().getId();
>> >> >>     }
>> >> >>
>> >> >>     private final String uniqueReference;
>> >> >>     public BlankNodeImpl(UUID salt, String name) {
>> >> >>         uniqueReference = salt.toString() + name;
>> >> >>     }
>> >> >>     public BlankNodeImpl(UUID salt) {
>> >> >>         uniqueReference = unique(salt);
>> >> >>     }
>> >> >> }
>> >> >
>> >> > This is not immutable because of the shared state.
>> >> >
>> >> >> Here there is no hidden mutability in AtomicLong or within
>> >> >> java.util.UUID's SecureRandom implementation's internal state. I guess
>> >> >> you would not be happy with those either?
>> >> >>
>> >> >> The clock is obviously mutable - but as a device rather than a memory state.
>> >> >
>> >> > There is no "but" in the immutable world :-)
>> >> >
>> >> >>> Having `add` returning a `Graph` does not mean that `Graph` is
>> >> >>> immutable. It just means that it *enables* `Graph` to be immutable.
>> >> >>
>> >> >> There is nothing stopping an immutable Graph from having an additional
>> >> >> method that does this.
>> >> >
>> >> > Now I am the one asking for some code, because I don't see how that'd work :-p
>> >> >
>> >> > As I said in a previous, you can wrap an immutable Graph in a new
>> >> > object with a mutable reference to that graph, but, well, please let's
>> >> > avoid having to do that...
>> >> >
>> >> >> For some methods, like builders, returning the mutated state is good practice.
>> >> >
>> >> > When using persistent datastructures, a builder is not an option.
>> >> >
>> >> > There are areas where you do not want to go back to the mutable
>> >> > version. It happens everywhere in banana-rdf e.g. the RDF DSL, the
>> >> > RDF/class mapper, etc. Just because we need to compose graphs without
>> >> > risking to modify an existing one.
>> >> >
>> >> >> It has been suggested earlier to return bool on add() to be compatible
>> >> >> with Collection, but we were not all too happy with that as it might
>> >> >> be difficult/expensive to know if the graph was actually mutated or
>> >> >> not (e.g. you insert the same triple twice, but the store doesn't
>> >> >> bother checking if the triple existed).
>> >> >
>> >> > Returning `bool` has very little value from my perspective.
>> >> >
>> >> >>
>> >> >> See
>> >> >> https://issues.apache.org/jira/browse/COMMONSRDF-17
>> >> >> https://github.com/commons-rdf/commons-rdf/issues/27
>> >> >> https://github.com/commons-rdf/commons-rdf/issues/46
>> >> >>
>> >> >>
>> >> >> So your suggestion is for the mutability methods to return the mutated
>> >> >> object (which may or may not be the original instance). I think this
>> >> >> could be an interesting take for discussions - could you raise this as
>> >> >> a separate Jira issue?
>> >> >
>> >> > Yes, that'd be the way to go.
>> >> >
>> >> > But I would prefer to see how much interest in the general approach
>> >> > there is before opening too many issues.
>> >> >
>> >> >>
>> >> >>
>> >> >>> Well, Scala is just a language.
>> >> >>> Immutability and referential
>> >> >>> transparency, are just principles, but they are becoming more and more
>> >> >>> important in many areas (Spark, concurrency, etc.).
>> >> >>
>> >> >> Agreed, also for distributed areas like Hadoop.
>> >> >
>> >> > There are *many* areas where accommodating immutable graphs has become
>> >> > important.
>> >> >
>> >> >>> There is no shortcut at all. The RDF model only resolves around some
>> >> >>> types (Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode,
>> >> >>> Literal) which can be left abstract, as opposed to being concrete when
>> >> >>> using Java's interfaces. (it's "concrete" in the sense it's using
>> >> >>> nominal subtyping)
>> >> >>
>> >> >> Well, I still don't see how a java.util.String will work with Java
>> >> >> code that expects to be able to call .getIRIString(). Would
>> >> >> Scala generate proxies on the fly? Or would it need to call
>> >> >> .getIRIString() "elsewhere"?
>> >> >
>> >> > It's like monkey patching, just in a controlled and type safe way:
>> >> >
>> >> > ```
>> >> > val rdf: RDF = ???
>> >> >
>> >> > implicit class IRIWrapper(val iri: IRI) extends AnyVal {
>> >> >   def getIRIString(): String = rdf.getIRIString(iri)
>> >> > }
>> >> >
>> >> > val iri: IRI = rdf.createIRI("http://example.com")
>> >> > assert(rdf.getIRIString(iri) == iri.getIRIString())
>> >> > ```
>> >> >
>> >> > Scala would find that there is an implicit conversion from IRI to
>> >> > something with a getIRIString method, and would do the `new
>> >> > IRIWrapper`. But because this is also a value class (`AnyVal`) then no
>> >> > object would actually be allocated. It's basically free.
>> >> >
>> >> >>
>> >> >>
>> >> >>> If you look at what I did, you have a *direct* translation of the
>> >> >>> existing interfaces+methods+factory into simple functions.
>> >> >>
>> >> >> Yes, but done in Scala. Can I see a suggestion to the changes of the
>> >> >> current CommonsRDF Java interfaces - in Java?
>> >> >
>> >> > No the gist is in Java and uses the same function names.
>> >> >
>> >> >>
>> >> >>
>> >> >>> * the Java interfaces becomes abstract types
>> >> >>
>> >> >> Java interfaces are abstract types.
>> >> >
>> >> > Java interfaces provide some abstraction (subtype polymorphism). Types
>> >> > are compile-time information. At runtime, you see a reified version of
>> >> > the type, as an interface or as a class (and module type erasure).
>> >> > That is why Java interfaces are not really abstract types.
>> >> >
>> >> >> Do you mean generics?
>> >> >
>> >> > Yes.
>> >> >
>> >> >> Generics of which class/interface?
>> >> >
>> >> > Of the RDF interface in the gist [1].
>> >> >
>> >> > [1] https://gist.github.com/betehess/8983dbff2c3e89f9dadb#file-rdf-java-L10
>> >> >
>> >> >> Not all Commons RDF clients are expected to interface via
>> >> >> RDFTermFactory. In fact many use-cases don't need it at all.
>> >> >>
>> >> >>
>> >> >>> * the methods on those interfaces become functions on the abstract types
>> >> >>> * the methods on the interfaces in the factory becomes simple
>> >> >>> functions on the abstract types
>> >> >>> * operating on a node happens with a visitor (as in visitor pattern)
>> >> >>> implemented as the `visit` function, taking 3 functions for the 3
>> >> >>> possible cases (I believe the current API asks for checking the class
>> >> >>> at runtime...)
>> >> >>
>> >> >> This is too much at an abstract (!) level for me to visualize as we're
>> >> >> clashing programming languages here.. could you detail how this would
>> >> >> look in a set of *.java files? Feel free to raise it as a pull request
>> >> >> or similar, even if it's very draft-like. :)
>> >> >
>> >> > I can transform my gist into a real project. I will need a couple of
>> >> > days to find the time.
>> >> >
>> >> >>
>> >> >>
>> >> >>> Now, let's say I am implementing a Turtle parser.
>> >> >>> The only thing I
>> >> >>> care about is how I can [use case 1] create/inject elements into some
>> >> >>> existing RDF model. If I am writing a Turtle serializer, I only care
>> >> >>> about how to [use case 2] traverse that type hierarchy. In none of
>> >> >>> those cases did I care about having the types defined in the
>> >> >>> class/interface hierarchy and I want anybody to use their own RDF
>> >> >>> model.
>> >> >>
>> >> >> Yes. And with the current take of Commons RDF, the Turtle parser is free
>> >> >> to return its own instances of RDFTerm interfaces, which any Commons RDF
>> >> >> consuming client will be able to use as-is, e.g. pass to their own
>> >> >> Graph implementation.
>> >> >
>> >> > And here is what people will end up doing:
>> >> >
>> >> > ```
>> >> > Graph graph = JenaTurtleParser.parse(input);
>> >> > com.hp.hpl.jena.graph.Graph jenaGraph = (com.hp.hpl.jena.graph.Graph)graph;
>> >> > ```
>> >> >
>> >> > Many will not want to see the common interface but the actual subtype.
>> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >>> class TurtleParser<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI,
>> >> >>> BlankNode, Literal> {
>> >> >>>   RDF<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> rdf
>> >> >>>   Graph parse(String input) { /* can call rdf.createLiteral("foo"), or
>> >> >>> anything in rdf.* */ }
>> >> >>> }
>> >> >>
>> >> >> I think the <brackets> speak for themselves here :-(
>> >> >>
>> >> >>
>> >> >>
>> >> >>> "Small" remark: I still don't think that `createBlankNode(String)`
>> >> >>> belongs to the RDF model. I would really like to see a use case that
>> >> >>> shows why it has to be present.
>> >> >>
>> >> >> This is a valid point of view which I think you should raise
>> >> >> as a new Jira issue.
>> >> >> We did argue that it is not part of the
>> >> >> RDF model, but it is still a practically very useful feature,
>> >> >
>> >> > "useful feature" --> this is where I would like to see a motivating
>> >> > use case. Then we can discus how useful a feature it is, or how much
>> >> > of a problem it can be.
>> >> >
>> >> >> however it has generated many contention points in the past
>> >> >> as it touches on state and uniqueness.
>> >> >>
>> >> >>
>> >> >> See also this discussion about the need (or not) for
>> >> >> exposing .uniqueReference()
>> >> >
>> >> > I am all in favor or `uniqueReference`. That is how the invariants on
>> >> > the blank node can be achieved.
>> >> >
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/COMMONSRDF-13
>> >> >>
>> >> >>
>> >> >>
>> >> >>> Finally, I will admit that writing all those types parameters can be a
>> >> >>> bit cumbersome, even if it happens only in a very few places (as a
>> >> >>> user: only once when you build what you need e.g. a Turtle parser).
>> >> >>> But please let's not sacrifice correctness and functionality to (a
>> >> >>> little) convenience...
>> >> >>
>> >> >> Well, if those would be exposed to any client of the Commons RDF API I
>> >> >> fear we would see very little uptake..
>> >> >
>> >> > How so?
>> >> >
>> >> >> If they are hidden inside some upper/inner interface that is not
>> >> >> exposed otherwise, it is not so bad.
>> >> >
>> >> > Yes, you can always do that.
>> >> >
>> >> > Alexandre
>> >> >
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Stian Soiland-Reyes
>> >> >> Apache Taverna (incubating), Apache Commons RDF (incubating)
>> >> >> http://orcid.org/0000-0001-9842-9718
>> >>
>> >
>> >
>> >
>> > --
>> > Sergio Fernández
>> > Partner Technology Manager
>> > Redlink GmbH
>> > m: +43 6602747925
>> > e: [email protected]
>> > w: http://redlink.co
>> >
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: [email protected]
> w: http://redlink.co
