I would also be happy with the createBlankNode(UUID) approach, which bypasses the issue of state but nails down the uniqueness constraint.
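To make the createBlankNode(UUID) idea concrete, here is a minimal sketch in plain Java. The names UuidBlankNode and UuidTermFactory are hypothetical and are not part of the Commons RDF API. The "urn:uuid:" prefix on uniqueReference() is also an assumed contract. A blank node is identified only by the UUID passed in, so two separate factory instances give equal nodes for the same UUID, and the factory keeps no per-instance state.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical blank node identified solely by a UUID
// (a stand-in, not the Commons RDF BlankNode interface).
final class UuidBlankNode {
    private final UUID id;

    UuidBlankNode(UUID id) {
        this.id = Objects.requireNonNull(id);
    }

    // Assumed contract: the reference is always "urn:uuid:" followed by the UUID.
    String uniqueReference() {
        return "urn:uuid:" + id;
    }

    // Equality depends only on the UUID, never on which factory made the node.
    @Override
    public boolean equals(Object o) {
        return o instanceof UuidBlankNode && ((UuidBlankNode) o).id.equals(id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

// Hypothetical stateless factory: every result depends only on the arguments.
final class UuidTermFactory {
    private static final String PREFIX = "urn:uuid:";

    // Fresh blank node, distinct from all others (random UUID).
    UuidBlankNode createBlankNode() {
        return new UuidBlankNode(UUID.randomUUID());
    }

    // Same UUID in, equal blank node out, on any factory instance.
    UuidBlankNode createBlankNode(UUID id) {
        return new UuidBlankNode(id);
    }

    // "Clone" a foreign blank node from its uniqueReference(),
    // provided it follows the urn:uuid: convention; reject anything else.
    UuidBlankNode fromUniqueReference(String ref) {
        if (!ref.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a urn:uuid: reference: " + ref);
        }
        return new UuidBlankNode(UUID.fromString(ref.substring(PREFIX.length())));
    }
}
```

fromUniqueReference only accepts references that follow the assumed prefix convention. Anything else is rejected rather than guessed at, which is why such a contract would need to be spelled out if cloning is ever supported.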
Assuming there is also a UUID uniqueReference() returned (or a contract-level way to ascertain that the String actually is a UUID, e.g. requiring a "urn:uuid:" prefix), you can consistently "clone" foreign BlankNode implementations if for some reason you need to (there has been some discussion of whether this is really needed or not, see https://issues.apache.org/jira/browse/COMMONSRDF-15). (We can still show the hashing pattern that results in a String-derived UUID in the user guide.)

I am, however, not quite buying the argument for why we need the factory to be "fully stateless", except that this is the clean and "functional" way in Scala. As a hobby Clojure user I would welcome functional alternatives like add(Triple) returning a new Graph, but I don't expect Commons-RDF to necessarily be able to play along with those paradigms while also being a useful "bog standard" object-oriented Java API.

But perhaps this is relevant to the https://issues.apache.org/jira/browse/COMMONSRDF-20 discussions, as there would be one argument less for needing multiple factory instances. (You might still want multiple instances if they are directly related to the underlying storage mechanism, but then they would also need to be configured, and thus instantiated through other means, such as an explicit constructor.)

I think adjustments for Scala are welcome, but they should not impose too much cost on Java users (implementers can take a tiny bit more pain). So I think createBlankNode(UUID) is a very clean approach, while your suggested interface RDF<Graph, Triple, RDFTerm, BlankNodeOrIRI, IRI, BlankNode, Literal> looks very contrived in Java and is potentially difficult to use.

Alexandre, your gist approach is functionally clean; in a Clojure implementation it would also be natural to take shortcuts straight to String/URI/UUID like that. But I am not sure I understand the implications for Java-land.
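The hashing pattern that gives a String-derived UUID could look roughly like the sketch below. It is an illustration only, not the code in SimpleRDFTermFactory or BlankNodeImpl. The class name SaltedLabels is hypothetical, and the hash is the JDK's UUID.nameUUIDFromBytes, which produces an MD5-based, version 3 UUID. With one fixed salt per instance, the same label always maps to the same UUID, an instance with a different salt maps it elsewhere, and no Map of labels has to grow during a parse.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: derives a blank node UUID from a syntax label
// (e.g. "b0" from "_:b0") without keeping a Map<String, BlankNode>.
final class SaltedLabels {
    // One salt per instance (e.g. per parse run). It is fixed at construction
    // and never changes, so uuidFor() is a pure function of the label.
    private final UUID salt;

    SaltedLabels() {
        this(UUID.randomUUID());
    }

    SaltedLabels(UUID salt) {
        this.salt = salt;
    }

    // Same salt + same label -> same UUID. A different salt gives a different
    // UUID, barring an MD5 collision. The salt's string form has a fixed
    // length, so "salt:label" cannot be ambiguous.
    UUID uuidFor(String label) {
        byte[] input = (salt + ":" + label).getBytes(StandardCharsets.UTF_8);
        return UUID.nameUUIDFromBytes(input); // name-based, version 3 (MD5) UUID
    }
}
```

A parser could pass each label through uuidFor() and hand the result to a createBlankNode(UUID)-style method (hypothetical here), keeping only the salt for the whole run instead of a label table.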
Could you perhaps draft out what you suggest the Commons-RDF interfaces would look like when defined in Java?

On 7 May 2015 at 21:03, Andy Seaborne <[email protected]> wrote:
> On 07/05/15 20:04, Alexandre Bertails wrote:
>>
>> Hi Andy,
>>
>> On Thu, May 7, 2015 at 3:27 AM, Andy Seaborne <[email protected]> wrote:
>>>
>>> On 07/05/15 00:24, Stian Soiland-Reyes wrote:
>>>>>
>>>>> `RDFTermFactory` is stateful just to accommodate
>>>>> `createBlankNode(String)`. It's stateless otherwise. This is really an
>>>>> issue for banana-rdf as everything is defined as pure function (the
>>>>> output only depends on the input).
>>>>
>>>> It does not need to be stateful.
>>>>
>>>> In simple we implemented this using a final UUID "salt" that is
>>>> created per instance of the factory. Do you consider this state?
>>>>
>>>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51
>>>>
>>>> This is then used by
>>>>
>>>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
>>>> as part of a hashing to generate the new uniqueReference(). Thus a
>>>> second call on the same factory with the same name will be hashed with
>>>> the same salt, and produce the same uniqueReference(), which makes the
>>>> second BlankNode equal to the first.
>>>>
>>>> But you can achieve the contract by other non-stateful means, for
>>>> instance a random UUID that is static final (and hence no state at all
>>>> per factory instance), and you can create a uniqueReference() by
>>>> concatenating that UUID with the System.identityHashCode() of the
>>>> factory and concatenate the provided name.
>>>>
>>>> Also you are not required to implement createBlankNode(String) - you
>>>> can simply throw UnsupportedOperationException and only support
>>>> createBlankNode().
>>>>
>>>> This should probably be noted in the (yet so far just imagined)
>>>> Implementors Guide on the Commons RDF website.
>>>>
>>>> https://issues.apache.org/jira/browse/COMMONSRDF-24
>>>
>>> To add experience about createBlankNode(String): this string is not the
>>> unique reference.
>>>
>>> This is a purely practical matter that arises when parsing at scale. The
>>> external Map<String, BlankNode> approach fails because that Map grows
>>> huge. N-Triples is worse because it does not have unlabelled blank nodes,
>>> i.e. [].
>>>
>>> When the Map-style mapping has been provided in the parser, there have
>>> been reports of parsing taking up too much space, coupled with the fact
>>> that Java's HashMap isn't very good at going from small to very large
>>> (significant internal resizing, and also GC issues).
>>>
>>> To avoid that, the factory, having direct access to the BlankNode
>>> implementation, can do better using a "salt+combine with the string"
>>> approach. No growing space is needed for the syntax label tracking.
>>>
>>> The other design I know of is to expose a factory operation to make a
>>> blank node from a unique reference. On balance, people seemed to prefer
>>> the current design.
>>>
>>> It is a practical consideration - not a requirement of RDF.
>>
>> I agree. I actually do not care if it's a `java.util.Map` or something
>> else. It's just "something" that gives you a `BlankNode` from a label,
>> consistently. It could be an actual Map, or something else based on
>> UUID, it doesn't matter. What matters is that there is an implicit
>> state, and I argue that this is not a requirement. Passing that
>> "something" as an argument, and returning the new state along with the
>> `BlankNode` (whether it's stateful or stateless), would achieve your
>> requirements the same way.
>
> Function<String, BlankNode>
>
> If I follow your argument, that would require either state itself elsewhere
> or private access to make blank nodes reproducibly because createBlankNode()
> is the only factory way to create a bnode.
>
> So either state still exists, just moved, and if this is a /commons/RDF we
> have to define that. Or there is another route to making bNodes which
> defeats the purpose of the factory as it currently is.
>
> A stateless factory version is createBlankNode(UUID) but that idea didn't
> get much traction (I'd be happy with it - at the abstract syntax level, a
> 1-1 correspondence with UUIDs is just the "arbitrary set" the spec talks
> about).
>
> Actually, better would be String in UUID format or at least align with
> uniqueReference - details.
>
>>
>> I actually think that `createBlankNode(String)` *should not* belong to
>> the factory interface. That is something external to the factory in
>> its essence, which is there just because it makes things "more
>> convenient". For example, a parser would have its own state anyway,
>> whether it lives *in* the factory or *outside* of it.
>
> The parser does not have long term state. See Jena RIOT - they emit a
> stream of triples. No state over the run.
>
> (except RDF/XML because of the "no reuse" of bnodeid but let's skip over
> RDF/XML :-).
>
> Andy
>
>>
>> Alexandre
>>
>>>
>>> Andy

--
Stian Soiland-Reyes
Apache Taverna (incubating), Apache Commons RDF (incubating)
http://orcid.org/0000-0001-9842-9718
