Hi Andy, On Thu, May 7, 2015 at 3:27 AM, Andy Seaborne <[email protected]> wrote: > On 07/05/15 00:24, Stian Soiland-Reyes wrote: >>> >>> `RDFTermFactory` is stateful just to accommodate >>> >`createBlankNode(String)`. It's stateless otherwise. This is really an >>> >issue for banana-rdf as everything is defined as pure function (the >>> >output only depends on the input). >> >> It does not need to be stateful. >> >> In simple we implemented this using a final UUID "salt" that is >> created per instance of the factory. Do you consider this state? >> >> >> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51 >> >> >> This is then used by >> >> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41 >> as part of a hashing to generate the new uniqueReference(). Thus a >> second call on the same factory with the same name will be hashed with >> the same salt, and produce the same uniqueReference(), which makes the >> second BlankNode equal to the first. >> >> >> But you can achieve the contract by other non-stateful means, for >> instance a random UUID that is static final (and hence no state at all >> per factory instance), and you can create a uniqueReference() by >> concatenating that UUID with the System.identityHashCode() of the >> factory and concatenate the provided name. >> >> >> Also you are not required to implement createBlankNode(String) - you >> can simply throw UnsupportedOperationException and only support >> createBlankNode(). >> >> >> This should probably be noted in the (yet so far just imagined) >> Implementors Guide on the Commons RDF website. >> >> https://issues.apache.org/jira/browse/COMMONSRDF-24 > > > To add experience about createBlankNode(String): this string is not the > unique reference. > > This is a purely practical matter that arises when parsing at scale. The > external Map<String, BlankNode> approach fails because that Map grows huge. > N-Triples is worse because it does not have unlabelled blank nodes i.e. []. > > When the Map style mapping has provided in the parser, there have been > reports of parsing taking up too much space, coupled with the fact that > Java's HashMap isn't very good at going from small to verge large > (significant internal resizing, and also GC issues). > > To avoid that, the factory, having direct access to the BlankNode > implementation can do better using a "salt+combine with the string" > approach. No growing space is needed for the syntax label tracking. > > The other design I know of is to expose a factory operation to make a blank > node from a unique reference. On balance, people seemed to prefer the > current design. > > It is a practical consideration - not a requirement of RDF.
I agree. I actually do not care if it's a `java.util.Map` or something else. It's just "something" that gives you a `BlankNode` from a label, consistently. It could be an actual Map, or something else based on UUID, it doesn't matter. What matters is that there is an implicit state, and I argue that this is not a requirement. Passing that "something" as an argument, and returning the new state along with the `BlankNode` (whether it's stateful of stateless), would achieve your requirements the same way. I actually think that `createBlankNode(String)` *should not* belong to the factory interface. That is something external to the factory in its essence, which is there just because it makes things "more convenient". For example, a parser would have it's own state anyway, whether it lives *in* the factory or *outside* of it. Alexandre > > Andy
