On 07/05/15 00:24, Stian Soiland-Reyes wrote:
`RDFTermFactory` is stateful just to accommodate
>`createBlankNode(String)`. It's stateless otherwise. This is really an
>issue for banana-rdf as everything is defined as pure function (the
>output only depends on the input).
It does not need to be stateful.

In simple we implemented this using a final UUID "salt" that is
created per instance of the factory. Do you consider this state?

https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51


This is then used by
https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
as part of a hashing to generate the new uniqueReference().  Thus a
second call on the same factory with the same name will be hashed with
the same salt, and produce the same uniqueReference(), which makes the
second BlankNode equal to the first.


But you can achieve the contract by other non-stateful means, for
instance a random UUID that is static final (and hence no state at all
per factory instance), and you can create a uniqueReference() by
concatenating that UUID with the System.identityHashCode() of the
factory and concatenate the provided name.


Also you are not required to implement createBlankNode(String) - you
can simply throw UnsupportedOperationException and only support
createBlankNode().


This should probably be noted in the (yet so far just imagined)
Implementors Guide on the Commons RDF website.

https://issues.apache.org/jira/browse/COMMONSRDF-24

To add experience about createBlankNode(String): this string is not the unique reference.

This is a purely practical matter that arises when parsing at scale. The external Map<String, BlankNode> approach fails because that Map grows huge. N-Triples is worse because it does not have unlabelled blank nodes i.e. [].

When the Map style mapping has provided in the parser, there have been reports of parsing taking up too much space, coupled with the fact that Java's HashMap isn't very good at going from small to verge large (significant internal resizing, and also GC issues).

To avoid that, the factory, having direct access to the BlankNode implementation can do better using a "salt+combine with the string" approach. No growing space is needed for the syntax label tracking.

The other design I know of is to expose a factory operation to make a blank node from a unique reference. On balance, people seemed to prefer the current design.

It is a practical consideration - not a requirement of RDF.

        Andy

Reply via email to