On 07/05/15 20:04, Alexandre Bertails wrote:
Hi Andy,

On Thu, May 7, 2015 at 3:27 AM, Andy Seaborne <[email protected]> wrote:
On 07/05/15 00:24, Stian Soiland-Reyes wrote:

`RDFTermFactory` is stateful just to accommodate
`createBlankNode(String)`. It's stateless otherwise. This is really an
issue for banana-rdf as everything is defined as pure function (the
output only depends on the input).

It does not need to be stateful.

In simple we implemented this using a final UUID "salt" that is
created per instance of the factory. Do you consider this state?


https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51


This is then used by

https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
as part of a hashing to generate the new uniqueReference().  Thus a
second call on the same factory with the same name will be hashed with
the same salt, and produce the same uniqueReference(), which makes the
second BlankNode equal to the first.


But you can achieve the contract by other non-stateful means, for
instance a random UUID that is static final (and hence no state at all
per factory instance), and you can create a uniqueReference() by
concatenating that UUID with the System.identityHashCode() of the
factory and concatenate the provided name.


Also you are not required to implement createBlankNode(String) - you
can simply throw UnsupportedOperationException and only support
createBlankNode().


This should probably be noted in the (yet so far just imagined)
Implementors Guide on the Commons RDF website.

https://issues.apache.org/jira/browse/COMMONSRDF-24


To add experience about createBlankNode(String): this string is not the
unique reference.

This is a purely practical matter that arises when parsing at scale. The
external Map<String, BlankNode> approach fails because that Map grows huge.
N-Triples is worse because it does not have unlabelled blank nodes i.e. [].

When the Map style mapping has provided in the parser, there have been
reports of parsing taking up too much space, coupled with the fact that
Java's HashMap isn't very good at going from small to verge large
(significant internal resizing, and also GC issues).

To avoid that, the factory, having direct access to the BlankNode
implementation can do better using a "salt+combine with the string"
approach.  No growing space is needed for the syntax label tracking.

The other design I know of is to expose a factory operation to make a blank
node from a unique reference.  On balance, people seemed to prefer the
current design.

It is a practical consideration - not a requirement of RDF.

I agree. I actually do not care if it's a `java.util.Map` or something
else. It's just "something" that gives you a `BlankNode` from a label,
consistently. It could be an actual Map, or something else based on
UUID, it doesn't matter. What matters is that there is an implicit
state, and I argue that this is not a requirement. Passing that
"something" as an argument, and returning the new state along with the
`BlankNode` (whether it's stateful of stateless), would achieve your
requirements the same way.

Function<String, BlankNode>

If I follow your argument, that would require either state itself elsewhere or private access to make blank nodes reproducibly because createBlankNode() is the only factory way to create a bnode.

So either state still exists, just moved, and if this is a /commons/RDF we have to define that. Or there is another route to making bNodes which defeats the purpose of the factory as it currently is.

A stateless factory version is createBlankNode(UUID) but that idea didn't get much traction (I'd be happy with it - at the abstract synatx level, a 1-1 correspondence with UUIDs is just the "arbitrary set" the spec talks about).


Actually, better would be String in UUID format or at least align with uniqueReference - details.


I actually think that `createBlankNode(String)` *should not* belong to
the factory interface. That is something external to the factory in
its essence, which is there just because it makes things "more
convenient". For example, a parser would have it's own state anyway,
whether it lives *in* the factory or *outside* of it.

The parser does not have long term state. See Jena RIOT - they emit a stream of triples. No state over the run.

(except RDF/XML because of the "no reuse" of bnodeid but let's skip over RDF/XML :-).

        Andy


Alexandre


         Andy

Reply via email to