Hi Andy,

On Thu, May 7, 2015 at 3:27 AM, Andy Seaborne <[email protected]> wrote:
> On 07/05/15 00:24, Stian Soiland-Reyes wrote:
>>>
>>> `RDFTermFactory` is stateful just to accommodate
>>> >`createBlankNode(String)`. It's stateless otherwise. This is really an
>>> >issue for banana-rdf as everything is defined as pure function (the
>>> >output only depends on the input).
>>
>> It does not need to be stateful.
>>
>> In simple we implemented this using a final UUID "salt" that is
>> created per instance of the factory. Do you consider this state?
>>
>>
>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/SimpleRDFTermFactory.java#L51
>>
>>
>> This is then used by
>>
>> https://github.com/apache/incubator-commonsrdf/blob/master/simple/src/main/java/org/apache/commons/rdf/simple/BlankNodeImpl.java#L41
>> as part of a hashing to generate the new uniqueReference().  Thus a
>> second call on the same factory with the same name will be hashed with
>> the same salt, and produce the same uniqueReference(), which makes the
>> second BlankNode equal to the first.
>>
>>
>> But you can achieve the contract by other non-stateful means, for
>> instance a random UUID that is static final (and hence no state at all
>> per factory instance), and you can create a uniqueReference() by
>> concatenating that UUID with the System.identityHashCode() of the
>> factory and concatenate the provided name.
>>
>>
>> Also you are not required to implement createBlankNode(String) - you
>> can simply throw UnsupportedOperationException and only support
>> createBlankNode().
>>
>>
>> This should probably be noted in the (yet so far just imagined)
>> Implementors Guide on the Commons RDF website.
>>
>> https://issues.apache.org/jira/browse/COMMONSRDF-24
>
>
> To add experience about createBlankNode(String): this string is not the
> unique reference.
>
> This is a purely practical matter that arises when parsing at scale. The
> external Map<String, BlankNode> approach fails because that Map grows huge.
> N-Triples is worse because it does not have unlabelled blank nodes i.e. [].
>
> When the Map style mapping has provided in the parser, there have been
> reports of parsing taking up too much space, coupled with the fact that
> Java's HashMap isn't very good at going from small to verge large
> (significant internal resizing, and also GC issues).
>
> To avoid that, the factory, having direct access to the BlankNode
> implementation can do better using a "salt+combine with the string"
> approach.  No growing space is needed for the syntax label tracking.
>
> The other design I know of is to expose a factory operation to make a blank
> node from a unique reference.  On balance, people seemed to prefer the
> current design.
>
> It is a practical consideration - not a requirement of RDF.

I agree. I actually do not care if it's a `java.util.Map` or something
else. It's just "something" that gives you a `BlankNode` from a label,
consistently. It could be an actual Map, or something else based on
UUID, it doesn't matter. What matters is that there is an implicit
state, and I argue that this is not a requirement. Passing that
"something" as an argument, and returning the new state along with the
`BlankNode` (whether it's stateful of stateless), would achieve your
requirements the same way.

I actually think that `createBlankNode(String)` *should not* belong to
the factory interface. That is something external to the factory in
its essence, which is there just because it makes things "more
convenient". For example, a parser would have it's own state anyway,
whether it lives *in* the factory or *outside* of it.

Alexandre

>
>         Andy

Reply via email to