On 31/12/15 15:07, A. Soroka wrote:
Ah, looks nice. The key seems to me to be to abstract over the
operation that _requires_ reordering. I have the kind of temperament
that tends to ignore the different costs of abstraction in different
places. I’ll take a look at using this style in the txn-in-memory
dataset, in a way that brings clarity without new costs.

On another note, I'm taking a bash at the "lock-per-named-graph"
dataset. Hopefully I'll have something soonish that can be run in
a harness to see whether it really offers useful gains in the use cases
for which I hope it will. If it works, then maybe it would be worth
abstracting to the case of arbitrary partitions of a dataset.

A separate investigation would be a good way to proceed.

Decentralise! No hard dependency on codebase changes: a separate implementation can evolve and be tested without the other evolution needs of the codebase getting in the way.

See also Claude's message on locking.
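To make the lock-per-named-graph idea concrete, here is a sketch of my own (an illustration, not the actual implementation): each graph name lazily gets its own ReadWriteLock, so writers to different named graphs do not contend. Plain Strings stand in for Jena Nodes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch: one lock per named graph, created on demand.
class GraphLocks {
    private final ConcurrentHashMap<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

    ReadWriteLock lockFor(String graphName) {
        // computeIfAbsent guarantees a single lock instance per graph name.
        return locks.computeIfAbsent(graphName, g -> new ReentrantReadWriteLock());
    }

    <T> T read(String graphName, Supplier<T> action) {
        ReadWriteLock rw = lockFor(graphName);
        rw.readLock().lock();
        try { return action.get(); } finally { rw.readLock().unlock(); }
    }

    void write(String graphName, Runnable action) {
        ReadWriteLock rw = lockFor(graphName);
        rw.writeLock().lock();
        try { action.run(); } finally { rw.writeLock().unlock(); }
    }
}
```

The interesting part is what this does not do: there is no dataset-wide lock, so the harness should show gains exactly when concurrent writers touch disjoint named graphs.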

Did you want me to look at this:
https://issues.apache.org/jira/browse/JENA-1084 ?

That would be great.

I was thinking that
I should be able to reuse the current TripleStore/TripleBunch
machinery underneath the TripleTable and QuadTable interfaces, or
possibly just try a very simple ConcurrentHashMap setup.
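For concreteness, a very simple ConcurrentHashMap setup might look like this (a sketch of mine, with Strings standing in for Jena Nodes; it is not the actual TripleTable API, and a real table would index by predicate and object as well):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: a subject-indexed triple table backed by ConcurrentHashMap.
class SimpleTripleTable {
    record Triple(String s, String p, String o) {}

    private final ConcurrentHashMap<String, Set<Triple>> bySubject = new ConcurrentHashMap<>();

    void add(Triple t) {
        // A concurrent set per subject; created lazily and thread-safely.
        bySubject.computeIfAbsent(t.s(), k -> ConcurrentHashMap.newKeySet()).add(t);
    }

    Set<Triple> findBySubject(String s) {
        return bySubject.getOrDefault(s, Set.of());
    }
}
```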

Personally, I would not use TripleBunch as a first choice for such a dataset implementation.

(1) It keeps the Triples around, whereas your approach is to keep ordered indexes of the Nodes and recreate Quads/Triples on the fly.

For a general framework on modern CPUs, creating objects on the fly seems the better choice nowadays. A good place for an experiment.
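To illustrate the contrast (my own sketch, not the TxnMem code): an SPO-ordered index over interned node ids, from which triples are recreated on each lookup rather than stored as objects.

```java
import java.util.*;

// Sketch (illustrative only): triples are held as int[3] tuples of node ids
// in a sorted set, and materialised back into node strings on demand.
class TupleIndex {
    private final Map<String, Integer> ids = new HashMap<>();  // node -> id
    private final List<String> nodes = new ArrayList<>();      // id -> node
    private final NavigableSet<int[]> spo =
        new TreeSet<>(Arrays::compare);  // lexicographic order on (s, p, o) ids

    private int intern(String n) {
        return ids.computeIfAbsent(n, k -> { nodes.add(k); return nodes.size() - 1; });
    }

    void add(String s, String p, String o) {
        spo.add(new int[] { intern(s), intern(p), intern(o) });
    }

    // Recreate the triples with a given subject on the fly.
    List<String[]> findSubject(String s) {
        List<String[]> out = new ArrayList<>();
        Integer sid = ids.get(s);
        if (sid == null) return out;
        for (int[] t : spo.tailSet(new int[] { sid, 0, 0 }, true)) {
            if (t[0] != sid) break;  // past this subject's range
            out.add(new String[] { nodes.get(t[0]), nodes.get(t[1]), nodes.get(t[2]) });
        }
        return out;
    }
}
```

Nothing but the small int[] tuples is retained between queries; the object-creation cost is paid per lookup, which is the trade-off worth measuring.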

(2) "dataset general" is (or should be) in effect already using TripleStore/TripleBunch, because it autocreates in-memory graphs. Of course, it has to loop over graphs for some GRAPH operations, and the union default graph needs a temporary set to enforce uniqueness.
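The set-uniqueness temporary is simply this (an illustrative sketch; Strings stand in for triples): the same triple may occur in several named graphs, so the union view must deduplicate.

```java
import java.util.*;

class UnionDefault {
    // Sketch: the union default graph as a deduplicated view over the
    // named graphs, using a temporary Set to enforce set semantics.
    static List<String> unionDefaultGraph(Map<String, List<String>> namedGraphs) {
        Set<String> seen = new LinkedHashSet<>();  // the uniqueness temporary
        for (List<String> triples : namedGraphs.values())
            seen.addAll(triples);
        return new ArrayList<>(seen);
    }
}
```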

A really good thing to learn from JENA-1084 is the cost of the persistent data structures. Keeping the same framework and varying only the index maps gives the most realistic results; the TripleBunch case is covered by "dataset general".


One nice part of TripleBunch is the switch from small lists to maps as the size grows. (The comments in some places are wrong - they say it switches at 4, but the implementation switches at 9 :-)

Do you think there is an equivalent idea in TxnMem? My guess is that the answer is "no", because the base maps aren't hash maps, and the space overhead of hash maps is what small lists are trying to avoid.

Digressing ...

In terms of simplification, it is worth keeping an eye on (in the far future) making graphs a special case of datasets (with transactions!), i.e. a dataset with just a default graph - an interesting possibility for creating a smaller codebase.

That's my current best guess for unifying transactions, but there are various precursors.

But the immediate thing I'm getting round to is finishing the TxnMem work - the Fuseki integration is still missing, and that is something that needs user testing well before a release.

And some time on the Fuseki caching.

Random thought: Fuseki is the way to test the various implementations - we could build a kit (with a low threshold to use) and ask people to report figures from different environments.

        Andy


--
A. Soroka
The University of Virginia Library

On Dec 31, 2015, at 7:23 AM, Andy Seaborne <[email protected]>
wrote:

Adam,

I had a go at mapping arguments, all driven by one single
TupleMap.

https://gist.github.com/afs/73f3b118726f1625cb33

Andy

