On 31/12/15 15:07, A. Soroka wrote:
Ah, looks nice. The key seems to me to be to abstract over the
operation that _requires_ reordering. I have the kind of temperament
that tends to ignore the different costs of abstraction in different
places. I’ll take a look at using this style in the txn-in-memory
dataset, in a way that brings clarity without new costs.
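To make the "abstract over the operation that requires reordering" idea concrete, here is a rough sketch (hypothetical names, not Jena's actual TupleMap API) of a single permutation driving both mapping a tuple into index order and unmapping it back:

```java
import java.util.Arrays;

// Hypothetical sketch of a tuple-reordering map (not Jena's actual TupleMap API).
// One int[] permutation drives both directions: map[i] is the source slot
// that fills destination slot i, and the inverse is derived from it.
public class TupleReorder {
    private final int[] map;     // map[i] = source slot for destination slot i
    private final int[] unmap;   // inverse permutation

    public TupleReorder(int[] map) {
        this.map = map.clone();
        this.unmap = new int[map.length];
        for (int i = 0; i < map.length; i++)
            unmap[map[i]] = i;
    }

    // Reorder a tuple into index order, e.g. SPO -> POS.
    public <T> T[] map(T[] tuple) {
        T[] out = Arrays.copyOf(tuple, tuple.length);
        for (int i = 0; i < map.length; i++)
            out[i] = tuple[map[i]];
        return out;
    }

    // Reorder back from index order to the original slot order.
    public <T> T[] unmap(T[] tuple) {
        T[] out = Arrays.copyOf(tuple, tuple.length);
        for (int i = 0; i < unmap.length; i++)
            out[i] = tuple[unmap[i]];
        return out;
    }
}
```

For example, `new TupleReorder(new int[]{1, 2, 0})` turns (S,P,O) into (P,O,S) and back again, so each index only ever sees one canonical slot order.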
On another note, I'm taking a bash at the "lock-per-named-graph"
dataset. Hopefully I’ll have something soonish, that can be run in
harness to see if it really offers useful gains in the use cases for
which I hope it will. If it works, then maybe it would be worth
abstracting to the case of arbitrary partitions of a dataset.
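A minimal sketch of the per-graph-locking idea (the names here are hypothetical, not the eventual implementation) would hand out one ReadWriteLock per graph name on demand, so operations touching different named graphs never contend:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: one lock per named graph, created lazily.
// Operations confined to one graph contend only on that graph's lock.
public class GraphLocks {
    private final ConcurrentHashMap<String, ReadWriteLock> locks = new ConcurrentHashMap<>();

    public ReadWriteLock forGraph(String graphName) {
        // computeIfAbsent guarantees exactly one lock instance per name,
        // even under concurrent first access.
        return locks.computeIfAbsent(graphName, name -> new ReentrantReadWriteLock());
    }
}
```

Cross-graph operations (GRAPH with a variable, union default graph) would still need a policy for taking several of these locks in a consistent order, which is where the "arbitrary partitions" generalisation gets interesting.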
A separate investigation would be a good way to proceed.
Decentralise! No hard dependency on codebase changes - a separate
implementation to evolve and test out, without the codebase's other
evolution needs getting in the way.
See also Claude's message on locking.
Did you want me to look at this:
https://issues.apache.org/jira/browse/JENA-1084 ?
That would be great.
I was thinking that
I should be able to reuse the current TripleStore/TripleBunch
machinery underneath the TripleTable and QuadTable interfaces, or
possibly just try a very simple ConcurrentHashMap setup.
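The "very simple ConcurrentHashMap setup" could be as plain as nested maps keyed by node - here a rough sketch of one index (SPO), with Strings standing in for Jena Nodes, not the actual JENA-1084 code:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of a single SPO index built from nested concurrent maps.
// Strings stand in for Jena Nodes here; a real TripleTable would hold
// one such structure per access order (SPO, POS, OSP).
public class SpoIndex {
    private final Map<String, Map<String, Set<String>>> spo = new ConcurrentHashMap<>();

    public void add(String s, String p, String o) {
        spo.computeIfAbsent(s, k -> new ConcurrentHashMap<>())
           .computeIfAbsent(p, k -> ConcurrentHashMap.newKeySet())
           .add(o);
    }

    public boolean contains(String s, String p, String o) {
        Map<String, Set<String>> po = spo.get(s);
        if (po == null) return false;
        Set<String> os = po.get(p);
        return os != null && os.contains(o);
    }
}
```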
Personally, I would not use TripleBunch for such a dataset
implementation as first choice.
(1) It keeps the Triples around whereas your approach is to keep indexes
of the Nodes, ordered, and recreate Quads/Triples on the fly.
For a general framework, and on modern CPUs, creating objects seems the
better trade nowadays. A good place for an experiment.
(2) "dataset general" is (or should be) in effect using
TripleStore/TripleBunch because it autocreates memory graphs. Of
course, it has to loop for some GRAPH operations, and the union default
graph needs a set-uniqueness temporary.
A really good thing to learn from JENA-1084 is the cost of the
persistent data structures. The same framework with different index maps
gives the most realistic results; using TripleBunch is covered by
"dataset general".
One nice part of TripleBunch is the switch from small lists to maps as
size grows. (The comments in some places are wrong: they say it
switches at 4, but the implementation switches at 9. :-)
Do you think that there is an equivalent idea in TxnMem? My guess is
that the answer is "no", because the base maps aren't hash maps, which
carry the space overhead that small lists are trying to avoid.
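The small-list-to-map switch can be illustrated like this (an illustration of the idea only, with a generic element type - not TripleBunch itself, though the threshold of 9 matches what the implementation uses):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration of the small-collection trick: hold members in an ArrayList
// (low space overhead, linear scan is cheap when tiny) until a threshold,
// then migrate to a HashSet for O(1) lookup at larger sizes.
public class GrowingBunch<T> {
    private static final int THRESHOLD = 9;   // TripleBunch switches at 9
    private List<T> list = new ArrayList<>();
    private Set<T> set = null;                // non-null once migrated

    public void add(T item) {
        if (set != null) { set.add(item); return; }
        if (!list.contains(item)) list.add(item);
        if (list.size() > THRESHOLD) {        // migrate once we outgrow the list
            set = new HashSet<>(list);
            list = null;
        }
    }

    public boolean contains(T item) {
        return set != null ? set.contains(item) : list.contains(item);
    }

    public int size() {
        return set != null ? set.size() : list.size();
    }
}
```

The payoff is per-bunch: a graph with many predicates has many mostly-small bunches, so avoiding a HashMap's table and entry objects for each one adds up.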
Digressing ...
In terms of simplification, keeping an eye on swapping (in the far
future) to graphs being special datasets (with transactions!), i.e. just
a default graph, is an interesting possibility for a smaller codebase.
That's my current best guess for unifying transactions but there are
various precursors.
But the immediate thing I'm getting round to is finishing the TxnMem
work: Fuseki integration is still missing, and it's something that needs
user testing well before a release.
And some time on the Fuseki caching.
Random thought: Fuseki is the way to test various impls - we could build
a kit (low threshold to use) and ask people to report figures for
different environments.
Andy
---
A. Soroka
The University of Virginia Library
On Dec 31, 2015, at 7:23 AM, Andy Seaborne <[email protected]>
wrote:
Adam,
I had a go at mapping arguments, all driven by one single
TupleMap.
https://gist.github.com/afs/73f3b118726f1625cb33
Andy