> On Jan 2, 2016, at 1:59 PM, Andy Seaborne <[email protected]> wrote:
> On 31/12/15 15:07, A. Soroka wrote:
>> 
>> On another note, I’m taking a bash at the “lock-per-named-graph” dataset. 
>> Hopefully I’ll have something soonish, that can be run in harness to see if 
>> it really offers useful gains in the use cases for which I hope it will. If 
>> it works, then maybe it would be worth abstracting to the case of arbitrary 
>> partitions of a dataset.
> 
> Decentralise!  No hard dependency on codebase changes - a separate 
> implementation to evolve and test out without the other evolution needs of 
> the codebase to get in the way.

I’m trying to draft a completely independent DatasetGraph implementation, which 
I should think would meet this criterion, although I have run into a problem 
similar to the one Claude outlined in a recent message: the current Lock, 
Transactional, and TransactionalComponent types aren’t really set up to support 
taking a lock or opening a transaction over some particular scope within the 
data, so new types have to be introduced for that. Not a huge deal, though. I’m 
looking forward to seeing what Claude comes up with, because he is working on a 
more general problem. 
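
For concreteness, here is a sketch of the kind of new type I mean. Nothing like 
this exists in the codebase; the name ScopedTransactional is mine:

    // Hypothetical sketch only: Transactional today offers begin(ReadWrite)
    // with no way to name a scope, so a scoped variant might look like this.
    import org.apache.jena.graph.Node;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.sparql.core.Transactional;

    interface ScopedTransactional extends Transactional {
        /** Begin a transaction covering only the given named graph. */
        void begin(ReadWrite mode, Node graphName);
    }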

> See also Claude's message on locking.

Yes, the efforts clearly connect. I’m hopeful that “writer-per-graph” turns out 
to be a special case of Claude’s idea (with a lock-region-pattern of <g ANY ANY 
ANY>). I’m also somewhat hopeful that we can analyze some of the problems in 
the general idea by building up from simpler patterns (e.g. lock regions that 
partition the data).
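
To illustrate what I mean by “partition”: a lock region could be no more than a 
quad pattern, and writer-per-graph is then the pattern with only the graph slot 
bound. This is purely a sketch, and LockRegion is a name I’m inventing here:

    // Sketch only: a lock region as a quad pattern. Two regions can be
    // locked independently iff they cannot match a common quad.
    import org.apache.jena.graph.Node;
    import org.apache.jena.sparql.core.Quad;

    class LockRegion {
        private final Quad pattern;

        LockRegion(Quad pattern) {
            this.pattern = pattern;
        }

        // Writer-per-graph: the region <g ANY ANY ANY>.
        static LockRegion wholeGraph(Node graphName) {
            return new LockRegion(new Quad(graphName, Node.ANY, Node.ANY, Node.ANY));
        }

        boolean overlaps(LockRegion other) {
            return compatible(pattern.getGraph(), other.pattern.getGraph())
                && compatible(pattern.getSubject(), other.pattern.getSubject())
                && compatible(pattern.getPredicate(), other.pattern.getPredicate())
                && compatible(pattern.getObject(), other.pattern.getObject());
        }

        private static boolean compatible(Node a, Node b) {
            return a == Node.ANY || b == Node.ANY || a.equals(b);
        }
    }

Regions that don’t overlap partition the data, so writers to them never 
conflict.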

>> Did you want me to look at this: 
>> https://issues.apache.org/jira/browse/JENA-1084 ?
> That would be great.

Happy to: please assign it to me. (I can’t self-assign in Jira.)

>> I was thinking that I should be able to reuse the current 
>> TripleStore/TripleBunch machinery underneath the TripleTable and QuadTable 
>> interfaces, or possibly just try a very simple ConcurrentHashMap setup.
> Personally, I would not use TripleBunch for such a dataset implementation as 
> first choice.
> <snipped>
> A really good thing to learn from JENA-1084 is the cost of the persistent 
> datastructures.  Same framework, different index maps gives the most 
> realistic results, using TripleBunch is covered by “dataset general”.

Okay, I’ll plug in some basic java.util Maps and see what we get!
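
Concretely, the “very simple ConcurrentHashMap setup” might start out as 
nothing more than nested maps. A sketch, with one index order and names of my 
own choosing:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.jena.graph.Node;
    import org.apache.jena.sparql.core.Quad;

    class SimpleQuadIndex {
        // One index order (GSPO): graph -> subject -> predicate -> objects.
        private final Map<Node, Map<Node, Map<Node, Set<Node>>>> gspo =
                new ConcurrentHashMap<>();

        void add(Quad q) {
            gspo.computeIfAbsent(q.getGraph(), g -> new ConcurrentHashMap<>())
                .computeIfAbsent(q.getSubject(), s -> new ConcurrentHashMap<>())
                .computeIfAbsent(q.getPredicate(), p -> ConcurrentHashMap.newKeySet())
                .add(q.getObject());
        }

        boolean contains(Quad q) {
            return gspo.getOrDefault(q.getGraph(), Collections.emptyMap())
                       .getOrDefault(q.getSubject(), Collections.emptyMap())
                       .getOrDefault(q.getPredicate(), Collections.emptySet())
                       .contains(q.getObject());
        }
    }

A real table would need the other index orders, deletion, and pattern matching, 
but this should be enough to get comparative figures against the persistent 
maps.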

> One nice part of TripleBunch is the switch from small lists to maps as size 
> grows.  (The comments in some places are wrong - they say it switches at 4 
> but the impl is switch at 9 :-)

https://issues.apache.org/jira/browse/JENA-1109
:)
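
For anyone following along, the switch is roughly this shape (threshold 9, per 
the impl rather than the comments; the names are illustrative, not 
TripleBunch’s own):

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashSet;

    class SmallBunch<T> {
        private static final int THRESHOLD = 9;
        private Collection<T> members = new ArrayList<>(); // low overhead while small

        void add(T item) {
            // Breakover: at the threshold, pay one copy to get O(1)
            // contains() from then on.
            if (members instanceof ArrayList && members.size() >= THRESHOLD)
                members = new HashSet<>(members);
            members.add(item);
        }

        boolean contains(T item) {
            return members.contains(item);
        }
    }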

> Do you think that there is an equivalent idea in TxnMem?  My guess is that 
> the answer is "no" because the base maps aren't hash maps, which is the space 
> overhead cost that small lists are trying to avoid.

Well, the library we are now using for persistent data structures 
(https://github.com/andrewoma/dexx) does provide fairly cheap persistent lists, 
but I suspect that the cost of the “breakover” to a map might be high. It might 
be worth looking at, though.

There is an intriguing possibility in an idea that Clojure provides as 
“transient” data structures. The idea is that within the remit of a single 
thread (for us, really a transaction, but Clojure, for obvious reasons, isn’t 
going to speak in that language about basic data structures) it becomes 
possible to mutate-in-place a normally persistent data structure, with savings 
in both time and space. The current design of TxnMem could take advantage of 
this without much effort, but the library doesn’t offer that feature. I did try 
Clojure’s structures, but they weren’t as performant as dexx in the simple 
tests we did, and the use of Clojure libraries from Java is… not pretty 
(maintenance headaches ahead!). However, we might get to the point that 
specialized persistent data structure implementations for Jena would be 
worthwhile, and the “transient” idea should definitely be part of that.
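
The lifecycle, roughly, for anyone without the Clojure background (a sketch of 
the shape only: a real transient shares trie structure with its source instead 
of copying it, which is where the savings come from):

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical illustration, not dexx's or Clojure's API: one copy at
    // transaction start, cheap in-place edits inside the owning transaction,
    // then freeze at commit.
    final class TransientSet<T> {
        private final Set<T> edits;
        private boolean editable = true; // valid only inside the owning transaction

        TransientSet(Set<T> source) {
            this.edits = new HashSet<>(source); // pay once, up front
        }

        void add(T item) {
            if (!editable)
                throw new IllegalStateException("already persisted");
            edits.add(item); // mutate in place: no per-operation copying
        }

        Set<T> persist() { // call at commit; the result is safe to share
            editable = false;
            return Collections.unmodifiableSet(edits);
        }
    }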

> In terms of simplification, an eye on swapping (in the far future) to graphs 
> being special datasets (with transactions!), i.e. just the default graph, is 
> an interesting possibility to create a smaller codebase. That's my current 
> best guess for unifying transactions, but there are various precursors.

I think that sounds great. Maybe we can take a step in that direction by making 
the Model and Graph impls that wrap “pieces” of datasets respect the 
transactionality of those datasets.
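
Something along these lines, say. This is only a sketch: TxnGraphView is my 
name for it, and I’m assuming the wrapped DatasetGraph implementation is also 
Transactional, as TxnMem’s is:

    import java.util.Iterator;

    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.graph.impl.GraphBase;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.sparql.core.DatasetGraph;
    import org.apache.jena.sparql.core.Quad;
    import org.apache.jena.sparql.core.Transactional;
    import org.apache.jena.util.iterator.ExtendedIterator;
    import org.apache.jena.util.iterator.WrappedIterator;

    // Hypothetical sketch: a Graph over one named graph of a dataset, with
    // mutation routed through the dataset's own transaction machinery.
    class TxnGraphView extends GraphBase {
        private final DatasetGraph dsg;
        private final Transactional txn;
        private final Node graphName;

        TxnGraphView(DatasetGraph dsg, Node graphName) {
            this.dsg = dsg;
            this.txn = (Transactional) dsg; // assumption: the impl is transactional
            this.graphName = graphName;
        }

        @Override
        public void performAdd(Triple t) {
            // Respect the dataset's transactionality rather than bypass it.
            txn.begin(ReadWrite.WRITE);
            try {
                dsg.add(new Quad(graphName, t));
                txn.commit();
            } finally {
                txn.end();
            }
        }

        @Override
        protected ExtendedIterator<Triple> graphBaseFind(Triple m) {
            // Reads shown untransacted for brevity; they would get the same wrapping.
            Iterator<Quad> quads =
                dsg.find(graphName, m.getSubject(), m.getPredicate(), m.getObject());
            return WrappedIterator.create(quads).mapWith(q -> q.asTriple());
        }
    }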

> But the immediate thing I'm getting round to is finishing the TxnMem work yet 
> - Fuseki integration is missing and something that needs user testing well 
> before a release.

Yes, definitely. I feel like that’s on me to carry through, but I haven’t 
looked at Fuseki much at all. Would you like to file a ticket on me for that? 
Or have you already started one?

> Random thought: Fuseki is the way to test various impls - we could build a 
> kit (low threshold to use) and ask people to report figures for different 
> environments.

A “kit" meaning something like a specially-advertised one-off release of the 
Fuseki download that includes the new stuff? That sounds like a great way to 
lower the bar to getting feedback, and a great technique in general to 
advertise new features and build up interest. 

---
A. Soroka
The University of Virginia Library
