Thanks for the feedback, Andy! In order of your points:
1) Okay, I will get on those transaction-related problems pronto. None of them
seem like a big deal to fix, and you’ve already given me the main point (finer
lifecycle granularity) and pointers to the Mantis material.
2) Yeah, I can see that. I’ll switch it to QuadTable for now, pending further
suggestions. You know, I think I actually called that “QuadIndex” in an earlier
version!
3) It didn’t even occur to me to manage prefixes (prefixen?). I didn’t realize
that was a Dataset responsibility, although that certainly makes sense. Well,
that’s why we don’t make assumptions! {grin} I’ll fix that based on
DatasetPrefixStorage.
4) That sounds great. First step: I’ll get these problems squared away and
hopefully other people will glance over and find other things to fix.
Thanks again!
---
A. Soroka
The University of Virginia Library
> On Sep 24, 2015, at 1:05 PM, Andy Seaborne <[email protected]> wrote:
>
> On 23/09/15 17:44, A. Soroka wrote:
>> Following up on this conversation, I have now have a branch available
>> at:
>>
>> https://github.com/ajs6f/jena/tree/jena-624
>>
>> with a six-way-map-based version of this, advancing from (but not
>> directly using) the journaling branch already discussed. (Of course I
>> can separate these if so desired.)
>>
>> There is an Index type which hopefully is a start towards abstracting
>> out index behavior, as Paul Houle suggested. There have already been
>> interesting suggestions made by Andy and Claude about possible
>> implementations that are more sophisticated than my simple approach,
>> so I just hope that this branch will get the ball rolling.
>>
>> Comments/advice/criticism eagerly desired!
>>
>> --- A. Soroka The University of Virginia Library
>
> Hi - I pulled the code down and had a partial look.
>
> And it looks very good.
>
> (As you probably know by now, having me (test/demo/use/) anywhere near new
> code is very risky but I broke very little ... including TDB.)
>
> 0/ Basic read/write of files into the dataset worked as expected. :-)
>
> 1/
> I tried with existing test harnesses:
> (see below for code)
>
> AbstractDatasetGraphTests
> Green line!
>
> AbstractTestTransaction
> Red line.
>
> This has tests for various error conditions.
>
> The begin-begin cases: your code throws JenaException. There is a
> sub-exception JenaTransactionException and that what the tests look for.
>
> The rest are testing that e.g. begin-abort-commit is an error. It looks like
> the transaction lifecycle is not being tracked. It needs finer granularity
> than DatasetGraphInMemory.isInTransaction. May be as simple as NOT_TXN ->
> ACTIVE_TXN -> FINISHING_TXN (-> NOT_TXN). Just in/out isn't enough for, e.g.
> begin-commit-add_quad->end , begin-commit-commit. I managed to add after
> commit.
>
> There are some presumable related things here: A writer that .end() without
> .commit or .abort doesn't indicate an error. It probably should. And be added
> to the AbstractTestTransaction tests.
>
> And plain commit (no begin) isn't caught as wrong.
>
> Dataset ds = DatasetFactory.create(new DatasetGraphInMemory()) ;
> ds.commit() ;
>
> Ditto .abort.
>
> Stray .end()s are probably reasonable - "when in doubt call end()" - so
> multiple ends() on a transaction, which in effect is end on a
> non-transaction, is good to allow.
>
> (I'll add some tests to AbstractTestTransaction for these cases)
>
> 2/
> I found the use of the name "Index" a odd. Usually an index (in database
> speak) is a specific lookup pattern. S->P->O->G and needing a prefix of that
> for partials.
>
> Hence, in SQL and NoSQL, having multiple indexes per table, one primary, and
> 0-N secondary. The use in your code is more like the whole "table" (in
> individual components are all covering solike all RDF subsystems no need to
> have a distinguished "table"
>
> org.apache.jena.tdb.index.Index
> org.apache.jena.tdb.index.RangeIndex
>
> Is this really something like "QuadTable"? "QuadStorage"?
>
> I am encountering a similar split between the storage and provision of the
> interface in TDB2. There, I want to be able to swap the storage on the fly
> to give parallel storage a compaction option to a running database. Being
> on-disk, there isn't a GC to manage the multi-version datastructures.
>
> 3/
> No prefixes on the dataset? I only got them to work with getDefaultModel etc.
>
> TDB uses DatasetPrefixStorage for managing prefixes and then GraphTDB
> Some of DatasetPrefixesTDB could be extracted for common use.
>
> leading to...
> 4/
> We should look for commonality between TDB and InMem and pull out a separate
> framework. That's long term - getting something working and released should
> not have "architectural internal reorg" on the critical path.
>
> Andy
>
>
> public class TestDatasetGraphInMem_AFS
> extends AbstractDatasetGraphTests {
> @Override
> protected DatasetGraph emptyDataset() {
> return new DatasetGraphInMemory() ;
> }
> }
>
> public class TestDatasetGraphTxn_AFS
> extends AbstractTestTransaction {
> @Override
> protected Dataset create() {
> return DatasetFactory.create(new DatasetGraphInMemory()) ;
> }
> }
>
> ----------------------
>
> And if all that begin-commit stuff is boring code ... WIP ...
>
> https://github.com/afs/mantis/blob/master/dboe-transaction/src/main/java/org/seaborne/dboe/transaction/Txn.java
>
> Txn.executeRead(dsg, ()->{
> ... query ...
> }) ;
>
>
> This includes ThreadTxn - starting a transaction on another thread and
> executing the body at some later date. Great for isolation testing.
>
> This "Transactional" is a different interface to Jena's (slight change -
> backwards compatible) but the code should work for Jena.