Re: JENA-624: "Develop a new in-memory RDF Dataset implementation"

A. Soroka Thu, 24 Sep 2015 10:47:39 -0700

Thanks for the feedback, Andy! In order of your points:

1) Okay, I will get on those transaction-related problems pronto. None of them 
seem like a big deal to fix, and you’ve already given me the main point (finer 
lifecycle granularity) and pointers to the Mantis material.


2) Yeah, I can see that. I’ll switch it to QuadTable for now, pending further 
suggestions. You know, I think I actually called that “QuadIndex” in an earlier 
version!

3) It didn’t even occur to me to manage prefixes (prefixen?). I didn’t realize 
that was a Dataset responsibility, although that certainly makes sense. Well, 
that’s why we don’t make assumptions! {grin} I’ll fix that based on 
DatasetPrefixStorage.

4) That sounds great. First step: I’ll get these problems squared away and 
hopefully other people will glance over and find other things to fix.

Thanks again!

---
A. Soroka
The University of Virginia Library

> On Sep 24, 2015, at 1:05 PM, Andy Seaborne <[email protected]> wrote:
> 
> On 23/09/15 17:44, A. Soroka wrote:
>> Following up on this conversation, I have now have a branch available
>> at:
>> 
>> https://github.com/ajs6f/jena/tree/jena-624
>> 
>> with a six-way-map-based version of this, advancing from (but not
>> directly using) the journaling branch already discussed. (Of course I
>> can separate these if so desired.)
>> 
>> There is an Index type which hopefully is a start towards abstracting
>> out index behavior, as Paul Houle suggested. There have already been
>> interesting suggestions made by Andy and Claude about possible
>> implementations that are more sophisticated than my simple approach,
>> so I just hope that this branch will get the ball rolling.
>> 
>> Comments/advice/criticism eagerly desired!
>> 
>> --- A. Soroka The University of Virginia Library
> 
> Hi - I pulled the code down and had a partial look.
> 
> And it looks very good.
> 
> (As you probably know by now, having me (test/demo/use/) anywhere near new 
> code is very risky but I broke very little ... including TDB.)
> 
> 0/ Basic read/write of files into the dataset worked as expected. :-)
> 
> 1/
> I tried with existing test harnesses:
> (see below for code)
> 
> AbstractDatasetGraphTests
>  Green line!
> 
> AbstractTestTransaction
>  Red line.
> 
> This has tests for various error conditions.
> 
> The begin-begin cases: your code throws JenaException.  There is a 
> sub-exception JenaTransactionException and that what the tests look for.
> 
> The rest are testing that e.g. begin-abort-commit is an error.  It looks like 
> the transaction lifecycle is not being tracked.  It needs finer granularity 
> than DatasetGraphInMemory.isInTransaction.  May be as simple as NOT_TXN -> 
> ACTIVE_TXN -> FINISHING_TXN (-> NOT_TXN). Just in/out isn't enough for, e.g. 
> begin-commit-add_quad->end , begin-commit-commit.  I managed to add after 
> commit.
> 
> There are some presumable related things here: A writer that .end() without 
> .commit or .abort doesn't indicate an error. It probably should. And be added 
> to the AbstractTestTransaction tests.
> 
> And plain commit (no begin) isn't caught as wrong.
> 
>  Dataset ds = DatasetFactory.create(new DatasetGraphInMemory()) ;
>  ds.commit() ;
> 
> Ditto .abort.
> 
> Stray .end()s are probably reasonable - "when in doubt call end()" - so 
> multiple ends() on a transaction, which in effect is end on a 
> non-transaction, is good to allow.
> 
> (I'll add some tests to AbstractTestTransaction for these cases)
> 
> 2/
> I found the use of the name "Index" a odd.  Usually an index (in database 
> speak) is a specific lookup pattern.  S->P->O->G and needing a prefix of that 
> for partials.
> 
> Hence, in SQL and NoSQL, having multiple indexes per table, one primary, and 
> 0-N secondary.  The use in your code is more like the whole "table" (in 
> individual components are all covering solike all RDF subsystems no need to 
> have a distinguished "table"
> 
> org.apache.jena.tdb.index.Index
> org.apache.jena.tdb.index.RangeIndex
> 
> Is this really something like "QuadTable"? "QuadStorage"?
> 
> I am encountering a similar split between the storage and provision of the 
> interface in TDB2.  There, I want to be able to swap the storage on the fly 
> to give parallel storage a compaction option to a running database.  Being 
> on-disk, there isn't a GC to manage the multi-version datastructures.
> 
> 3/
> No prefixes on the dataset?  I only got them to work with getDefaultModel etc.
> 
> TDB uses DatasetPrefixStorage for managing prefixes and then GraphTDB
> Some of DatasetPrefixesTDB could be extracted for common use.
> 
> leading to...
> 4/
> We should look for commonality between TDB and InMem and pull out a separate 
> framework.  That's long term - getting something working and released should 
> not have "architectural internal reorg" on the critical path.
> 
>    Andy
> 
> 
> public class TestDatasetGraphInMem_AFS
>             extends AbstractDatasetGraphTests {
>    @Override
>    protected DatasetGraph emptyDataset() {
>        return new DatasetGraphInMemory() ;
>    }
> }
> 
> public class TestDatasetGraphTxn_AFS
>             extends AbstractTestTransaction {
>    @Override
>    protected Dataset create() {
>        return DatasetFactory.create(new DatasetGraphInMemory()) ;
>    }
> }
> 
> ----------------------
> 
> And if all that begin-commit stuff is boring code ... WIP ...
> 
> https://github.com/afs/mantis/blob/master/dboe-transaction/src/main/java/org/seaborne/dboe/transaction/Txn.java
> 
>    Txn.executeRead(dsg, ()->{
>    ... query ...
>        }) ;
> 
> 
> This includes ThreadTxn - starting a transaction on another thread and 
> executing the body at some later date.  Great for isolation testing.
> 
> This "Transactional" is a different interface to Jena's (slight change - 
> backwards compatible) but the code should work for Jena.

Re: JENA-624: "Develop a new in-memory RDF Dataset implementation"

Reply via email to