On 23/09/15 17:44, A. Soroka wrote:
Following up on this conversation, I now have a branch available
at:
https://github.com/ajs6f/jena/tree/jena-624
with a six-way-map-based version of this, advancing from (but not
directly using) the journaling branch already discussed. (Of course I
can separate these if so desired.)
There is an Index type which hopefully is a start towards abstracting
out index behavior, as Paul Houle suggested. There have already been
interesting suggestions made by Andy and Claude about possible
implementations that are more sophisticated than my simple approach,
so I just hope that this branch will get the ball rolling.
Comments/advice/criticism eagerly desired!
--- A. Soroka The University of Virginia Library
Hi - I pulled the code down and had a partial look.
And it looks very good.
(As you probably know by now, having me (test/demo/use/) anywhere near
new code is very risky but I broke very little ... including TDB.)
0/ Basic read/write of files into the dataset worked as expected. :-)
1/
I tried with existing test harnesses:
(see below for code)
AbstractDatasetGraphTests
Green line!
AbstractTestTransaction
Red line.
This has tests for various error conditions.
The begin-begin cases: your code throws JenaException. There is a
sub-exception JenaTransactionException and that is what the tests look for.
The rest are testing that e.g. begin-abort-commit is an error. It looks
like the transaction lifecycle is not being tracked. It needs finer
granularity than DatasetGraphInMemory.isInTransaction. It may be as simple
as NOT_TXN -> ACTIVE_TXN -> FINISHING_TXN (-> NOT_TXN). Just in/out
isn't enough for, e.g. begin-commit-add_quad->end , begin-commit-commit.
I managed to add a quad after commit.
There are some presumably related things here: a writer that calls .end()
without .commit() or .abort() doesn't indicate an error. It probably should,
and that case should be added to the AbstractTestTransaction tests.
And plain commit (no begin) isn't caught as wrong.
Dataset ds = DatasetFactory.create(new DatasetGraphInMemory()) ;
ds.commit() ;
Ditto .abort.
Stray .end()s are probably reasonable - "when in doubt call end()" - so
multiple ends() on a transaction, which in effect is end on a
non-transaction, is good to allow.
(I'll add some tests to AbstractTestTransaction for these cases)
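To make the lifecycle suggestion in 1/ concrete, here is a minimal sketch of the three-state tracking (NOT_TXN -> ACTIVE_TXN -> FINISHING_TXN -> NOT_TXN). It is not code from the branch or from Jena: the class TxnLifecycle, the method checkCanUpdate and the nested TxnException (a stand-in for JenaTransactionException) are all hypothetical names, and it assumes the rules discussed above: begin-begin, a second commit/abort, and commit/abort without begin are errors; an update after commit is an error; a writer calling end() without commit or abort is an error; stray end() calls are allowed.

```java
// Hypothetical sketch only: not Jena code, all names are illustrative.
final class TxnLifecycle {
    enum State { NOT_TXN, ACTIVE_TXN, FINISHING_TXN }

    // Stand-in for JenaTransactionException so the sketch has no dependencies.
    static final class TxnException extends RuntimeException {
        TxnException(String msg) { super(msg); }
    }

    private State state = State.NOT_TXN;
    private boolean writer = false;

    State state() { return state; }

    // begin is only legal outside a transaction, so begin-begin fails.
    void begin(boolean write) {
        if (state != State.NOT_TXN)
            throw new TxnException("begin: already in a transaction (" + state + ")");
        writer = write;
        state = State.ACTIVE_TXN;
    }

    void commit() { finish("commit"); }

    void abort() { finish("abort"); }

    // commit/abort need an active transaction; this rejects a plain commit
    // (no begin) as well as begin-commit-commit and begin-abort-commit.
    private void finish(String op) {
        if (state != State.ACTIVE_TXN)
            throw new TxnException(op + ": no active transaction (" + state + ")");
        state = State.FINISHING_TXN;
    }

    // Guard to call before any update; rejects begin-commit-add_quad.
    void checkCanUpdate() {
        if (state != State.ACTIVE_TXN || !writer)
            throw new TxnException("update outside an active write transaction (" + state + ")");
    }

    // end always returns to NOT_TXN, so stray ends are harmless no-ops.
    // A writer that never committed or aborted is reported after the reset.
    void end() {
        State previous = state;
        boolean wasWriter = writer;
        state = State.NOT_TXN;
        writer = false;
        if (previous == State.ACTIVE_TXN && wasWriter)
            throw new TxnException("end: write transaction ended without commit or abort");
    }
}
```

A dataset implementation would consult something like this from its begin/commit/abort/end methods and call the update guard from add/delete. Whether end() on an uncommitted writer should throw or silently abort is a design choice; the sketch takes the "probably should be an error" reading.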
2/
I found the use of the name "Index" a bit odd. Usually an index (in
database speak) is a specific lookup pattern. S->P->O->G and needing a
prefix of that for partials.
Hence, in SQL and NoSQL, having multiple indexes per table, one primary,
and 0-N secondary. The use in your code is more like the whole "table"
(the individual components are all covering so, like all RDF subsystems,
there is no need to have a distinguished "table").
org.apache.jena.tdb.index.Index
org.apache.jena.tdb.index.RangeIndex
Is this really something like "QuadTable"? "QuadStorage"?
I am encountering a similar split between the storage and provision of
the interface in TDB2. There, I want to be able to swap the storage on
the fly to give parallel storage a compaction option to a running
database. Being on-disk, there isn't a GC to manage the multi-version
datastructures.
3/
No prefixes on the dataset? I only got them to work with
getDefaultModel etc.
TDB uses DatasetPrefixStorage for managing prefixes and then GraphTDB
Some of DatasetPrefixesTDB could be extracted for common use.
leading to...
4/
We should look for commonality between TDB and InMem and pull out a
separate framework. That's long term - getting something working and
released should not have "architectural internal reorg" on the critical
path.
Andy
public class TestDatasetGraphInMem_AFS
    extends AbstractDatasetGraphTests {
    @Override
    protected DatasetGraph emptyDataset() {
        return new DatasetGraphInMemory() ;
    }
}

public class TestDatasetGraphTxn_AFS
    extends AbstractTestTransaction {
    @Override
    protected Dataset create() {
        return DatasetFactory.create(new DatasetGraphInMemory()) ;
    }
}
----------------------
And if all that begin-commit stuff is boring code ... WIP ...
https://github.com/afs/mantis/blob/master/dboe-transaction/src/main/java/org/seaborne/dboe/transaction/Txn.java
Txn.executeRead(dsg, ()->{
    ... query ...
}) ;
This includes ThreadTxn - starting a transaction on another thread and
executing the body at some later date. Great for isolation testing.
This "Transactional" is a different interface to Jena's (slight change -
backwards compatible) but the code should work for Jena.
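
For readers who have not opened the link, the idea behind a helper like Txn.executeRead can be sketched as follows. This is an illustration of the pattern only, not the code in the mantis repository: MiniTransactional is a hypothetical cut-down interface (it is neither Jena's Transactional nor the dboe one), and MiniTxn with its executeRead/executeWrite methods is an assumed shape for the helper.

```java
// Hypothetical, cut-down transactional interface for illustration only.
interface MiniTransactional {
    void begin(boolean write);
    void commit();
    void abort();
    void end();
}

// Sketch of a helper that hides the begin/commit/abort/end boilerplate.
final class MiniTxn {
    private MiniTxn() {}

    // Run the action inside a read transaction; end() is always called.
    static void executeRead(MiniTransactional txn, Runnable action) {
        txn.begin(false);
        try {
            action.run();
        } finally {
            txn.end();
        }
    }

    // Run the action inside a write transaction: commit if the action
    // completes normally, abort if it throws, and call end() either way.
    static void executeWrite(MiniTransactional txn, Runnable action) {
        txn.begin(true);
        boolean ok = false;
        try {
            action.run();
            ok = true;
        } finally {
            try {
                if (ok) txn.commit();
                else txn.abort();
            } finally {
                txn.end();
            }
        }
    }
}
```

With a helper of this shape the caller never writes begin or end by hand, so the lifecycle errors discussed in 1/ (missing commit, double commit, forgotten end) are much harder to make in application code.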