Sorry for spamming the list a bit today, but before COB I wanted to offer some more figures on this effort. Using a port of Scala’s immutable collections [*] in a new branch [**], the new implementation now reaches a little better than half the load performance of the “stock” impl overall: about 46% of it in autocommit mode and about 67% in transactional mode (see the PerfTest output below my signature). Of course these figures are very rough, but I hope they demonstrate motion in the right direction. I still intend to try out Clojure’s collections, but I think I’m a lot closer to a realistic level of performance. I hope to report something about query performance soon.
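As a rough check on “a little better than half”, the ratios can be recomputed from the PerfTest output below my signature and from the quoted Sep 26 run. PerfTest appears to report tps as (triples × N) / elapsed seconds, which matches the printed figures. LoadRatios and its methods are hypothetical helpers for this note, not part of Jena.

```java
/**
 * Hypothetical helper (not part of Jena): recomputes load-throughput ratios
 * between the stock in-memory dataset ("mix") and the new persistent-collection
 * implementation ("mem") from PerfTest's reported elapsed times.
 */
final class LoadRatios {

    /** Throughput as PerfTest appears to report it: triples loaded N times over elapsed seconds. */
    static double tps(long triples, int runs, double seconds) {
        return triples * (double) runs / seconds;
    }

    /** Fraction of the stock throughput reached by the new implementation. */
    static double fractionOfStock(double stockTps, double newTps) {
        return newTps / stockTps;
    }

    /** Prints the autocommit and transactional ratios for one run. */
    static void report(String label, double[] secs, long triples, int runs) {
        // secs holds elapsed seconds for {stock auto, stock txn, new auto, new txn}.
        double auto = fractionOfStock(tps(triples, runs, secs[0]), tps(triples, runs, secs[2]));
        double txn = fractionOfStock(tps(triples, runs, secs[1]), tps(triples, runs, secs[3]));
        System.out.printf("%s: auto %.0f%%, txn %.0f%% of stock%n", label, 100 * auto, 100 * txn);
    }

    public static void main(String[] args) {
        final long triples = 1_000_312; // bsbm-1m.nt.gz
        final int runs = 20;            // N=20 timed runs
        double[] sep26 = {108.331, 105.424, 283.680, 224.501};
        double[] dexx = {97.761, 101.668, 211.971, 151.359};
        report("Sep 26 run", sep26, triples, runs);
        report("Dexx branch", dexx, triples, runs);
    }
}
```

Run as-is, it prints "Sep 26 run: auto 38%, txn 47% of stock" and "Dexx branch: auto 46%, txn 67% of stock". Put the other way round, the stock impl goes from roughly 2.1 to 2.6 times faster down to roughly 1.5 to 2.2 times faster.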
[*] https://github.com/andrewoma/dexx
[**] https://github.com/ajs6f/jena/tree/jena-624-dexx

Anyone who is interested in examining these branches should be aware that they are currently moving targets, with commits several times a day.

---
A. Soroka
The University of Virginia Library

Running org.apache.jena.sparql.core.mem.PerfTest
==== Data: /Users/ajs6f/Documents/jena/bsbm-1m.nt.gz
==== Size: 1,000,312 (2.978s, 335,900 tps)
==== DSG/mix/auto (warm N=3)
==== DSG/mix/txn (warm N=3)
==== DSG/mem/auto (warm N=3)
==== DSG/mem/txn (warm N=3)
==== DSG/mix/auto (N=20)
==== DSG/mix/auto (N=20) Time: 97.761s (204,644 tps)
==== DSG/mix/txn (N=20)
==== DSG/mix/txn (N=20) Time: 101.668s (196,780 tps)
==== DSG/mem/auto (N=20)
==== DSG/mem/auto (N=20) Time: 211.971s (94,381 tps)
==== DSG/mem/txn (N=20)
==== DSG/mem/txn (N=20) Time: 151.359s (132,177 tps)

> On Sep 26, 2015, at 1:31 PM, A. Soroka <[email protected]> wrote:
>
> I’ve committed the change to using separate triple and quad indexes (via
> DatasetGraphTriplesQuads). There appears to be definite and significant
> improvement, from Andy’s numbers showing the current implementation getting 5
> times the load performance of the new implementation to my numbers (below)
> which show the new impl improved so that the current impl is at maybe 2.5
> times its performance. Thanks for that advice, Andy!
>
> I’ll probably take a look next at moving to a more powerful library for
> persistent structures that might either perform better raw or offer finer
> control over tree creation as discussed above in this thread.
>
> On a related note, are there any Jena standard parts for query testing for
> this kind of situation? I know that BSBM has several sophisticated suites of
> tests defined, but are any of them considered particularly appropriate, or
> has anyone out there in dev-land built their own harness for BSBM or
> something else that I could “borrow”? {grin}
>
> --
> A. Soroka
> The University of Virginia Library
>
> ==== Data: /Users/ajs6f/Documents/jena/bsbm-1m.nt.gz
> ==== Size: 1,000,312 (2.947s, 339,434 tps)
> ==== DSG/mix/auto (warm N=3)
> ==== DSG/mix/txn (warm N=3)
> ==== DSG/mem/auto (warm N=3)
> ==== DSG/mem/txn (warm N=3)
> ==== DSG/mix/auto (N=20)
> ==== DSG/mix/auto (N=20) Time: 108.331s (184,676 tps)
> ==== DSG/mix/txn (N=20)
> ==== DSG/mix/txn (N=20) Time: 105.424s (189,769 tps)
> ==== DSG/mem/auto (N=20)
> ==== DSG/mem/auto (N=20) Time: 283.680s (70,523 tps)
> ==== DSG/mem/txn (N=20)
> ==== DSG/mem/txn (N=20) Time: 224.501s (89,114 tps)
>
>> On Sep 26, 2015, at 9:21 AM, Andy Seaborne <[email protected]> wrote:
>>
>> On 26/09/15 12:07, A. Soroka wrote:
>>> Ooh! Those numbers are awful.
>>
>> Early days. The general purpose dataset has no features. And, of course, a
>> concurrent read is completely blocked - that's a major issue for some usages.
>>
>> Access performance, having update not block query, in a very reliable
>> implementation is a valuable thing to have. And if it is described as a
>> "complete temporal database", it is all a good thing. Marketing.
>>
>> The storage implementation is now a self-contained thing to look at. ...
>> seems there is no shortage of options ... google quickly got me:
>>
>> http://stackoverflow.com/questions/8575723/whats-a-good-persistent-collections-framework-for-use-in-java
>>
>> and there are more. Various data structures I have not heard of before.
>>
>>> Per your point 2, it does create a new
>>> tree per add/remove. And PCollections’ bulk operations are just loops
>>> over the single-element operations, so trying to accumulate data and
>>> use a single operation will create the same number of trees.
>>> Unfortunately, PCollections does not have something like Clojure’s
>>> transient operations [*], where under carefully-controlled conditions
>>> a normally persistent structure can be mutated in place for celerity
>>> of operation.
>>> I have no commitment to PCollections, and I can switch
>>> and see what happens with Clojure and transiency. But I should first
>>> go back over the code with a fine-toothed comb and make sure that
>>> there isn’t a plain old mistake of some kind.
>>>
>>> As far as the indexes, I’m not quite sure what you mean by
>>> “triples+quads”. Do you mean a single map from graph name to three
>>> triple-covering indexes? Something like Map<Node, TripleIndex>, with
>>> TripleIndex having within it three covering indexes for triples in
>>> the way that current HexIndex has within it six covering indexes for
>>> quads?
>>
>> That's one way - I meant using the supporting framework in
>> DatasetGraphTriplesQuads so
>>
>> DatasetGraphQuads => DatasetGraphTriplesQuads
>>
>> The default graph is handled separately from named graphs.
>>
>> TDB uses this - there is a triple table (dft: 3 index) and a quads table
>> (dft: 6 index)
>>
>> Andy
>>
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> [*] http://clojure.org/transients
>>>
>>>> On Sep 26, 2015, at 6:42 AM, Andy Seaborne <[email protected]>
>>>> wrote:
>>>>
>>>> Some thoughts:
>>>>
>>>> 1/ If it were a triples+quads design (TripleTable, QuadTable), not
>>>> just quads, there would be 3 indexes not 6 for triples so 2x
>>>> faster.
>>>>
>>>> 2/ As autocommit and txn forms are nearly the same, I guess that
>>>> every add(Quad) is causing a new pcollections tree for each index.
>>>>
>>>> I don't know pcollections but is it possible to use it so an
>>>> independent tree is created only at begin(W)? i.e. copy-to-root
>>>> does not happen on stuff updated already touched after begin(W).
>>>>
>>>> Andy
>>>
>>
>
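Andy’s point 1 above comes down to how many index writes each insert costs. The sketch below is a hypothetical illustration, not Jena code: TriplesQuadsStore, its String terms, and the NUL-separated sort keys are assumptions, and the index orders (SPO, POS, OSP for triples; GSPO, GPOS, GOSP, SPOG, POSG, OSPG for quads) follow what I understand TDB’s defaults to be. It only counts writes, which is where the “3 indexes not 6, so 2x faster” estimate for default-graph triples comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/**
 * Hypothetical sketch (not Jena code): default-graph triples go to a 3-index
 * triple table, named-graph quads to a 6-index quad table, in the spirit of
 * DatasetGraphTriplesQuads and TDB's layout. Terms are plain strings here.
 */
final class TriplesQuadsStore {

    // Positions: S=0, P=1, O=2 for triples; G=0, S=1, P=2, O=3 for quads.
    private static final int[][] TRIPLE_ORDERS = {
        {0, 1, 2}, {1, 2, 0}, {2, 0, 1}           // SPO, POS, OSP
    };
    private static final int[][] QUAD_ORDERS = {
        {0, 1, 2, 3}, {0, 2, 3, 1}, {0, 3, 1, 2}, // GSPO, GPOS, GOSP
        {1, 2, 3, 0}, {2, 3, 1, 0}, {3, 1, 2, 0}  // SPOG, POSG, OSPG
    };

    private final List<NavigableSet<String>> tripleIndexes = newIndexes(TRIPLE_ORDERS.length);
    private final List<NavigableSet<String>> quadIndexes = newIndexes(QUAD_ORDERS.length);
    private long indexWrites;

    private static List<NavigableSet<String>> newIndexes(int n) {
        List<NavigableSet<String>> indexes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indexes.add(new TreeSet<>());
        }
        return indexes;
    }

    /** Sort key for one index: terms in that index's order, NUL-separated so adjacent terms stay distinct. */
    private static String key(String[] terms, int[] order) {
        StringBuilder sb = new StringBuilder();
        for (int position : order) {
            sb.append(terms[position]).append('\0');
        }
        return sb.toString();
    }

    /** Writes one entry into every index of a table, counting each write. */
    private void addToAll(List<NavigableSet<String>> indexes, int[][] orders, String[] terms) {
        for (int i = 0; i < orders.length; i++) {
            indexes.get(i).add(key(terms, orders[i]));
            indexWrites++;
        }
    }

    /** Default-graph triple: 3 index writes. */
    void addTriple(String s, String p, String o) {
        addToAll(tripleIndexes, TRIPLE_ORDERS, new String[] {s, p, o});
    }

    /** Named-graph quad: 6 index writes. */
    void addQuad(String g, String s, String p, String o) {
        addToAll(quadIndexes, QUAD_ORDERS, new String[] {g, s, p, o});
    }

    long indexWrites() {
        return indexWrites;
    }

    public static void main(String[] args) {
        TriplesQuadsStore split = new TriplesQuadsStore();
        split.addTriple(":s", ":p", ":o");              // triples+quads layout
        TriplesQuadsStore quadsOnly = new TriplesQuadsStore();
        quadsOnly.addQuad("DEFAULT", ":s", ":p", ":o"); // quads-only layout; "DEFAULT" is a placeholder graph name
        System.out.println(split.indexWrites() + " vs " + quadsOnly.indexWrites()); // prints "3 vs 6"
    }
}
```

Named-graph quads cost the same 6 writes in both layouts; only default-graph triples get cheaper, which is why the gain depends on how much of the data lands in the default graph.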

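Andy’s point 2 and the Clojure transients link point at the same technique: copy a node only the first time a write transaction touches it, instead of copying the path to the root on every add. Here is a minimal single-threaded sketch under stated assumptions: TxnTree is hypothetical (not PCollections, Dexx, or Jena API), keys are ints, the tree is unbalanced for brevity, and each node records the write transaction that created it, similar in spirit to the edit marker Clojure’s transient structures keep on their nodes.

```java
/**
 * Hypothetical sketch (not PCollections, Dexx, or Jena API): an unbalanced
 * persistent binary search tree with two update styles.
 *  - add() outside a write transaction: path copying, so every add allocates
 *    a fresh path from the new leaf up to a new root.
 *  - add() between beginWrite() and commit(): nodes created by the current
 *    transaction are updated in place; nodes from a published version are
 *    copied the first time they are touched.
 * Single-threaded as written; concurrent readers would need safe publication.
 */
final class TxnTree {

    /** Identity token for one write transaction; compared by reference. */
    private static final class Txn { }

    private static final class Node {
        final int key;
        Node left, right;
        final Txn owner; // write transaction that created this node, or null

        Node(int key, Node left, Node right, Txn owner) {
            this.key = key; this.left = left; this.right = right; this.owner = owner;
        }
    }

    private Node root;    // last published version, what readers see
    private Node working; // version being built by the open write transaction
    private Txn writer;   // non-null while a write transaction is open
    long allocations;     // node allocations, to compare the two styles

    void beginWrite() { writer = new Txn(); working = root; }

    void commit() { root = working; writer = null; working = null; }

    void add(int key) {
        if (writer == null) {
            root = insert(root, key, null); // autocommit: copy the path, publish now
        } else {
            working = insert(working, key, writer);
        }
    }

    private Node insert(Node n, int key, Txn txn) {
        if (n == null) {
            allocations++;
            return new Node(key, null, null, txn);
        }
        if (key == n.key) {
            return n; // already present, nothing to change
        }
        // Mutate in place only if this write transaction created the node.
        Node target = (txn != null && n.owner == txn) ? n : copy(n, txn);
        if (key < n.key) {
            target.left = insert(n.left, key, txn);
        } else {
            target.right = insert(n.right, key, txn);
        }
        return target;
    }

    private Node copy(Node n, Txn txn) {
        allocations++;
        return new Node(n.key, n.left, n.right, txn);
    }

    /** Reads only the published root, so uncommitted writes are invisible. */
    boolean contains(int key) {
        Node n = root;
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }

    int size() { return count(root); }

    private static int count(Node n) { return n == null ? 0 : 1 + count(n.left) + count(n.right); }

    public static void main(String[] args) {
        TxnTree auto = new TxnTree();
        for (int k = 1; k <= 100; k++) auto.add(k); // path copy on every add
        TxnTree txn = new TxnTree();
        txn.beginWrite();
        for (int k = 1; k <= 100; k++) txn.add(k);  // one allocation per add
        txn.commit();
        System.out.println(auto.allocations + " vs " + txn.allocations); // prints "5050 vs 100"
    }
}
```

Inserting 1..100 in ascending order costs 5050 node allocations with per-add path copying versus 100 inside one write transaction, and readers of the committed root never see a node change underneath them. A real version would also need balancing and safe publication of root (for example via a volatile field) before concurrent readers could use it.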