Re: Timing tests for jena-624: doing better

Andy Seaborne Sun, 27 Sep 2015 02:42:07 -0700

I can't try out your new stuff for a few days due to not being near a
suitable computer.


On 26/09/15 18:31, A. Soroka wrote:

> On a related note, are there any Jena standard parts for query
> testing for this kind of situation? I know that BSBM has several
> sophisticated suites of tests defined, but are any of them considered
> particularly appropriate, or has anyone out there in dev-land built
> their own harness for BSBM or something else that I could “borrow”?
> {grin}

Benchmarks like BSBM are looking at scale in a different way. BSBM is as much about the memory/storage boundary.

For the general-purpose in-memory dataset, the need is for some lower-level tests, mainly to ensure that nothing really bad, and easily addressable, is happening.

SPARQL execution is only going to be lightly influenced by dataset speed. Complex queries do a lot of intermediate processing (e.g. sorting), and that has nothing to do with the base data. One exception (isn't there always?) is property paths. The current implementation can hit the store quite hard at fine grain; the ideal is better algorithms for property paths, but it also represents what code that directly uses the API might do.
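To make the property-path point concrete, here is a minimal sketch of a naive breadth-first evaluation of a one-or-more path such as `:p+`. It assumes a hypothetical in-memory store: `Store`, `find` and `oneOrMore` are illustrative names, not Jena's API. Each node visited costs one fine-grained (S,P,?) lookup, so a long chain turns directly into that many calls against the store.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal triple store (NOT Jena's API), just enough to count
// how many fine-grained find(S, P, ?) calls a naive path evaluator makes.
public class PathSketch {
    record Triple(String s, String p, String o) {}

    static final class Store {
        private final Map<String, List<Triple>> bySubject = new HashMap<>();
        int findCalls = 0; // number of find(S, P, ?) calls made so far

        void add(String s, String p, String o) {
            bySubject.computeIfAbsent(s, k -> new ArrayList<>()).add(new Triple(s, p, o));
        }

        // find(S, P, ?): objects reachable in one step from s via predicate p.
        List<String> find(String s, String p) {
            findCalls++;
            List<String> out = new ArrayList<>();
            for (Triple t : bySubject.getOrDefault(s, List.of())) {
                if (t.p().equals(p)) {
                    out.add(t.o());
                }
            }
            return out;
        }
    }

    // Naive breadth-first evaluation of "start p+ ?x":
    // one store lookup per node taken off the frontier.
    static Set<String> oneOrMore(Store store, String start, String p) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        while (!frontier.isEmpty()) {
            String node = frontier.poll();
            for (String next : store.find(node, p)) {
                if (visited.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Store store = new Store();
        // A chain n0 -next-> n1 -next-> ... -next-> n999
        for (int i = 0; i < 999; i++) {
            store.add("n" + i, "next", "n" + (i + 1));
        }
        Set<String> reached = oneOrMore(store, "n0", "next");
        System.out.println(reached.size() + " nodes reached with " + store.findCalls + " find calls");
    }
}
```

On the 1,000-node chain in `main` this reports 999 nodes reached with 1000 find calls: the cost is dominated by per-lookup speed, which is exactly where the dataset implementation matters, unlike sort-heavy queries.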

In TDB, it would be better to compute in NodeIds, but the current integration gets the Nodes, IIRC. [Hmm - there is a fairly obvious way to fix that ... different discussion.]
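As a generic illustration of why computing on NodeIds can pay off (this is not TDB's code, and not the fix alluded to above), here is a sketch assuming a hypothetical dictionary, `NodeTable`, that maps nodes to `long` ids. The join `{ ?x knows ?y . ?y name ?n }` is evaluated entirely on ids, and only the projected results are decoded back into nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: join on integer ids, decode to nodes only at the end.
// NodeTable and the long[3] triple layout are hypothetical, not TDB's classes.
public class NodeIdSketch {
    static final class NodeTable {
        private final Map<String, Long> toId = new HashMap<>();
        private final List<String> toNode = new ArrayList<>();
        int decodes = 0; // how many id -> node lookups have been performed

        // Returns the id for a node, allocating a new one on first sight.
        long idFor(String node) {
            return toId.computeIfAbsent(node, n -> {
                toNode.add(n);
                return (long) (toNode.size() - 1);
            });
        }

        String node(long id) {
            decodes++;
            return toNode.get((int) id);
        }
    }

    // Solutions of { ?x knows ?y . ?y name ?n } as decoded "?x -> ?n" strings.
    static List<String> knowsName(List<long[]> triples, NodeTable nt) {
        long knows = nt.idFor("knows");
        long name = nt.idFor("name");
        // Index the "name" triples by subject id; no decoding needed.
        Map<Long, List<long[]>> nameBySubject = new HashMap<>();
        for (long[] t : triples) {
            if (t[1] == name) {
                nameBySubject.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
            }
        }
        List<String> results = new ArrayList<>();
        for (long[] t : triples) {
            if (t[1] != knows) {
                continue;
            }
            for (long[] n : nameBySubject.getOrDefault(t[2], List.of())) {
                // Only the projected variables are turned back into nodes.
                results.add(nt.node(t[0]) + " -> " + nt.node(n[2]));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        NodeTable nt = new NodeTable();
        String[][] data = {
            {"alice", "knows", "bob"}, {"bob", "name", "\"Bob\""}, {"alice", "name", "\"Alice\""}
        };
        List<long[]> triples = new ArrayList<>();
        for (String[] d : data) {
            triples.add(new long[] {nt.idFor(d[0]), nt.idFor(d[1]), nt.idFor(d[2])});
        }
        System.out.println(knowsName(triples, nt) + " with " + nt.decodes + " decodes");
    }
}
```

With the three sample triples in `main`, only two decodes happen, against nine if every triple were turned back into nodes before joining.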


A few simple tests that come to mind are:

1. count all triples - test end to end scan of the dataset
2. write the whole dataset to /dev/null.
3. same as above but for a graph, default or named.

4. Some find() cases that are more important, like find(G,S,?,?), find(G,?,P,O) [key lookup] or find(G,?,P,?)

  find(G,?,?,?) is covered by (3)

5. and the non-G versions for a graph.
6. Union graph (if supported)
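The simple tests listed above could be driven by a small timing harness. This is a sketch only, assuming a hypothetical `QuadStore` with two illustrative indexes (G,S and G,P,O) and `null` standing for the `?` wildcard; it is not Jena's `DatasetGraph`, and the dataset shape in `sample` is made up. It times tests 1 to 4 with `System.nanoTime()`, writing to a discarding stream in place of /dev/null.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Timing sketch against a hypothetical in-memory quad store (NOT Jena's API).
public class TimingSketch {
    record Quad(String g, String s, String p, String o) {}

    // null acts as the "?" wildcard in find patterns.
    static final class QuadStore {
        final List<Quad> all = new ArrayList<>();
        final Map<List<String>, List<Quad>> byGS = new HashMap<>();
        final Map<List<String>, List<Quad>> byGPO = new HashMap<>();

        void add(Quad q) {
            all.add(q);
            byGS.computeIfAbsent(List.of(q.g(), q.s()), k -> new ArrayList<>()).add(q);
            byGPO.computeIfAbsent(List.of(q.g(), q.p(), q.o()), k -> new ArrayList<>()).add(q);
        }

        List<Quad> find(String g, String s, String p, String o) {
            if (g != null && s != null && p == null && o == null) {
                return byGS.getOrDefault(List.of(g, s), List.of()); // find(G,S,?,?)
            }
            if (g != null && s == null && p != null && o != null) {
                return byGPO.getOrDefault(List.of(g, p, o), List.of()); // find(G,?,P,O)
            }
            // Fallback: full scan with a filter, e.g. find(G,?,P,?) or find(?,?,?,?).
            List<Quad> out = new ArrayList<>();
            for (Quad q : all) {
                if ((g == null || g.equals(q.g())) && (s == null || s.equals(q.s()))
                        && (p == null || p.equals(q.p())) && (o == null || o.equals(q.o()))) {
                    out.add(q);
                }
            }
            return out;
        }
    }

    // Made-up dataset: two quads per subject in each graph.
    static QuadStore sample(int graphs, int subjects) {
        QuadStore store = new QuadStore();
        for (int g = 0; g < graphs; g++) {
            for (int s = 0; s < subjects; s++) {
                store.add(new Quad("g" + g, "s" + s, "name", "o" + s));
                store.add(new Quad("g" + g, "s" + s, "type", "Thing"));
            }
        }
        return store;
    }

    // Serialize N-Quads-like lines to a sink that discards everything.
    static long write(List<Quad> quads) {
        long n = 0;
        try (PrintStream sink = new PrintStream(OutputStream.nullOutputStream())) {
            for (Quad q : quads) {
                sink.print("<" + q.s() + "> <" + q.p() + "> <" + q.o() + "> <" + q.g() + "> .\n");
                n++;
            }
        }
        return n;
    }

    static <T> T time(String label, Supplier<T> task) {
        long start = System.nanoTime();
        T result = task.get();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%-22s %6d ms  -> %s%n", label, ms, result);
        return result;
    }

    public static void main(String[] args) {
        QuadStore store = sample(2, 50_000);
        time("1. count all quads", () -> store.find(null, null, null, null).size());
        time("2. write dataset", () -> write(store.find(null, null, null, null)));
        time("3. write graph g0", () -> write(store.find("g0", null, null, null)));
        time("4a. find(G,S,?,?)", () -> store.find("g0", "s42", null, null).size());
        time("4b. find(G,?,P,O)", () -> store.find("g0", null, "type", "Thing").size());
        time("4c. find(G,?,P,?)", () -> store.find("g0", null, "name", null).size());
    }
}
```

A single timed pass like this includes JIT warm-up and GC noise; for numbers worth comparing, a harness such as JMH (the OpenJDK Java Microbenchmark Harness) with warm-up iterations is the safer choice. The non-G versions (5) and the union graph (6) would follow the same pattern.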

Given those, I think the next level of verification is real use, rather than specific (artificial) situations. Of course, there are also mega-sized in-memory use cases (systems can deploy a lot of RAM these days). Then GC and/or off-heap memory starts getting fun.


        Andy
