I can't try out your new stuff for a few days due to not being near a
suitable computer.
On 26/09/15 18:31, A. Soroka wrote:
> On a related note, are there any Jena standard parts for query
> testing for this kind of situation? I know that BSBM has several
> sophisticated suites of tests defined, but are any of them considered
> particularly appropriate, or has anyone out there in dev-land built
> their own harness for BSBM or something else that I could "borrow"?
> {grin}
Benchmarks like BSBM look at scale in a different way. BSBM is as
much about the memory-storage boundary as anything else.
For the general-purpose in-memory dataset, the need is for some
lower-level tests, mainly to ensure that nothing really bad, and easily
addressable, is happening.
SPARQL execution is only going to be lightly influenced by dataset
speed. Complex queries do a lot of intermediate processing (e.g.
sorting) and that has nothing to do with the base data. One exception
(isn't there always?) is property paths. The current implementation can
hit the store quite hard at a fine grain; the ideal is better algorithms
for property paths, but it also represents what code that directly uses
the API might do.
In TDB, it would be better to compute in NodeIds but the current
integration gets the Nodes IIRC. [Hmm - there is a fairly obvious way
to fix that ... different discussion.]
A few simple tests that come to mind are:
1. count all triples - test end to end scan of the dataset
2. write the whole dataset to /dev/null.
3. same as above but for a graph, default or named.
4. Some find() cases that are more important, like find(G,S,?,?),
find(G,?,P,O) [key lookup] or find(G,?,P,?);
find(G,?,?,?) is covered by (3)
5. and the non-G versions for a graph.
6. Union graph (if supported)
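As a rough illustration of cases 1, 3 and 4, here is a minimal sketch of such a harness in plain Java. It deliberately does not use Jena: ListStore, Quad, sample() and time() are hypothetical stand-ins invented for this example, and null plays the role of the wildcard. In a real test the supplier passed to time() would instead wrap a call such as DatasetGraph.find(g, s, p, o) with Node.ANY as the wildcard. The timings are single-shot wall-clock numbers with no warm-up, so they are only good for spotting something really bad.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class FindTimings {
    /** Wildcard marker for find(); stands in for Jena's Node.ANY. */
    static final String ANY = null;

    /** Hypothetical stand-in for a quad (graph, subject, predicate, object). */
    static final class Quad {
        final String g, s, p, o;
        Quad(String g, String s, String p, String o) {
            this.g = g; this.s = s; this.p = p; this.o = o;
        }
    }

    /** Naive list-backed store: every find() is a linear scan. */
    static final class ListStore {
        private final List<Quad> quads = new ArrayList<>();

        void add(String g, String s, String p, String o) {
            quads.add(new Quad(g, s, p, o));
        }

        /** Returns the quads matching the pattern; null slots match anything. */
        Iterator<Quad> find(String g, String s, String p, String o) {
            return quads.stream()
                    .filter(q -> matches(g, q.g) && matches(s, q.s)
                              && matches(p, q.p) && matches(o, q.o))
                    .iterator();
        }

        private static boolean matches(String pattern, String value) {
            return pattern == null || pattern.equals(value);
        }
    }

    /** Drains the iterator and returns how many items it produced. */
    static long count(Iterator<?> it) {
        long n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    /** Runs one named case, prints the elapsed time, returns the result count. */
    static long time(String label, Supplier<Iterator<?>> testCase) {
        long start = System.nanoTime();
        long n = count(testCase.get());
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(label + ": " + n + " results in " + micros + " us");
        return n;
    }

    /** Builds 1000 synthetic quads spread over two graphs, g0 and g1. */
    static ListStore sample() {
        ListStore store = new ListStore();
        for (int i = 0; i < 1000; i++) {
            store.add("g" + (i % 2), "s" + (i % 100), "p" + (i % 10), "o" + i);
        }
        return store;
    }

    public static void main(String[] args) {
        ListStore store = sample();
        time("scan whole dataset", () -> store.find(ANY, ANY, ANY, ANY));
        time("scan one graph    ", () -> store.find("g0", ANY, ANY, ANY));
        time("find(G,S,?,?)     ", () -> store.find("g0", "s4", ANY, ANY));
        time("find(G,?,P,O)     ", () -> store.find("g0", ANY, "p4", "o4"));
        time("find(G,?,P,?)     ", () -> store.find("g0", ANY, "p4", ANY));
    }
}
```

Case 2 (writing the dataset out) is not covered here; it would need a serializer and a sink that discards its input.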
Given those, I think the next level of verification is real use, rather
than specific (artificial) situations. Of course, there are also
mega-sized in-memory use cases (systems can deploy a lot of RAM these
days). Then GC and/or off-heap memory starts getting fun.
Andy