On 06/03/12 09:14, Glenn Proctor wrote:
Hi
I have a TDB instance (0.8.10) containing about 207m triples. I've run
tdbstats and moved stats.opt into the appropriate place.
I've noticed that running the same query multiple times in succession
results in successively shorter query times, up to a point. For
example, on an otherwise-idle TDB instance, the query
    SELECT ?facet ?val (COUNT(?val) as ?vc)
    WHERE { ?id a ?val . ?id ?facet ?val . }
    GROUP BY ?facet ?val ORDER BY DESC(?vc) LIMIT 25
takes 3707s, then 1424s, then 345s, where it seems to stay for subsequent runs.
What's the reason for this initial improvement and subsequent tailing
off - are the indexes being optimised with every query?
Glenn.
Glenn,
Nothing so clever, I'm afraid. I think what you're seeing is the OS's
management of memory-mapped files.
On the first run, if the system is cold or if earlier queries have touched
different parts of the indexes, the query causes the memory-mapped pages to
be mapped in, which also caches the index data in memory. The later runs
benefit from that OS caching. If the intermediate results for the sort are
large, then the sort spills to disk, which is also subject to OS cache
effects.
Andy
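
To check the OS-cache explanation outside TDB, here is a rough sketch in Python that memory-maps a file and times repeated passes over its pages. It is not TDB code; the helper names (make_sample_file, touch_pages, time_runs) are hypothetical and exist only for this illustration. Note the assumption that matters: a file that has just been written is normally already in the OS page cache, so the generated sample file will probably not show a slow first pass. To see a cold-versus-warm difference, point touch_pages at a large existing file after a reboot or after the OS cache has been dropped.

```python
import mmap
import os
import tempfile
import time


def make_sample_file(size_bytes=8 * 1024 * 1024):
    """Create a temporary file of random bytes and return its path (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size_bytes))
    return path


def touch_pages(path):
    """Memory-map the file read-only and read one byte from every page.

    Returns (elapsed_seconds, checksum). Reading a byte forces the OS to make
    that page resident, either from disk (cold) or from its cache (warm).
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            start = time.perf_counter()
            total = 0
            for offset in range(0, len(m), mmap.PAGESIZE):
                total += m[offset]
            return time.perf_counter() - start, total


def time_runs(path, runs=3):
    """Repeat the page-touching pass and collect (seconds, checksum) per run."""
    return [touch_pages(path) for _ in range(runs)]


if __name__ == "__main__":
    path = make_sample_file()
    try:
        for i, (secs, _) in enumerate(time_runs(path), 1):
            print(f"run {i}: {secs:.4f}s")
    finally:
        os.remove(path)
```

If the timings level off after the first pass or two on a genuinely cold file, that matches the pattern in the query times above: the data is then being served from memory rather than from disk.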