Good morning Christian,

Thank you for the quick reply! I am indeed surprised that BaseX does not do any particular caching. Now that I think of it, it does seem to make sense: if results are loaded into memory, they will be accessible much faster for subsequent queries, and they will stay in memory until overwritten or wiped. At least, that is how I see it; I am no computer expert!
I have now gathered all the XPath structures that I would like to benchmark (~100; I'm not sure whether this is enough). Since I am no hero in XQuery, I will ask my supervisor if he can write a script for this purpose (he loves Perl, so I assume he'll come up with something).

I did read on your website that it is possible to communicate with BaseX from Java. Is there any documentation or are there any guidelines on this? I am comfortable with Java, so I assume I should be able to conjure up a benchmark script in Java. The only thing I don't know is how to connect to the database and submit a query. Could you point me to a tutorial-like source, if one is available? If not, I will ask my supervisor for help.

Finally, I'd like to thank you for the benchmarking tips; they are very useful!

Kind regards

Bram
https://be.linkedin.com/in/bramvanroy

________________________________________
From: Christian Grün [christian.gr...@gmail.com]
Sent: Monday 15 February 2016 13:26
To: Bram Vanroy
Cc: BaseX
Subject: Re: [basex-talk] Benchmarking and caching in BaseX

Hi Bram,

Thanks for the summary on your work on Treebank and BaseX!

> The problem that I have encountered is that BaseX seems to
> cache very efficiently. Obviously this is not a problem on production
> websites but for benchmarking it may not be ideal. My first question to you,
> then, is: is it possible to disable caching when testing queries locally?
> And how exactly does BaseX handle the caching? Or more specifically, if I
> enter a query: what is cached, and for how long? This information may be
> useful to analyse our logs with.

You may be surprised to hear that BaseX does not have any particular caching strategies for queries and query results. Various optimizations exist for caching IO data on a lower level, though. As these strategies reach down to the OS and hardware disk access level, it’s hardly possible to disable all of them.
Usually, it’s simply your main memory that distorts your performance measurements, because the relevant data will only be pulled from disk once as long as enough main memory is available. Besides that, Java programs generally get faster the longer they run (due to just-in-time compilation, JIT), and so on.

In practice, if you do benchmarking, it’s usually good to “warm up” your BaseX instance by running various initial queries, to use the client/server architecture, and to look at, for example, the execution time output by the -v or -V command-line flag. In order to simulate real-life query patterns, you should run your test queries in random order, and run a great number of different queries.

Moreover, it’s advisable to run your queries multiple times and then take the mean or minimum value as the result. If this value differs by more than 5% when repeating the test, you should consider increasing the number of runs.

I hope this helps a bit; I invite you to report back on your experiences,
Christian
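
For readers who want to try the advice above from Java, the following is a minimal sketch of such a harness, not code from this thread. It warms up, runs the queries in shuffled order, repeats each one several times, and compares the minimum across two passes against the 5% guideline. The `QueryRunner` interface, the class name, and all method names are hypothetical; the actual BaseX connection is deliberately left out and replaced by a stand-in, since the wiring depends on the BaseX client API in use. The sketch also times on the client side with `System.nanoTime()`, which is not the same as the execution time reported by the -v or -V flags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class BenchmarkSketch {

    /** Hypothetical hook: sends one query to the database and waits for the result. */
    public interface QueryRunner {
        void run(String query) throws Exception;
    }

    /** Warms up, then times every query `runs` times in shuffled order (milliseconds). */
    public static Map<String, double[]> measure(QueryRunner runner, List<String> queries,
            int warmupRounds, int runs, long seed) throws Exception {
        Random random = new Random(seed);
        // Warm-up rounds: results are discarded so JIT and OS caches can settle.
        for (int i = 0; i < warmupRounds; i++) {
            for (String query : shuffled(queries, random)) {
                runner.run(query);
            }
        }
        Map<String, double[]> timings = new LinkedHashMap<>();
        for (String query : queries) {
            timings.put(query, new double[runs]);
        }
        // Measured rounds: a fresh random order in every round.
        for (int r = 0; r < runs; r++) {
            for (String query : shuffled(queries, random)) {
                long start = System.nanoTime();
                runner.run(query);
                timings.get(query)[r] = (System.nanoTime() - start) / 1_000_000.0;
            }
        }
        return timings;
    }

    private static List<String> shuffled(List<String> queries, Random random) {
        List<String> copy = new ArrayList<>(queries);
        Collections.shuffle(copy, random);
        return copy;
    }

    public static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    public static double min(double[] values) {
        double smallest = Double.POSITIVE_INFINITY;
        for (double v : values) smallest = Math.min(smallest, v);
        return smallest;
    }

    /** True if two repeated measurements differ by at most 5% of the first. */
    public static boolean isStable(double first, double second) {
        return Math.abs(first - second) / first <= 0.05;
    }

    public static void main(String[] args) throws Exception {
        List<String> queries = Arrays.asList("//node[@cat='np']", "//node[@rel='su']");
        // Stand-in runner (assumption): replace with a real call to the database.
        QueryRunner fake = query -> Thread.sleep(1 + query.length() % 3);
        Map<String, double[]> pass1 = measure(fake, queries, 3, 10, 1L);
        Map<String, double[]> pass2 = measure(fake, queries, 3, 10, 2L);
        for (String query : queries) {
            double a = min(pass1.get(query));
            double b = min(pass2.get(query));
            System.out.printf("%s: min %.2f ms / %.2f ms, stable=%b%n",
                    query, a, b, isStable(a, b));
        }
    }
}
```

If `isStable` reports false for many queries, increasing the `runs` argument is the first thing to try. The example queries in `main` are placeholders, not the XPath structures discussed in the thread.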