Re: What's the optimal way to measure Lucene query cost?

Michael McCandless Fri, 27 May 2016 02:07:50 -0700

Usually query latency is what "matters" to the end user, so I would measure
that.

However, it is hard to measure ;)

You can measure it on an idle system, i.e. just running your query with
one thread, or on a system loaded nearly to the red line, where multiple
threads are being used to saturate IO or CPU resources.
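
A minimal sketch of the two setups, in Java since that is what Lucene runs
on. The Runnable stands in for whatever executes your search (for example a
call to IndexSearcher.search); the class and method names here are
hypothetical and not part of Lucene. One thread approximates the idle case,
and a pool sized near the core count approximates the loaded case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LatencyHarness {

    // Prevents the JIT from treating the fake query's work as dead code.
    static volatile long sink;

    /**
     * Runs {@code query} {@code iterationsPerThread} times on each of {@code threads}
     * threads and returns every individual latency in nanoseconds.
     */
    public static List<Long> measure(Runnable query, int threads, int iterationsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    List<Long> local = new ArrayList<>(iterationsPerThread);
                    for (int i = 0; i < iterationsPerThread; i++) {
                        long start = System.nanoTime();
                        query.run();
                        local.add(System.nanoTime() - start);
                    }
                    return local;
                }));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                try {
                    all.addAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause()); // the query itself failed
                }
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a real search such as searcher.search(query, 10).
        Runnable fakeQuery = () -> {
            long x = 0;
            for (int i = 0; i < 100_000; i++) x += i;
            sink = x;
        };
        int cores = Runtime.getRuntime().availableProcessors();
        List<Long> idle = measure(fakeQuery, 1, 200);       // idle system: one thread
        List<Long> loaded = measure(fakeQuery, cores, 200); // loaded: one thread per core
        System.out.println("idle samples=" + idle.size() + ", loaded samples=" + loaded.size());
    }
}
```

If the index is IO-bound rather than CPU-bound, you may need more threads
than cores to actually reach the red line.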

Be sure to discard "warmup", when the JVM is still compiling things. Be
sure to take many measurements, both within one JVM (since e.g. a sudden GC
can impact a query) and across JVMs (since HotSpot can sometimes
compile things very differently, apparently).
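
A sketch of the bookkeeping for discarding warmup and summarizing many
measurements, assuming latencies are recorded in nanoseconds in run order.
The class name, the warmup count and the sample numbers are made up for
illustration; choosing a warmup length that really covers JIT compilation
is up to you. Taking the median of the per-JVM medians is just one simple
way to combine runs, not necessarily what any particular benchmark does.

```java
import java.util.Arrays;

public class LatencyStats {

    /** Median of a non-empty array; averages the two middle values when the length is even. */
    public static double median(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * Drops the first {@code warmup} samples (recorded in run order, while HotSpot is
     * still compiling) and returns the median of the remaining latencies.
     */
    public static double steadyStateMedian(long[] latenciesInRunOrder, int warmup) {
        if (warmup < 0 || warmup >= latenciesInRunOrder.length) {
            throw new IllegalArgumentException("warmup must leave at least one sample");
        }
        double[] kept = Arrays.stream(latenciesInRunOrder, warmup, latenciesInRunOrder.length)
                .asDoubleStream()
                .toArray();
        return median(kept);
    }

    public static void main(String[] args) {
        // Made-up latencies (ns) from one JVM: the first few are slow while the JIT warms up.
        long[] oneJvm = {900, 850, 700, 120, 110, 130, 115, 125};
        System.out.println(steadyStateMedian(oneJvm, 3)); // prints 120.0

        // Made-up steady-state medians from three separate JVM runs, summarized once more.
        double[] perJvmMedians = {118.0, 131.0, 122.0};
        System.out.println(median(perJvmMedians)); // prints 122.0
    }
}
```

The median is used rather than the mean so that a single GC pause in one
run does not drag the summary around.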

If your index won't be hot in production (the "working set" can't
comfortably fit into the RAM the OS has free), then be sure you test that
way too, so you are in fact measuring the cost of Lucene having to pull
postings, doc values, etc., from disk.

Our nightly benchmarks (http://home.apache.org/~mikemccand/lucenebench/)
run multiple query types concurrently across threads, across 20 JVM
instances, with many iterations per JVM instance, discard warmup
iterations, and then take the median query latency to add to the charts.
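
A rough in-process sketch of the within-JVM part of that approach: several
hypothetical query types run concurrently across a thread pool, each
thread's warmup iterations are dropped, and the median latency is reported
per query type. This is not the nightly benchmark's actual code; spreading
the runs across 20 JVM instances would happen outside this class, e.g. a
driver script launching it repeatedly and collecting the results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MixedQueryBench {

    static volatile long sink; // keeps the JIT from discarding spin()'s work

    /**
     * Each of {@code threads} threads runs every query type {@code iterations} times and
     * discards its first {@code warmup} iterations; returns the per-type median in ns.
     */
    public static Map<String, Double> run(Map<String, Runnable> queries, int threads,
                                          int iterations, int warmup) {
        if (warmup < 0 || warmup >= iterations) {
            throw new IllegalArgumentException("warmup must leave at least one iteration");
        }
        List<Map.Entry<String, Runnable>> types = new ArrayList<>(queries.entrySet());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, List<Long>>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t; // each thread starts at a different query type
                futures.add(pool.submit(() -> {
                    Map<String, List<Long>> local = new HashMap<>();
                    for (int i = 0; i < iterations; i++) {
                        for (int k = 0; k < types.size(); k++) {
                            Map.Entry<String, Runnable> entry = types.get((offset + k) % types.size());
                            long start = System.nanoTime();
                            entry.getValue().run();
                            long elapsed = System.nanoTime() - start;
                            if (i >= warmup) { // keep only post-warmup iterations
                                local.computeIfAbsent(entry.getKey(), key -> new ArrayList<>()).add(elapsed);
                            }
                        }
                    }
                    return local;
                }));
            }
            Map<String, List<Long>> merged = new HashMap<>();
            for (Future<Map<String, List<Long>>> f : futures) {
                try {
                    f.get().forEach((name, samples) ->
                            merged.computeIfAbsent(name, key -> new ArrayList<>()).addAll(samples));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
            Map<String, Double> medians = new LinkedHashMap<>();
            for (Map.Entry<String, Runnable> type : types) {
                long[] sorted = merged.get(type.getKey()).stream()
                        .mapToLong(Long::longValue).sorted().toArray();
                int n = sorted.length;
                medians.put(type.getKey(),
                        n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0);
            }
            return medians;
        } finally {
            pool.shutdown();
        }
    }

    static void spin(int n) {
        long x = 0;
        for (int i = 0; i < n; i++) x += i;
        sink = x;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for real query types (e.g. term, phrase, sorted queries).
        Map<String, Runnable> queries = new LinkedHashMap<>();
        queries.put("cheap", () -> spin(10_000));
        queries.put("expensive", () -> spin(1_000_000));
        run(queries, 4, 50, 10).forEach((name, ns) ->
                System.out.printf("%s: median %.0f ns%n", name, ns));
    }
}
```

Starting each thread at a different query type means different types
actually overlap in time instead of every thread running the same query in
lockstep. Per-thread samples are returned through Future.get() so no shared
mutable state is needed while the threads run.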

Mike McCandless

http://blog.mikemccandless.com

On Thu, May 26, 2016 at 9:35 PM, Thomas Pan <[email protected]>
wrote:

>
> I am curious as to how to measure Lucene query cost. Shall I use query
> latency, or shall I dig deeper into how many postings are touched, how
> many fields are returned, etc.?
>
>
> Best,
> Thomas
>
> --
> The journey of a thousand miles begins with one step. -- Lao Tzu
> Do not go where the path may lead, go instead where there is no path and
> leave a trail. -- Ralph Waldo Emerson
>
