Hi all, I work for a JVM vendor, and we're interested in obtaining / creating a set of Lucene benchmarks for internal use. We plan to use these for performance regression testing and general performance analysis (i.e. to make sure Lucene performs well on our JVM). I'm especially interested in benchmarks that demonstrate opportunities for improvements in our JIT compiler.
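For concreteness, below is a rough sketch of the kind of self-contained, CPU-bound index-then-search loop I have in mind, written against what I believe is the current Lucene API. The in-memory ByteBuffersDirectory, the synthetic documents, and the query mix are placeholders I made up (not anything representative); the point is just to keep disk IO out of the picture so the JIT and GC are the main bottleneck.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class TinyLuceneWorkload {
        public static void main(String[] args) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();
            // In-memory directory so disk IO stays out of the measurement.
            Directory dir = new ByteBuffersDirectory();

            // Indexing phase (roughly what luindex exercises), over synthetic documents.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                for (int i = 0; i < 100_000; i++) {
                    Document doc = new Document();
                    doc.add(new TextField("body",
                        "synthetic document number " + i + " with some shared terms",
                        Field.Store.NO));
                    writer.addDocument(doc);
                }
            }

            // Search phase (roughly what lusearch exercises).
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                QueryParser parser = new QueryParser("body", analyzer);
                long hitCount = 0;
                for (int i = 0; i < 10_000; i++) {
                    TopDocs hits = searcher.search(
                        parser.parse("document AND " + (i % 1000)), 10);
                    hitCount += hits.scoreDocs.length;
                }
                System.out.println("hits returned: " + hitCount);
            }
        }
    }

If loops like this are not representative of how Lucene is actually exercised in production, that is exactly the kind of feedback I'm looking for.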
While I imagine that the lucene/benchmark/ directory is probably the right place to start, I have a few high-level questions that are best answered by people on this mailing list:

- Are there realistic Lucene workloads that are bottlenecked on the JVM's performance (JIT, GC, etc.) and *not* on, e.g., disk or network IO? If so, what are some examples?

- How relevant are the DaCapo "luindex" and "lusearch" benchmarks today? Will porting them to the latest version of Lucene give me a benchmark representative of modern Lucene usage, or have Lucene's performance characteristics evolved in fundamental ways since DaCapo was published?

- What is the distribution of Lucene versions in production deployments? Do users tend to aggressively upgrade to the "latest and greatest" Lucene version, or is there usually a non-trivial lag?

Any other information that you think is useful or relevant is welcome.

Thanks!
-- Sanjoy