Robert Muir wrote:
> On Mon, Nov 23, 2015 at 2:42 PM, Sanjoy Das
> <[email protected]> wrote:
>> Hi all,
>>
>> I work for a JVM vendor, and we're interested in obtaining / creating
>> a set of Lucene benchmarks for internal use. We plan to use these for
>> performance regression testing and general performance analysis
>> (i.e. to make sure Lucene performs well on our JVM). I'm especially
>> interested in benchmarks that demonstrate opportunities for
>> improvements in our JIT compiler.
>>
>> While I imagine that the lucene/benchmark/ directory is probably the
>> right place to start, I have a few high-level questions that are best
>> answered by people on this mailing list:
>
> Actually I think http://people.apache.org/~mikemccand/lucenebench/
> might be better for your purposes. Code is currently located here:
> https://github.com/mikemccand/luceneutil

I just replied to Mike about this -- ideally the benchmarks I'm looking
for should run relatively quickly (i.e. < 30 min). However, if
lucenebench is the right thing to run, I'd rather have a good benchmark
that takes a while to finish than a misleading benchmark that runs
quickly. :)

>> - Are there realistic Lucene workloads that are bottle-necked on the
>> JVM's performance (JIT, GC etc.) and *not* e.g. disk / network IO?
>> If so, what are some examples?
>
> You can see some changes in query graphs when the JVM was upgraded at
> the above link. In some cases they are not positive. For example, why
> did indexing throughput drop significantly when upgrading from
> 1.8.0_25 to 1.8.0_40? (annotation BD in
> http://people.apache.org/~mikemccand/lucenebench/indexing.html)

I don't work on OpenJDK, so I cannot comment on OpenJDK's performance;
but that is an interesting data point nevertheless. It certainly shows
that improving the JVM can help, and vice versa.

>> - How relevant are the Dacapo "luindex" and "lusearch" benchmarks
>> today? Will porting them to the latest version of Lucene give me a
>> benchmark representative of modern Lucene usage, or have Lucene's
>> performance characteristics evolved in fundamental ways since Dacapo
>> was published?
>
> Some things have changed since lucene 2.4 such as much better
> concurrency when indexing with multiple threads, the use of bulk
> integer decompression methods vs vByte compression, and so on. Also
> support for new data structures like column-stride fields was added,
> and the use cases around those (e.g. faceted search) are probably not
> represented.

Thanks, that is very useful to know.

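For readers who have not met the vByte scheme mentioned in the quoted reply, here is a minimal sketch of the general technique: each integer is written 7 bits at a time, low-order bits first, with the high bit of every byte flagging whether another byte follows. The class and method names (VByteSketch, encode, decode) are made up for illustration and are not Lucene's API. It is only meant to show why decoding has a data-dependent branch per byte, which is the kind of loop a bulk decoder avoids.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Illustrative only: a generic variable-byte (vByte) codec, not Lucene code.
public class VByteSketch {

    // Encode each value as 1 to 5 bytes, 7 payload bits per byte.
    public static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            // While more than 7 bits remain, emit the low 7 bits
            // with the continuation bit (0x80) set.
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v); // last byte: continuation bit clear
        }
        return out.toByteArray();
    }

    // Decode exactly `count` values from the start of `bytes`.
    public static int[] decode(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0); // branch depends on the data itself
            values[i] = value;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] deltas = {1, 127, 128, 300, 16384}; // hypothetical sample values
        byte[] encoded = encode(deltas);
        int[] decoded = decode(encoded, deltas.length);
        System.out.println("encoded bytes: " + encoded.length);
        System.out.println("round trip ok: " + Arrays.equals(deltas, decoded));
    }
}
```

Small values cost one byte each, which is why this format suits small integers such as gaps between sorted IDs; the trade-off is the per-byte loop exit test in decode.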
-- Sanjoy

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
