Robert Muir wrote:
> On Mon, Nov 23, 2015 at 2:42 PM, Sanjoy Das
> <[email protected]>  wrote:
>> Hi all,
>>
>> I work for a JVM vendor, and we're interested in obtaining / creating
>> a set of Lucene benchmarks for internal use.  We plan to use these for
>> performance regression testing and general performance analysis
>> (i.e. to make sure Lucene performs well on our JVM).  I'm especially
>> interested in benchmarks that demonstrate opportunities for
>> improvements in our JIT compiler.
>>
>> While I imagine that the lucene/benchmark/ directory is probably the
>> right place to start, I have a few high-level questions that are best
>> answered by people on this mailing list:
>
> Actually I think http://people.apache.org/~mikemccand/lucenebench/
> might be better for your purposes. Code is currently located here:
> https://github.com/mikemccand/luceneutil

I just replied to Mike about this -- ideally the benchmarks I'm
looking for should run relatively quickly (i.e. < 30 min).

However, if lucenebench is the right thing to run, I'd rather have
a good benchmark that takes a while to finish than a misleading
benchmark that runs quickly. :)

>> - Are there realistic Lucene workloads that are bottlenecked on the
>>    JVM's performance (JIT, GC etc.) and *not* e.g. disk / network IO?
>>    If so, what are some examples?
>
> You can see some changes in query graphs when the JVM was upgraded at
> the above link. In some cases they are not positive. For example, why
> did indexing throughput drop significantly when upgrading from
> 1.8.0_25 to 1.8.0_40? (annotation BD in
> http://people.apache.org/~mikemccand/lucenebench/indexing.html)

I don't work on OpenJDK, so I cannot comment on OpenJDK's performance;
but that is an interesting data point nevertheless.  It certainly
shows that improving the JVM can help, and vice versa.

>> - How relevant are the Dacapo "luindex" and "lusearch" benchmarks
>>    today?  Will porting them to the latest version of Lucene give me a
>>    benchmark representative of modern Lucene usage, or have Lucene's
>>    performance characteristics evolved in fundamental ways since Dacapo
>>    was published?
>
> Some things have changed since Lucene 2.4, such as much better
> concurrency when indexing with multiple threads, the use of bulk
> integer decompression methods vs vByte compression, and so on. Also,
> support for new data structures like column-stride fields was added,
> and the use cases around those (e.g. faceted search) are probably not
> represented.

Thanks, that is very useful to know.

-- Sanjoy

>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
