    [ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436949 ]

Grant Ingersoll commented on LUCENE-675:
----------------------------------------
My comments are marked by GSI
-----------

In the meantime I've been using Europarl for my testing.

GSI: perhaps you can contribute once this is set up.

Also important to realize is that there are many dimensions to test. With lock-less I'm focusing entirely on "wall clock time to open readers and writers" in different use cases: pure indexing, pure searching, highly interactive mixed indexing/searching, etc. This is actually hard to test cleanly, because in certain cases (the highly interactive case, or the many-readers case) the current Lucene hits many "commit lock" retries and/or timeouts, whereas lock-less doesn't. So what's a "fair" comparison in this case?

GSI: I am planning on taking Andrzej's contribution and refactoring it into components that can be reused, as well as creating a "standard" benchmark that will be easy to run through a simple Ant task, i.e. ant run-baseline

GSI: From here, anybody can contribute their own benchmarks (I will provide interfaces to facilitate this), which others can choose to run.

In addition to standardizing on the corpus, I think we ideally need a standardized hardware / OS / software configuration as well, so the numbers are easily comparable across time.

GSI: Not really feasible unless you are proposing to buy us machines :-) I think more important is the ability to do a before-and-after evaluation (one that runs each test several times) as you make changes. Anybody should be able to do the same: run the benchmark, apply the patch, and then rerun the benchmark.

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki
>         Assigned To: Grant Ingersoll
>     Attachments: LuceneBenchmark.java
>
>
> We need an objective way to measure the performance of Lucene, both indexing
> and querying, on a known corpus.
> This issue is intended to collect comments
> and patches implementing a suite of such benchmarking tests.
>
> Regarding the corpus: one of the widely used and freely available corpora is
> the original Reuters collection, available from
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
> I propose to use this corpus as a base for benchmarks. The benchmarking
> suite could automatically retrieve it from known locations, and cache it
> locally.

--
This message is automatically generated by JIRA.
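The pluggable-benchmark idea GSI describes (contributed benchmarks behind a common interface, each run several times with wall-clock timing) might look something like the minimal Java sketch below. The interface name, method names, and timing approach are all illustrative assumptions on my part, not taken from the attached LuceneBenchmark.java.

```java
// Hypothetical sketch of a pluggable benchmark harness; all names
// here are illustrative, not from any actual Lucene patch.
public class BenchmarkSketch {

    /** One contributed benchmark: a name plus a workload to time. */
    interface Benchmark {
        String name();
        void run() throws Exception;
    }

    /**
     * Runs a benchmark several times and reports the best wall-clock
     * time in milliseconds, which damps warm-up and GC noise.
     */
    static long timeBestMillis(Benchmark b, int rounds) throws Exception {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            b.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in workload; a real benchmark would index or search a corpus.
        Benchmark dummy = new Benchmark() {
            public String name() { return "dummy-indexing"; }
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 100_000; i++) sb.append(i);
            }
        };
        System.out.println(dummy.name() + ": " + timeBestMillis(dummy, 3) + " ms");
    }
}
```

With something like this, the before-and-after workflow GSI suggests is just: run the harness, apply the patch, rebuild, and run it again on the same machine.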
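The issue description's "retrieve it from known locations, and cache it locally" step could be sketched as below; the class name, cache layout, and download approach are my assumptions, not part of any actual patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hedged sketch of a corpus fetch-and-cache step; names are illustrative.
public class CorpusCache {

    /**
     * Returns the local path to the corpus archive, downloading it
     * only if it is not already present in the cache directory.
     */
    static Path fetch(String url, Path cacheDir) throws IOException {
        Files.createDirectories(cacheDir);
        String fileName = url.substring(url.lastIndexOf('/') + 1);
        Path local = cacheDir.resolve(fileName);
        if (!Files.exists(local)) {
            try (InputStream in = new URL(url).openStream()) {
                Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return local;
    }
}
```

A suite built this way only hits the network on first run, so repeated benchmark runs stay comparable and fast.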