[ 
http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436949 ] 
            
Grant Ingersoll commented on LUCENE-675:
----------------------------------------

My comments are marked by GSI
-----------

In the meantime I've been using Europarl for my testing.

GSI: Perhaps you can contribute once this is set up.

It's also important to realize that there are many dimensions to test. With
lock-less I'm focusing entirely on "wall clock time to open readers
and writers" in different use cases like pure indexing, pure
searching, highly interactive mixed indexing/searching, etc. And this
is actually hard to test cleanly because in certain cases (highly
interactive case, or many readers case), the current Lucene hits many
"commit lock" retries and/or timeouts (whereas lock-less doesn't). So
what's a "fair" comparison in this case?

GSI: I am planning on taking Andrzej's contribution and refactoring it into 
components that can be reused, as well as creating a "standard" benchmark 
that will be easy to run through a simple ant task, i.e., ant run-baseline.

GSI: From there, anybody can contribute their own benchmarks (I will provide 
interfaces to facilitate this), which others can choose to run.
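
GSI: As a strawman, a contributed benchmark might implement something as
simple as the following (the interface name and methods are placeholders,
not a committed design):

// Strawman only: the name and methods here are placeholders.
public interface Benchmark {
  /** Prepare any input data (corpus, index directory, etc.). */
  void setUp() throws Exception;

  /** Run the benchmark once and return elapsed wall-clock time in ms. */
  long run() throws Exception;

  /** Clean up temporary files. */
  void tearDown() throws Exception;
}

The run-baseline target could then just run a fixed set of these and report
the timings.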


In addition to standardizing on the corpus, I think we ideally need a
standardized hardware / OS / software configuration as well, so the
numbers are easily comparable across time. 

GSI: Not really feasible unless you are proposing to buy us machines :-)  I 
think what's more important is the ability to do a before-and-after 
evaluation (one that runs each test several times) as you make changes.  
Anybody should be able to do the same: run the benchmark, apply the patch, 
and then rerun the benchmark.
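
GSI: In other words, something along these lines (again just a sketch,
reusing the placeholder Benchmark interface from above):

// Rough sketch of a before/after run: execute a benchmark several times
// and report the average wall-clock time, so a patch can be evaluated by
// re-running the same task.  Benchmark is the placeholder interface
// sketched earlier in this thread.
public class BenchmarkRunner {
  public static long averageMillis(Benchmark benchmark, int runs) throws Exception {
    benchmark.setUp();
    long total = 0;
    for (int i = 0; i < runs; i++) {
      total += benchmark.run();
    }
    benchmark.tearDown();
    return total / runs;
  }
}

Run that on trunk, apply the patch, run it again, and compare the two averages.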


> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: LuceneBenchmark.java
>
>
> We need an objective way to measure the performance of Lucene, both indexing 
> and querying, on a known corpus. This issue is intended to collect comments 
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is 
> the original Reuters collection, available from 
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
> or 
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
>  I propose to use this corpus as a base for benchmarks. The benchmarking 
> suite could automatically retrieve it from known locations, and cache it 
> locally.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
