[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

Doron Cohen (JIRA) Sun, 12 Nov 2006 02:03:01 -0800

    [ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12449117 ]
Doron Cohen commented on LUCENE-675:
------------------------------------


I looked at extending the benchmark with:
- different test "scenarios", i.e. other sequences of operations.
- multithreaded tests, e.g. several queries in parallel.
- rate of events, e.g. "2 queries arriving per second", or "one query per 
second in parallel with 20 new documents in a minute".
- different data sources (input documents, queries).

For this I made many changes to the benchmark code, reusing parts of it and 
rewriting others.
I plan to submit this code in a few days; it already runs, but some 
functionality is still missing.

I would like to describe how it works, in the hope of getting early feedback.

Several "basic tasks" are defined, all extending the abstract class 
PerfTask:
- AddDocTask
- OptimizeTask
- CreateIndexTask
etc. 

To further extend the benchmark 'framework', new tasks can be added. Each task 
must implement the abstract method doLogic(). For instance, in AddDocTask this 
method calls indexWriter.addDocument().
There are also setup() and tearDown() methods for work that should not be 
timed as part of the task.
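A minimal Java sketch of this task contract, assuming that only doLogic() is timed and that setup() and tearDown() run outside the timer. PerfTask, AddDocTask, doLogic(), setup(), tearDown() and indexWriter.addDocument() come from the description above; runAndTime() and the DocSink interface (a stand-in for IndexWriter so the sketch compiles without Lucene) are hypothetical.

```java
// Base class: subclasses put the measured work in doLogic().
abstract class PerfTask {
    // Untimed preparation.
    protected void setup() throws Exception { }

    // The work being measured.
    protected abstract void doLogic() throws Exception;

    // Untimed cleanup.
    protected void tearDown() throws Exception { }

    // Hypothetical driver: setup, timed doLogic, then tearDown; returns elapsed nanoseconds.
    final long runAndTime() {
        try {
            setup();
            try {
                long start = System.nanoTime();
                doLogic();
                return System.nanoTime() - start;
            } finally {
                tearDown();
            }
        } catch (Exception e) {
            throw new RuntimeException("task failed: " + getClass().getSimpleName(), e);
        }
    }
}

// Hypothetical stand-in for Lucene's IndexWriter, so the sketch compiles on its own.
interface DocSink {
    void addDocument(String doc);
}

// Adds one document; only the addDocument() call is timed.
class AddDocTask extends PerfTask {
    private final DocSink writer;
    private final String text;
    private String doc;

    AddDocTask(DocSink writer, String text) {
        this.writer = writer;
        this.text = text;
    }

    @Override
    protected void setup() {
        doc = "body: " + text; // building the document is not timed
    }

    @Override
    protected void doLogic() {
        writer.addDocument(doc); // the timed call
    }

    @Override
    protected void tearDown() {
        doc = null;
    }
}
```

For example, new AddDocTask(list::add, "hello").runAndTime() adds one document to a java.util.List<String> and returns the nanoseconds spent in doLogic() only.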

A special TaskSequence task contains other tasks. It is either sequential or 
parallel, i.e. it executes its child tasks either serially or concurrently. 
TaskSequence also supports a "rate": the pace at which its child tasks are 
"fired" can be controlled.

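A sketch of how such a sequence might fire its children, assuming children are plain Runnable objects and the rate is given in tasks per minute (so 120 means one firing every 500 ms). The constructor arguments and intervalMillis() are hypothetical, not the actual TaskSequence API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Runs child tasks serially or in parallel, optionally paced.
class TaskSequence implements Runnable {
    private final List<Runnable> children;
    private final boolean parallel;
    private final double ratePerMinute; // <= 0 means fire as fast as possible

    TaskSequence(List<Runnable> children, boolean parallel, double ratePerMinute) {
        this.children = new ArrayList<>(children);
        this.parallel = parallel;
        this.ratePerMinute = ratePerMinute;
    }

    // Gap between consecutive firings, e.g. 120 per minute -> 500 ms.
    long intervalMillis() {
        return ratePerMinute > 0 ? Math.round(60_000.0 / ratePerMinute) : 0;
    }

    @Override
    public void run() {
        ExecutorService pool = parallel
                ? Executors.newFixedThreadPool(Math.max(1, children.size()))
                : null;
        long start = System.currentTimeMillis();
        try {
            for (int i = 0; i < children.size(); i++) {
                // Wait until the i-th firing time: start + i * interval.
                long wait = start + i * intervalMillis() - System.currentTimeMillis();
                if (wait > 0) {
                    Thread.sleep(wait);
                }
                if (parallel) {
                    pool.submit(children.get(i)); // fire and move on
                } else {
                    children.get(i).run(); // finish before firing the next
                }
            }
            if (parallel) {
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.HOURS); // wait for all children
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            if (pool != null) {
                pool.shutdownNow();
            }
        }
    }
}
```

In serial mode a slow child delays the next firing, so the achieved rate can fall below the requested one; in parallel mode each child is handed to a thread pool at its scheduled time. Exceptions thrown by parallel children are swallowed by submit() in this sketch.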
With these tasks, it is possible to describe a performance test 'algorithm' in 
a simple syntax.
('algorithm' may be too big a word for this...?)

A test invocation takes two parameters:
- test.properties - file with various config properties.
- test.alg - file with the algorithm.
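A sketch of loading the two inputs with java.util.Properties and java.nio.file, assuming both files are UTF-8 text. BenchmarkInputs, fromArgs() and fromText() are hypothetical names, and the property key used in testing ("work.dir") is only an illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical holder for the two inputs of a benchmark run.
class BenchmarkInputs {
    final Properties config = new Properties();
    final String algorithm;

    private BenchmarkInputs(Reader props, String algorithm) throws IOException {
        config.load(props); // standard key=value properties syntax
        this.algorithm = algorithm;
    }

    // Expects {"test.properties", "test.alg"} as passed to a launcher's main().
    static BenchmarkInputs fromArgs(String[] args) throws IOException {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: <test.properties> <test.alg>");
        }
        String alg = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        try (Reader props = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            return new BenchmarkInputs(props, alg);
        }
    }

    // Same, from in-memory text (handy for tests).
    static BenchmarkInputs fromText(String props, String alg) {
        try {
            return new BenchmarkInputs(new StringReader(props), alg);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A launcher's main(String[] args) could call BenchmarkInputs.fromArgs(args), then pass the algorithm text to a parser and the config to the tasks.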

By convention, for each task class "OpNameTask", the command "OpName" is 
valid in test.alg.
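A sketch of this naming convention as a small registry: the command name is the task class's simple name with the "Task" suffix dropped. TaskRegistry, the use of Runnable for tasks and the stub task classes are assumptions for illustration; the real framework may resolve commands differently (for example by reflection).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Maps alg commands to task factories using the OpNameTask -> OpName rule.
final class TaskRegistry {
    private static final String SUFFIX = "Task";
    private final Map<String, Supplier<? extends Runnable>> byCommand = new HashMap<>();

    // "AddDocTask" -> "AddDoc"; rejects classes that break the convention.
    static String commandNameFor(Class<?> taskClass) {
        String name = taskClass.getSimpleName();
        if (!name.endsWith(SUFFIX) || name.length() == SUFFIX.length()) {
            throw new IllegalArgumentException(name + " does not follow the OpNameTask convention");
        }
        return name.substring(0, name.length() - SUFFIX.length());
    }

    <T extends Runnable> void register(Class<T> taskClass, Supplier<T> factory) {
        byCommand.put(commandNameFor(taskClass), factory);
    }

    // Creates a fresh task for a command read from test.alg, e.g. "AddDoc".
    Runnable create(String command) {
        Supplier<? extends Runnable> factory = byCommand.get(command);
        if (factory == null) {
            throw new IllegalArgumentException("unknown command: " + command);
        }
        return factory.get();
    }
}

// Illustrative stub tasks (hypothetical; real tasks would do the Lucene work).
class AddDocTask implements Runnable {
    @Override public void run() { }
}

class SearchTask implements Runnable {
    @Override public void run() { }
}
```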

Adding a single document is done by:
  AddDoc

Adding 3 documents:
  AddDoc
  AddDoc
  AddDoc

Or, alternatively:
  { AddDoc } : 3

So, '{' and '}' indicate a serial sequence of (child) tasks. 

To fire 100 queries in a row:
  { Search } : 100

To fire 100 queries in parallel:
  [ Search ] : 100

So, '[' and ']' indicate a parallel group of tasks. 

To fire 100 queries in a row, 2 queries per second (120 per minute):
  { Search } : 100 : 120

Similarly, but in parallel:
  [ Search ] : 100 : 120

A sequence can be given a name that identifies it in reports:
  { "QueriesA" Search } : 100 : 120
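The syntax above ({ } for serial sequences, [ ] for parallel ones, a repetition count and a rate after ':', and an optional quoted name) fits a small recursive-descent parser. This sketch relies on java.io.StreamTokenizer defaults, which return { } [ ] and : as single-character tokens, letters as words, digits as numbers and "..." as quoted strings. AlgNode and AlgParser are hypothetical and handle only the syntax shown in this post.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// One parsed item: a single command, or a { } / [ ] sequence of items.
final class AlgNode {
    String command;                 // set for a single task, e.g. "AddDoc"
    String name;                    // optional sequence name, e.g. "QueriesA"
    boolean parallel;               // true for [ ... ], false for { ... }
    final List<AlgNode> children = new ArrayList<>();
    int repetitions = 1;            // from "} : N"
    double ratePerMinute;           // from "} : N : R"; 0 means unpaced

    boolean isSequence() {
        return command == null;
    }
}

// Recursive-descent parser over StreamTokenizer's default token classes.
final class AlgParser {
    private final StreamTokenizer tok;

    private AlgParser(String text) {
        tok = new StreamTokenizer(new StringReader(text));
    }

    static List<AlgNode> parse(String text) {
        AlgParser p = new AlgParser(text);
        List<AlgNode> top = new ArrayList<>();
        try {
            while (p.tok.nextToken() != StreamTokenizer.TT_EOF) {
                top.add(p.parseItem());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return top;
    }

    // Parses one item starting at the current token.
    private AlgNode parseItem() throws IOException {
        AlgNode node = new AlgNode();
        if (tok.ttype == StreamTokenizer.TT_WORD) {
            node.command = tok.sval;
            return node;
        }
        if (tok.ttype != '{' && tok.ttype != '[') {
            throw new IllegalArgumentException("unexpected token: " + tok);
        }
        node.parallel = tok.ttype == '[';
        int close = node.parallel ? ']' : '}';
        if (tok.nextToken() == '"') { // optional quoted name
            node.name = tok.sval;
            tok.nextToken();
        }
        while (tok.ttype != close) {
            if (tok.ttype == StreamTokenizer.TT_EOF) {
                throw new IllegalArgumentException("missing '" + (char) close + "'");
            }
            node.children.add(parseItem());
            tok.nextToken();
        }
        parseRepeatAndRate(node);
        return node;
    }

    // Optional ": count", then optional ": rate", after the closing bracket.
    private void parseRepeatAndRate(AlgNode node) throws IOException {
        if (tok.nextToken() != ':') {
            tok.pushBack();
            return;
        }
        node.repetitions = (int) expectNumber();
        if (tok.nextToken() != ':') {
            tok.pushBack();
            return;
        }
        node.ratePerMinute = expectNumber();
    }

    private double expectNumber() throws IOException {
        if (tok.nextToken() != StreamTokenizer.TT_NUMBER) {
            throw new IllegalArgumentException("number expected, got " + tok);
        }
        return tok.nval;
    }
}
```

Parsing { "QueriesA" Search } : 100 : 120 with this sketch yields one serial node named QueriesA with a single Search child, 100 repetitions and a rate of 120 per minute.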

There are also tasks that create reports.
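A sketch of the bookkeeping a report task could print, keyed by a name such as "QueriesA". TaskStats and its methods are hypothetical; the real report tasks may collect and format different statistics.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical per-name timing statistics for a report task to print.
final class TaskStats {
    private static final class Stat {
        long count;
        long totalNanos;
    }

    // Insertion order keeps report lines in the order names first appear.
    private final Map<String, Stat> byName = new LinkedHashMap<>();

    // Synchronized because tasks in a parallel sequence may record concurrently.
    synchronized void record(String name, long elapsedNanos) {
        Stat s = byName.computeIfAbsent(name, k -> new Stat());
        s.count++;
        s.totalNanos += elapsedNanos;
    }

    synchronized long count(String name) {
        Stat s = byName.get(name);
        return s == null ? 0 : s.count;
    }

    // One line per name: runs, total ms, average ms, runs per second of task time.
    synchronized String report() {
        StringBuilder sb = new StringBuilder(String.format(Locale.ROOT,
                "%-12s %8s %10s %10s %10s%n", "Name", "Runs", "Total ms", "Avg ms", "Runs/sec"));
        for (Map.Entry<String, Stat> me : byName.entrySet()) {
            Stat s = me.getValue();
            double totalMs = s.totalNanos / 1e6;
            double perSec = s.totalNanos > 0 ? s.count / (s.totalNanos / 1e9) : 0.0;
            sb.append(String.format(Locale.ROOT, "%-12s %8d %10.2f %10.2f %10.2f%n",
                    me.getKey(), s.count, totalMs, totalMs / s.count, perSec));
        }
        return sb.toString();
    }
}
```

Runs/sec here is computed from summed task time, so for a parallel sequence it is not the same as wall-clock throughput.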

There are more tasks, and more to say about the alg syntax, but this post is 
already long enough.

I find this quite powerful for perf testing.
What do you (and you) think?

- Doron


> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: http://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: benchmark.patch, BenchmarkingIndexer.pm, 
> extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, timedata.zip
>
>
> We need an objective way to measure the performance of Lucene, both indexing 
> and querying, on a known corpus. This issue is intended to collect comments 
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is 
> the original Reuters collection, available from 
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
> or 
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
>  I propose to use this corpus as a base for benchmarks. The benchmarking 
> suite could automatically retrieve it from known locations, and cache it 
> locally.
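A sketch of "retrieve it from known locations, and cache it locally": try each URL in turn and reuse an existing local copy instead of downloading again. CorpusCache and its methods are hypothetical; only standard java.net and java.nio.file calls are used, and the archive is not unpacked.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: download a corpus archive once, then reuse the local copy.
final class CorpusCache {
    private final Path cacheDir;

    CorpusCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    // Returns the cached file for this URL, downloading only on a cache miss.
    Path fetch(String url) throws IOException {
        String fileName = url.substring(url.lastIndexOf('/') + 1); // e.g. news20.tar.gz
        Path target = cacheDir.resolve(fileName);
        if (Files.exists(target)) {
            return target; // cache hit: no network access
        }
        Files.createDirectories(cacheDir);
        // Download to a temporary file first, so a failed transfer leaves no partial archive.
        Path tmp = Files.createTempFile(cacheDir, fileName, ".part");
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp);
        }
        return target;
    }

    // Tries each known location in turn; rethrows the last failure if all fail.
    Path fetchFirstAvailable(String... urls) throws IOException {
        IOException last = new IOException("no locations given");
        for (String url : urls) {
            try {
                return fetch(url);
            } catch (IOException e) {
                last = e;
            }
        }
        throw last;
    }
}
```

Calling fetchFirstAvailable() with the two URLs above would return the path of whichever archive is already cached or first downloads successfully.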
