[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12449117 ]

Doron Cohen commented on LUCENE-675:
------------------------------------
I looked at extending the benchmark with:
- different test "scenarios", i.e. other sequences of operations.
- multithreaded tests, e.g. several queries in parallel.
- rate of events, e.g. "2 queries arriving per second", or "one query per second in parallel with 20 new documents in a minute".
- different data sources (input documents, queries).

For this I made lots of changes to the benchmark code, using parts of it and rewriting other parts. I would like to submit this code in a few days - it is running already but some functionality is missing.

I would like to describe how it works to hopefully get early feedback.

There are several "basic tasks" defined - all extending an (abstract) class PerfTask:
- AddDocTask
- OptimizeTask
- CreateIndexTask
etc.

To further extend the benchmark 'framework', new tasks can be added. Each task must implement the abstract method: doLogic(). For instance, in AddDocTask this method (doLogic) would call indexWriter.addDocument(). There are also setup() and tearDown() methods for performing work that should not be timed for that task.

A special TaskSequence task contains other tasks. It is either parallel or sequential, which tells if it executes its child tasks serially or in parallel. TaskSequence also supports "rate": the pace in which its child tasks are "fired" can be controlled.

With these tasks, it is possible to describe a performance test 'algorithm' in a simple syntax. ('algorithm' may be too big a word for this...?)

A test invocation takes two parameters:
- test.properties - file with various config properties.
- test.alg - file with the algorithm.

By convention, for each task class "OpNameTask", the command "OpName" is valid in test.alg.

Adding a single document is done by:
    AddDoc

Adding 3 documents:
    AddDoc AddDoc AddDoc

Or, alternatively:
    { AddDoc } : 3

So, '{' and '}' indicate a serial sequence of (child) tasks.
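
The task hierarchy described above could be sketched roughly as follows. This is a hypothetical illustration, not the code to be submitted: only the names PerfTask, AddDocTask, TaskSequence, doLogic(), setup() and tearDown() come from this post. The runAndTime() method, the constructor parameters, the in-memory list standing in for a Lucene IndexWriter, and the one-thread-per-firing parallel mode are all assumptions made so the sketch runs without Lucene. Rate control is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Base class: only doLogic() is timed; setup() and tearDown() are not.
abstract class PerfTask {
    protected void setup() throws Exception {}            // untimed, no-op by default
    protected abstract void doLogic() throws Exception;   // the timed work
    protected void tearDown() throws Exception {}         // untimed, no-op by default

    // Assumed helper (not named in the post): returns nanoseconds spent in doLogic().
    public final long runAndTime() throws Exception {
        setup();
        try {
            long start = System.nanoTime();
            doLogic();
            return System.nanoTime() - start;
        } finally {
            tearDown();
        }
    }
}

// Stand-in for the real task, which would call indexWriter.addDocument().
class AddDocTask extends PerfTask {
    private final List<String> index;   // assumed substitute for an IndexWriter

    AddDocTask(List<String> index) { this.index = index; }

    @Override
    protected void doLogic() { index.add("some document text"); }
}

// A task that contains other tasks and fires them serially or in parallel.
class TaskSequence extends PerfTask {
    private final List<PerfTask> children = new ArrayList<>();
    private final boolean parallel;
    private final int repetitions;

    TaskSequence(boolean parallel, int repetitions) {
        this.parallel = parallel;
        this.repetitions = repetitions;
    }

    void add(PerfTask child) { children.add(child); }

    @Override
    protected void doLogic() throws Exception {
        if (!parallel) {
            // Serial: like { ... } : repetitions
            for (int i = 0; i < repetitions; i++) {
                for (PerfTask child : children) child.runAndTime();
            }
            return;
        }
        // Parallel: like [ ... ] : repetitions, one thread per firing (assumption).
        AtomicReference<Exception> failure = new AtomicReference<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < repetitions; i++) {
            for (PerfTask child : children) {
                Thread t = new Thread(() -> {
                    try { child.runAndTime(); }
                    catch (Exception e) { failure.compareAndSet(null, e); }
                });
                threads.add(t);
                t.start();
            }
        }
        for (Thread t : threads) t.join();
        if (failure.get() != null) throw failure.get();
    }
}

class PerfTaskSketch {
    public static void main(String[] args) throws Exception {
        // Synchronized list so the same task instance is safe to fire from several threads.
        List<String> index = Collections.synchronizedList(new ArrayList<>());
        TaskSequence serial = new TaskSequence(false, 3);   // like { AddDoc } : 3
        serial.add(new AddDocTask(index));
        long nanos = serial.runAndTime();
        System.out.println(index.size() + " docs added in " + nanos + " ns");
    }
}
```

Because a TaskSequence is itself a PerfTask in this sketch, sequences can be nested, and a sequence's own elapsed time covers the whole run of its children.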
To fire 100 queries in a row:
    { Search } : 100

To fire 100 queries in parallel:
    [ Search ] : 100

So, '[' and ']' indicate a parallel group of tasks.

To fire 100 queries in a row, 2 queries per second (120 per minute):
    { Search } : 100 : 120

Similar, but in parallel:
    [ Search ] : 100 : 120

A sequence task can be named for identifying it in reports:
    { "QueriesA" Search } : 100 : 120

And there are tasks that create reports.

There are more tasks, and more to tell on the alg syntax, but this post is already long..

I find this quite powerful for perf testing.
What do you (and you) think?

- Doron

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>          Key: LUCENE-675
>          URL: http://issues.apache.org/jira/browse/LUCENE-675
>      Project: Lucene - Java
>   Issue Type: Improvement
>     Reporter: Andrzej Bialecki
>  Assigned To: Grant Ingersoll
>  Attachments: benchmark.patch, BenchmarkingIndexer.pm, extract_reuters.plx, LuceneBenchmark.java, LuceneIndexer.java, timedata.zip
>
>
> We need an objective way to measure the performance of Lucene, both indexing
> and querying, on a known corpus. This issue is intended to collect comments
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is
> the original Reuters collection, available from
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
> I propose to use this corpus as a base for benchmarks. The benchmarking
> suite could automatically retrieve it from known locations, and cache it
> locally.

--
This message is automatically generated by JIRA.