Hi,

We have been running such a benchmark over all TREC collections and TREC queries. Our participation in TREC in recent years has given us the opportunity to work with these collections. Lucene is one of the systems we are looking at. The measurements cover two functions: indexing and querying. We are using several settings for each system and measuring the performance. The work is not finished yet and, given our available resources, it seems it will take a while. I will release the results as soon as the tests are finished.
Cheers,
Murat

> Dear Lucene developers,
> I'd be interested in doing some benchmarking on (at least) Lucene,
> Egothor and MG4J. There is no actual data around on publicly available
> collections, and it would be nice to have some more objective data on
> efficiency for a significantly large collection.
>
> We have GOV2 (25M documents), which is publicly available but must be
> bought. We can use it to do the benchmarks, but we will certainly need
> some help to configure Lucene so that it works at its best. We have some
> reasonably large server that we can allocate to that purpose.
>
> My idea would be to start compression from a text file (one document per
> line), so that decompression (GOV2 is in zipped files) and parsing (most
> docs are HTML) does not come into play.
>
> We would like to measure indexing time and query answer time--people
> from different engines could suggest different queries so that each
> engine gets the highlight on its best features. I'd start with pure
> Boolean queries in which documents must be returned in index order, so
> that the results are the same. In a second phase we can try to compare
> the results with ranked queries (which however is going to be more
> complicated, and I do not want to duplicate TREC).
>
> Please let me know if you're interested in the project!
>
> Ciao,
>
> seba
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]