[ https://issues.apache.org/jira/browse/LUCENE-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-790:
-------------------------------

    Lucene Fields: [Patch Available]  (was: [New])

> contrib/benchmark - few improvements and a bug fix
> --------------------------------------------------
>
>                 Key: LUCENE-790
>                 URL: https://issues.apache.org/jira/browse/LUCENE-790
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Other
>    Affects Versions: 2.1
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>            Priority: Minor
>             Fix For: 2.1
>
>         Attachments: TrecDocMaker.patch
>
>
> Benchmark byTask was slightly improved:
> 1. fixed a bug in the "child-should-not-report" mechanism. If a task sequence
>    contained only simple tasks it worked as expected (i.e. child tasks did not
>    report times/memory) but if a child was a task sequence, then its children
>    would report - they should not - this was fixed, so this property is now
>    "penetrating/inherited" all the way down.
> 2. doc size control now possible also for the Reuters doc maker. (allowing to
>    index N docs of size C characters each.)
> 3. TrecDocMaker was added - it reads as input the .gz files used in Trec -
>    e.g. .gov data - this can be handy to benchmark Lucene on these large
>    collections. Similar to the Reuters collection, the doc-maker scans the
>    input directory for all the files and extracts documents from the files.
>    Here there are multiple documents in each input file. Unlike the Reuters
>    collection, we cannot provide a 'loader' for these collections - they are
>    available from http://trec.nist.gov - for research purposes.
> 4. a new BasicDocMaker abstract class handles most of doc-maker tasks,
>    including creating docs with specific size, so adding new doc-makers for
>    other data is now much simpler.
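
For readers who want a concrete picture of items 2 and 4 in the description above, here is a minimal sketch of how an abstract base class could take care of doc size control so that a concrete doc maker only has to supply raw text. It is not taken from TrecDocMaker.patch: the names SizedDocMaker, ListDocMaker, nextRawText and makeDocument are hypothetical, and the real BasicDocMaker may handle sizing quite differently.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch only: these classes are NOT the contrib/benchmark API.
abstract class SizedDocMaker {
    // Target document size in characters; a value <= 0 disables size control.
    private final int docSize;
    // Text read from the source but not yet emitted in a document.
    private final StringBuilder pending = new StringBuilder();

    SizedDocMaker(int docSize) {
        this.docSize = docSize;
    }

    /** Subclasses return the next raw text from their source, or null when exhausted. */
    protected abstract String nextRawText();

    /** Returns the next document body, or null when no text is left. */
    public String makeDocument() {
        if (docSize <= 0) {
            return nextRawText(); // no size control: pass raw text through
        }
        // Accumulate raw text until one full-size document is available.
        while (pending.length() < docSize) {
            String raw = nextRawText();
            if (raw == null) {
                break; // source exhausted; emit whatever is left
            }
            pending.append(raw);
        }
        if (pending.length() == 0) {
            return null;
        }
        int end = Math.min(docSize, pending.length());
        String doc = pending.substring(0, end);
        pending.delete(0, end); // keep the remainder for the next document
        return doc;
    }
}

// A concrete maker only supplies raw text; sizing is inherited from the base class.
class ListDocMaker extends SizedDocMaker {
    private final Iterator<String> texts;

    ListDocMaker(List<String> texts, int docSize) {
        super(docSize);
        this.texts = texts.iterator();
    }

    @Override
    protected String nextRawText() {
        return texts.hasNext() ? texts.next() : null;
    }
}
```

In this sketch the last document may be shorter than the target size once the input runs out. A maker for another collection would override only nextRawText, which is the kind of simplification item 4 describes.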