Mike, I didn't anticipate this use case and I think it would not work correctly. I'll look into this.
Anyhow, I think it would not work as you expect. It seems what you want is to have 4 threads, adding docs in parallel, until the doc maker is exhausted. But this line: {[AddDoc(4000)]: 4} : * Reads as - Repeatedly until exhausted: Create & Start 4 threads (in parallel), each adding 1 doc of size 4000; Wait for them 4 threads to complete. Now, this is not what you are after, is it? I think you would like just 4 threads to do all the work. It seems what you are really after is this: [ { AddDoc } : * ] : 4 This reads as: Create 4 threads, each adding docs until exhaustion. Since there is a single system-benchmark-wide doc-maker, all 4 threads use it, and when it is exhausted, all 4 will be done. I tried this way and it works as I expected it to (except for that DateFormat bug, see below). Can you try like this and let me know if it works for you. I think your variation of this exposes a bug in the benchmark - it will just loop forever because the parallel sequence would mask the exhaustion from the outer sequential sequence. I opened LUCENE-941 for this, and looking into it Doron "Michael McCandless" <[EMAIL PROTECTED]> wrote on 22/06/2007 13:18:10: > > Hi, > > I'm trying to test LUCENE-843 (IndexWriter speedups) on Wikipedia > using the the benchmark contrib framework plus the patch from > LUCENE-848. > > I downloaded an older wikipedia export (the "latest" doesn't seem to > exist) and got it un-tar'd. The test I'd like to run is to use 4 > threads to index all (exhaust) documents. I'm using the alg below. > > One problem I hit is the DirDocMaker uses a SimpleDateFormat instance > for parsing the dates at the top of each file, but, this is not > threadsafe and so I hit exceptions from there. I think we just need > to make that instance thread local I think (I will open issue). Yes, tha's a bug... It is also in some already committed parts of the benchmark. I opened LUCENE-940 for this. > > The question I have is: is this alg going to do what I want? I'd like > each doc in Wikipedia to be indexed only once, with 4 threads running. > I *think* but I'm not sure that the alg below actually indexes the > Wikipedia content 4 times over instead? > > Here's the alg: > > max.field.length=2147483647 > compound=false > > analyzer=org.apache.lucene.analysis.SimpleAnalyzer > directory=FSDirectory > # ram.flush.mb=32 > max.buffered=10000 > doc.stored=true > doc.tokenized=true > doc.term.vector=true > doc.add.log.step=500 > > docs.dir=enwiki > > doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker > > # task at this depth or less would print when they start > task.max.depth.log=1 > doc.maker.forever=false > > # > ------------------------------------------------------------------------------------- > > ResetSystemErase > CreateIndex > {[AddDoc(4000)]: 4} : * > CloseIndex > > RepSumByPref AddDoc > > Mike > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]