Hello,

I have discovered a serious bug in the LuceneIndexer benchmarking
app. All tests have been rerun, and the new numbers reflect a 13-15%
improvement for Lucene. I apologize for having reported bad data.

Here are some of the new results, both with and without the bug, so
that you can see how the numbers were affected. They were prepared
using revision 779 of the Subversion repository.

RESULTS A: 'body' neither stored nor vectorized
===========================================================================
configuration              truncated mean secs (6 reps)  max memory (1 rep)
---------------------------------------------------------------------------
Lucene / JVM 1.4           43.68                          79 MB
Lucene / JVM 1.5           44.95                          93 MB
Lucene / JVM 1.4 with bug  49.63                          79 MB
Lucene / JVM 1.5 with bug  50.93                          92 MB

RESULTS B: 'body' stored and vectorized
===========================================================================
configuration              truncated mean secs (6 reps)  max memory (1 rep)
---------------------------------------------------------------------------
Lucene / JVM 1.4           71.96                         118 MB
Lucene / JVM 1.5           73.81                         214 MB
Lucene / JVM 1.4 with bug  84.73                         182 MB
Lucene / JVM 1.5 with bug  88.96                         199 MB

The bug was in buildFileList() and resulted in a bogus list of
filepaths. KinoSearch and Plucene were indexing 19043 documents once
each. Lucene was indexing 22 documents over and over, about 900
times each.

    // Return a lexically sorted list of all article files from all subdirs.
    static String[] buildFileList () throws Exception {
        File[] articleDirs = corpusDir.listFiles();
        Vector filePaths = new Vector();
        for (int i = 0; i < articleDirs.length; i++) {
            File[] articles = articleDirs[i].listFiles();
            for (int j = 0; j < articles.length; j++) {
                // BUG: the index below should be j, not i
                String path = articles[i].getPath();
                if (path.indexOf("article") == -1)
                    continue;
                filePaths.add(path);
            }
        }
        Collections.sort(filePaths);
        return (String[])filePaths.toArray(new String[filePaths.size()]);
    }

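For comparison, here is a minimal sketch of how the corrected loop might
look. It is not the code from the repository. It assumes Java 5 or later
(generics and for-each loops, which remove the index variable that caused
the mix-up), takes corpusDir as a parameter instead of reading a static
field, and guards against listFiles() returning null. The class name
FileListSketch and the hasDuplicates() helper are hypothetical, added only
to show a cheap sanity check that would have flagged the repeated paths.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

// Hypothetical wrapper class; the name is for illustration only.
class FileListSketch {

    // Return a lexically sorted list of all article files from all subdirs.
    static String[] buildFileList(File corpusDir) {
        List<String> filePaths = new ArrayList<String>();
        File[] articleDirs = corpusDir.listFiles();
        if (articleDirs == null) {
            return new String[0]; // corpusDir is missing or not a directory
        }
        for (File articleDir : articleDirs) {
            File[] articles = articleDir.listFiles();
            if (articles == null) {
                continue; // a plain file at the top level, not a subdirectory
            }
            // Iterating over the elements directly leaves no index to confuse.
            for (File article : articles) {
                String path = article.getPath();
                if (path.indexOf("article") == -1) {
                    continue;
                }
                filePaths.add(path);
            }
        }
        Collections.sort(filePaths);
        return filePaths.toArray(new String[filePaths.size()]);
    }

    // Hypothetical sanity check: true if any path occurs more than once.
    static boolean hasDuplicates(String[] paths) {
        return new HashSet<String>(Arrays.asList(paths)).size() != paths.length;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java FileListSketch <corpusDir>");
            return;
        }
        String[] paths = buildFileList(new File(args[0]));
        System.out.println(paths.length + " files, duplicates: "
            + hasDuplicates(paths));
    }
}
```

Running a check like hasDuplicates() on the file list before indexing
would report true for the buggy list, since the same few paths were
added many times over.
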
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/