Hello,

I have discovered a serious bug in the LuceneIndexer benchmarking app. All tests have been rerun, and the new numbers reflect a 13-15% improvement for Lucene. I apologize for having reported bad data.

Here are some of the new results, shown both with and without the bug so that you can see how the numbers were affected. They were prepared using Subversion repository revision 779.

RESULTS A: 'body' neither stored nor vectorized
===========================================================================
configuration          truncated mean secs (6 reps)      max memory (1 rep)
---------------------------------------------------------------------------
Lucene / JVM 1.4                  43.68                         79 MB
Lucene / JVM 1.5                  44.95                         93 MB
Lucene / JVM 1.4 with bug         49.63                         79 MB
Lucene / JVM 1.5 with bug         50.93                         92 MB

RESULTS B: 'body' stored and vectorized
===========================================================================
configuration          truncated mean secs (6 reps)      max memory (1 rep)
---------------------------------------------------------------------------
Lucene / JVM 1.4                  71.96                        118 MB
Lucene / JVM 1.5                  73.81                        214 MB
Lucene / JVM 1.4 with bug         84.73                        182 MB
Lucene / JVM 1.5 with bug         88.96                        199 MB

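The bug was in buildFileList() and resulted in a bogus list of filepaths. The inner loop indexed the articles array with the outer counter i instead of j, so each subdirectory contributed a single file path, repeated once for every file in that subdirectory. KinoSearch and Plucene were indexing 19043 documents once each. Lucene was indexing 22 documents over and over, about 900 times each.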

  // Return a lexically sorted list of all article files from all subdirs.
  static String[] buildFileList () throws Exception {
    File[] articleDirs = corpusDir.listFiles();
    Vector filePaths = new Vector();
    for (int i = 0; i < articleDirs.length; i++) {
      File[] articles = articleDirs[i].listFiles();
      for (int j = 0; j < articles.length; j++) {
        String path = articles[i].getPath(); // <-- BUG: should be j, not i
        if (path.indexOf("article") == -1)
          continue;
        filePaths.add(path);
      }
    }
    Collections.sort(filePaths);
    return (String[])filePaths.toArray(new String[filePaths.size()]);
  }
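
For comparison, here is a minimal sketch of how the corrected method might look. It is not the code from the benchmark repository: the class name FileListSketch and the corpusDir parameter are hypothetical, introduced only to make the example self-contained, and the for-each loops assume Java 5 or later. On JVM 1.4 the fix is simply to change articles[i] to articles[j] in the method above. Iterating over the array elements directly removes the index variable that caused the mix-up.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper class, used only so the sketch compiles on its own.
class FileListSketch {

  // Return a lexically sorted list of all article files from all subdirs.
  // corpusDir is passed in here (an assumption); the original reads a field.
  static String[] buildFileList(File corpusDir) {
    List<String> filePaths = new ArrayList<String>();
    File[] articleDirs = corpusDir.listFiles();
    if (articleDirs == null) {
      return new String[0]; // corpusDir is missing or not a directory
    }
    for (File dir : articleDirs) {
      File[] articles = dir.listFiles();
      if (articles == null) {
        continue; // entry is a plain file, not a subdirectory
      }
      for (File article : articles) {
        String path = article.getPath();
        // Same filter as the original: keep only paths containing "article".
        if (path.indexOf("article") == -1) {
          continue;
        }
        filePaths.add(path);
      }
    }
    Collections.sort(filePaths);
    return filePaths.toArray(new String[0]);
  }
}
```

With this version every matching file appears in the list exactly once, so the number of paths returned should equal the number of article files in the corpus.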

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
