Re: Benchmarking results

Marvin Humphrey Sun, 09 Apr 2006 22:11:49 -0700

Hello,

I have discovered a serious bug in the LuceneIndexer benchmarking app. All tests have been rerun, and the new numbers reflect a 13-15% improvement for Lucene. I apologize for having reported bad data.

Here are some of the new results, both with and without the bug, so that you can see how the numbers were affected. They were prepared using subversion repository revision 779.


RESULTS A: 'body' neither stored nor vectorized

===========================================================================
configuration          truncated mean secs (6 reps)    max memory (1 rep)
---------------------------------------------------------------------------

Lucene / JVM 1.4                  43.68                         79 MB
Lucene / JVM 1.5                  44.95                         93 MB
Lucene / JVM 1.4 with bug         49.63                         79 MB
Lucene / JVM 1.5 with bug         50.93                         92 MB

RESULTS B: 'body' stored and vectorized

Lucene / JVM 1.4                  71.96                        118 MB
Lucene / JVM 1.5                  73.81                        214 MB
Lucene / JVM 1.4 with bug         84.73                        182 MB
Lucene / JVM 1.5 with bug         88.96                        199 MB

The bug was in buildFileList() and resulted in a bogus list of filepaths. KinoSearch and Plucene were indexing 19043 documents once each. Lucene was indexing 22 documents over and over, about 900 times each.

  // Return a lexically sorted list of all article files from all subdirs.

  static String[] buildFileList () throws Exception {
    File[] articleDirs = corpusDir.listFiles();
    Vector filePaths = new Vector();
    for (int i = 0; i < articleDirs.length; i++) {
      File[] articles = articleDirs[i].listFiles();
      for (int j = 0; j < articles.length; j++) {
        String path = articles[i].getPath(); // <-- BUG: should be j, not i
        if (path.indexOf("article") == -1)
          continue;
        filePaths.add(path);
      }
    }
    Collections.sort(filePaths);
    return (String[])filePaths.toArray(new String[filePaths.size()]);
  }
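
For comparison, here is a minimal sketch of what a corrected version might look like. It is not the code actually committed to the repository. The class name FileListSketch, the corpusDir parameter (a static field in the original), and the null checks are assumptions added for illustration, and it uses Java 5 generics and for-each loops, which the original 1.4-compatible code did not. Iterating without index variables removes the chance of mixing up i and j.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FileListSketch {

  // Return a lexically sorted list of all article files from all subdirs.
  // Hypothetical variant: corpusDir is passed in rather than read from a field.
  static String[] buildFileList(File corpusDir) {
    List<String> filePaths = new ArrayList<String>();
    File[] articleDirs = corpusDir.listFiles();
    if (articleDirs == null) {
      return new String[0]; // corpusDir is not a readable directory
    }
    for (File dir : articleDirs) {
      File[] articles = dir.listFiles();
      if (articles == null) {
        continue; // skip plain files and unreadable subdirs
      }
      for (File article : articles) {
        // Each iteration looks at its own file, so no entry is repeated.
        String path = article.getPath();
        if (path.indexOf("article") == -1) {
          continue;
        }
        filePaths.add(path);
      }
    }
    Collections.sort(filePaths);
    return filePaths.toArray(new String[0]);
  }
}
```

With the buggy indexing, each subdirectory contributed its i-th file once per file it contained, which is consistent with 22 distinct documents being indexed roughly 19043 / 22 times each.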

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

