Please see the attached diff to benchmarks.xml, which adds Daniel's numbers.
Thanks Dan!

Regards,
Kelvin

--------
The book giving manifesto     - http://how.to/sharethisbook



cvs -z9 diff benchmarks.xml (in directory C:\checkout\jakarta-lucene\xdocs\)
Index: benchmarks.xml
===================================================================
RCS file: /home/cvspublic/jakarta-lucene/xdocs/benchmarks.xml,v
retrieving revision 1.1
diff -r1.1 benchmarks.xml
278a279,344
>         <subsection name="Daniel Armbrust's benchmarks">
>           <p>
>           My disclaimer is that this is a very poor "Benchmark".  It was not done for raw speed,
>           nor was the total index built in one shot.  The index was created on several different
>           machines (all with these specs, or very similar), with each machine indexing batches of 500,000 to
>           1 million documents per batch.  Each of these small indexes was then moved to a
>           much larger drive, where they were all merged together into a big index.
>           This process was done manually, over the course of several months, as the sources became available.
>           </p>
>           <ul>
>           <p>
>           <b>Hardware Environment</b><br/>
>           <li><i>Dedicated machine for indexing</i>: no - The machine had moderate to low load.  However, the indexing process was built single
> threaded, so it only took advantage of 1 of the processors.  It usually got 100% of this processor.</li>
>           <li><i>CPU</i>: Sun Ultra 80 4 x 64 bit processors</li>
>           <li><i>RAM</i>: 4 GB Memory</li>
>           <li><i>Drive configuration</i>: Ultra-SCSI Wide 10000 RPM 36GB Drive</li>
>           </p>
>           <p>
>           <b>Software environment</b><br/>
>           <li><i>Java Version</i>: 1.3.1</li>
>           <li><i>Java VM</i>: </li>
>           <li><i>OS Version</i>: Sun 5.8 (64 bit)</li>
>           <li><i>Location of index</i>: local</li>
>           </p>
>           <p>
>           <b>Lucene indexing variables</b><br/>
>           <li><i>Number of source documents</i>: 13,820,517</li>
>           <li><i>Total filesize of source documents</i>: 87.3 GB</li>
>           <li><i>Average filesize of source documents</i>: 6.3 KB</li>
>           <li><i>Source documents storage location</i>: Filesystem</li>
>           <li><i>File type of source documents</i>: XML</li>
>           <li><i>Parser(s) used, if any</i>: </li>
>           <li><i>Analyzer(s) used</i>: A home grown analyzer that simply removes stopwords.</li>
>           <li><i>Number of fields per document</i>: 1 - 31</li>
>           <li><i>Type of fields</i>: All text, though 2 of them are dates (20001205) that we filter on</li>
>           <li><i>Index persistence</i>: FSDirectory</li>
>           <li><i>Index size</i>: 12.5 GB</li>
>           </p>
>           <p>
>           <b>Figures</b><br/>
>           <li><i>Time taken (in ms/s as an average of at least 3 
> indexing runs)</i>: For 617271 documents, 209698 seconds (or ~2.5 days)</li>
>           <li><i>Time taken / 1000 docs indexed</i>: 340 Seconds</li>
>           <li><i>Memory consumption</i>: (java executed with) java -Xmx1000m -Xss8192k so
>           1 GB of memory was allotted to the indexer</li>
>           </p>
>           <p>
>           <b>Notes</b><br/>
>           <li><i>Notes</i>: 
>           <p>
>           The source documents were XML.  The "indexer" opened each document one at a time, ran an
>           XSL transformation on them, and then proceeded to index the stream.  The indexer optimized
>           the index every 50,000 documents (on this run) though previously, we optimized every
>           300,000 documents.  The performance didn't change much either way.  We did no other
>           tuning (RAM Directories, separate process to pretransform the source material, etc)
>           to make it index faster.  When all of these individual indexes were built, they were
>           merged together into the main index.  That process usually took ~ a day.
>           </p></li>
>           </p>
>           </ul>
>           <p>
>           Daniel can be contacted at Armbrust.Daniel at mayo.edu.
>           </p>
>         </subsection> 
> 
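For anyone who wants to sanity-check the derived figures in the entry above, here is a rough sketch that recomputes them from the raw numbers Daniel reported. The variable names are mine, and it assumes "GB" and "KB" are decimal units (10^9 and 10^3 bytes), which the entry does not state.

```python
# Raw figures copied from the benchmark entry above.
docs_in_run = 617_271        # documents in the timed run
run_seconds = 209_698        # wall-clock time for that run
total_docs = 13_820_517      # documents in the full collection
total_source_gb = 87.3       # total size of the source documents
index_gb = 12.5              # size of the merged index

# Derived figures. Decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes)
# are an assumption; the entry does not say which convention was used.
seconds_per_1000_docs = run_seconds / (docs_in_run / 1000)
run_days = run_seconds / 86_400
avg_doc_kb = total_source_gb * 1e9 / total_docs / 1e3
index_to_source_ratio = index_gb / total_source_gb

print(f"seconds per 1000 docs: {seconds_per_1000_docs:.1f}")
print(f"run length in days:    {run_days:.2f}")
print(f"average document (KB): {avg_doc_kb:.2f}")
print(f"index / source size:   {index_to_source_ratio:.3f}")
```

Under those assumptions the recomputed values agree with the reported 340 seconds per 1000 documents and 6.3 KB average document size, and the run comes out a little under 2.5 days.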
--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>