Great, Peter. I've posted a new set of attributes based on your submission and Otis' feedback. Let me think about the best way to consolidate these numbers and put them somewhere accessible to everyone.
----- Original Message -----
From: "Peter Carlson" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, May 03, 2002 9:50 PM
Subject: Performance benchmarks

> Some performance numbers
>
> Java Version: 1.3_01
> OS Version: Windows 2000
> CPU (Type, Speed and Quantity): Pentium 4, 1.5 GHz, 1 CPU
> RAM: 512 MB
> Drive configuration (IDE, SCSI, RAID-1, RAID-5): IDE (single)
> Number of source documents: 103009
> Total filesize of source documents: 430MB
> Average filesize of source documents (in KB/MB): 4.3KB
> Source documents storage location (filesystem, DB, http,etc): Filesystem
> File type of source documents: xml
> Parser(s) used, if any: Standard Analyzer
> Number of Fields per document: 8
> Time taken (in ms/s as an average of at least 3 indexing runs): 8387 sec (139 min)
> Time taken / 1000 docs indexed: 81 sec / 1000 docs
> Notes (any special tuning/strategies):
> I convert each document to a DOM, and use xpath to get the fields.
> I perform validation on the data and make sure that it meets certain
> criteria like total size > 150 characters, and verify there are no
> duplicates using a Hashmap. Without these checks, the indexing goes faster
> (about 60 seconds/1000 docs).
>
> I hope this is helpful.
> --Peter
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
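
For anyone who wants to reproduce the pre-indexing checks described in the notes above, here is a rough sketch of what they might look like. It is not Peter's actual code. The class name DocumentFilter, the XPath expressions /doc/id and /doc/body, the choice of the id as the duplicate key, and the reading of "total size" as the length of the body text are all assumptions made for illustration. It also uses javax.xml.xpath, which ships with newer JDKs than the 1.3 release used in the benchmark.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses one XML document into a DOM, pulls fields out
// with XPath, and rejects documents that are too short or already seen.
class DocumentFilter {
    // Threshold from the notes: total size must exceed 150 characters.
    static final int MIN_CHARS = 150;

    // Keys of documents accepted so far (a HashMap, as in the notes;
    // a HashSet would work equally well).
    private final Map<String, Boolean> seen = new HashMap<String, Boolean>();
    private final DocumentBuilder builder;
    private final XPath xpath;

    DocumentFilter() {
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot create XML parser", e);
        }
        xpath = XPathFactory.newInstance().newXPath();
    }

    // Returns the extracted fields, or null if the document should be skipped.
    Map<String, String> extract(String xml) {
        try {
            Document dom = builder.parse(new InputSource(new StringReader(xml)));

            // Assumed document layout: <doc><id>...</id><body>...</body></doc>
            String id = xpath.evaluate("/doc/id", dom);
            String body = xpath.evaluate("/doc/body", dom);

            // Validation: skip documents at or below the size threshold.
            if (body.length() <= MIN_CHARS) {
                return null;
            }
            // Duplicate check: put() returns the previous value if the key existed.
            if (seen.put(id, Boolean.TRUE) != null) {
                return null;
            }

            Map<String, String> fields = new HashMap<String, String>();
            fields.put("id", id);
            fields.put("body", body);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process document", e);
        }
    }
}
```

A caller would create one DocumentFilter per indexing run, pass each source file's contents to extract(), and add a Lucene document only when the result is non-null. The DOM parse and XPath evaluation happen for every file, which is consistent with the notes reporting faster indexing when the checks are skipped.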
