Thanks for posting your benchmark results Hamish. A while ago, I started collecting similar posts from a couple of folks who have been generous enough to share their results.
Let me see if I can find those numbers again, and this time, maybe they'll make it onto the website or FAQ, with each author's explicit blessings of course... Regards, Kelvin -------- The book giving manifesto - http://how.to/sharethisbook On Fri, 29 Nov 2002 14:41:30 +1300, Hamish Carpenter said: >Hi Everyone, > >I've been lurking on this list for a couple of weeks now. I thought >I would contribute my experiences (with timings) of using lucene. > >The main issue we have is why does performance decrease >significantly when searching with multiple threads? > >I hope this helps people starting out with lucene to compare with >their performance. > >Hamish > >BTW: Optimizing took 3.5 minutes at 500,000 documents and 4.7 >minutes at 1,000,000 documents. Sorry I don't have memory usage >figures. > ><benchmark> Hardware environment -------------------- >Dedicated machine for indexing (yes/no): yes CPU (Type, Speed and >Quantity): Intel x86 P4 1.5Ghz RAM: 512 DDR Drive configuration >(IDE, SCSI, RAID-1, RAID-5): IDE 7200rpm Raid-1 > >Software environment -------------------- >Java Version: 1.3.1 IBM JITC Enabled OS Version: Debian Linux >2.4.18-686 Location of index directory (local/network): local > >Lucene indexing variables ------------------------- >Number of source documents: Random generator. Set to make 1M >documents in 2x500,000 batches. >Total filesize of source documents: > 1Gb if stored. >Average filesize of source documents (in KB/MB): 1kb Source >documents storage location (filesystem, DB, http,etc): fs File type >of source documents: generated. >Parser(s) used, if any: default Analyzer(s) used: default Number of >fields per document: 11 Type of fields: 1 date, 1 id, 9 text Index >persistence (FSDirectory, SqlDirectory, etc): FSDirectory > >Time taken (in ms/s as an average of at least 3 indexing runs): Time >taken / 1000 docs indexed: 49seconds Memory consumption: unsure > >Notes (any special tuning/strategies): >-------------------------------------- >A windows client ran a random document generator which created >documents based on some arrays of values and an excerpt (approx 1kb) >from a text file of the bible (King James version). >These were submitted via a socket connection (open throughout >indexing process). >The index writer was not closed between index calls. >This created a 400Mb index in 23 files (after optimization). > >Query details: -------------- >Set up a threaded class to start x number of simultaneous threads to >search the above created index. > >Query: +Domain:sos +(+((Name:goo*^2.0 Name:plan*^2.0) (Teaser:goo* >Tea ser:plan*) (Details:goo* Details:plan*)) -Cancel:y) >+DisplayStartDate:[mkwsw2jk0 -mq3dj1uq0] +EndDate:[mq3dj1uq0- >ntlxuggw0] > >This query counted 34000 documents and I limited the returned >documents to 5. > >This is using Peter Halacsy's IndexSearcherCache slightly modified >to be a singleton returned cached searchers for a given directory. >This solved an initial problem with too many files open and running >out of linux handles for them. > >Threads|Avg Time per query (ms) 1 1009ms 2 2043ms 3 >3087ms 4 4045ms .. . >.. . >10 10091ms > >I removed the two date range terms from the query and it made a HUGE >difference in performance. With 4 threads the avg time dropped to >900ms! > >Other query optimizations made little difference. ></benchmark> > > > >-- >To unsubscribe, e-mail: <mailto:lucene-user- >[EMAIL PROTECTED]> For additional commands, e-mail: ><mailto:lucene-user- >[EMAIL PROTECTED]> -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
