Thanks for posting your benchmark results, Hamish. A while ago, I
started collecting similar posts from a couple of folks who have been
generous enough to share their results.

Let me see if I can find those numbers again; this time, maybe
they'll make it onto the website or FAQ, with each author's explicit
blessing, of course...

Regards,
Kelvin

--------
The book giving manifesto     - http://how.to/sharethisbook


On Fri, 29 Nov 2002 14:41:30 +1300, Hamish Carpenter said:
>Hi Everyone,
>
>I've been lurking on this list for a couple of weeks now.  I thought
>I would contribute my experiences (with timings) of using Lucene.
>
>The main issue we have is this: why does performance decrease
>significantly when searching with multiple threads?
>
>I hope this helps people starting out with Lucene to compare
>against their own performance.
>
>Hamish
>
>BTW: Optimizing took 3.5 minutes at 500,000 documents and 4.7
>minutes at 1,000,000 documents.  Sorry, I don't have memory usage
>figures.
>
><benchmark>
>Hardware environment
>--------------------
>Dedicated machine for indexing (yes/no): yes
>CPU (Type, Speed and Quantity): Intel x86 P4 1.5GHz
>RAM: 512MB DDR
>Drive configuration (IDE, SCSI, RAID-1, RAID-5): IDE 7200rpm, RAID-1
>
>Software environment
>--------------------
>Java Version: 1.3.1 IBM, JITC enabled
>OS Version: Debian Linux 2.4.18-686
>Location of index directory (local/network): local
>
>Lucene indexing variables
>-------------------------
>Number of source documents: random generator, set to make 1M documents in 2 x 500,000 batches
>Total filesize of source documents: > 1GB if stored
>Average filesize of source documents (in KB/MB): 1KB
>Source documents storage location (filesystem, DB, http, etc): filesystem
>File type of source documents: generated
>Parser(s) used, if any: default
>Analyzer(s) used: default
>Number of fields per document: 11
>Type of fields: 1 date, 1 id, 9 text
>Index persistence (FSDirectory, SqlDirectory, etc): FSDirectory
>
>Time taken (in ms/s as an average of at least 3 indexing runs):
>Time taken / 1000 docs indexed: 49 seconds
>Memory consumption: unsure
>
>Notes (any special tuning/strategies):
>--------------------------------------
>A Windows client ran a random document generator which created
>documents based on some arrays of values and an excerpt (approx. 1KB)
>from a text file of the Bible (King James Version).
>These were submitted via a socket connection (kept open throughout the
>indexing process).
>The index writer was not closed between index calls.
>This created a 400MB index in 23 files (after optimization).
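>
>For anyone wanting to reproduce the setup, the indexing loop looks
>roughly like the following (a minimal sketch only -- the index path,
>field names and document contents here are illustrative; the real
>generator produced 11 fields per document):
>
>  import org.apache.lucene.analysis.standard.StandardAnalyzer;
>  import org.apache.lucene.document.Document;
>  import org.apache.lucene.document.Field;
>  import org.apache.lucene.index.IndexWriter;
>
>  public class BatchIndexer {
>      public static void main(String[] args) throws Exception {
>          // Open the writer once and keep it open for the whole batch,
>          // as described above; "true" creates a fresh index.
>          IndexWriter writer =
>              new IndexWriter("/data/index", new StandardAnalyzer(), true);
>
>          for (int i = 0; i < 500000; i++) {
>              Document doc = new Document();
>              // Illustrative fields only -- the benchmark used 11 fields
>              // (1 date, 1 id, 9 text).
>              doc.add(Field.Keyword("id", String.valueOf(i)));
>              doc.add(Field.Text("Details", "approx 1kb of generated text " + i));
>              writer.addDocument(doc);
>          }
>
>          // Optimize once at the end (the step that took ~3.5 minutes
>          // at 500,000 documents), then close.
>          writer.optimize();
>          writer.close();
>      }
>  }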
>
>Query details:
>--------------
>Set up a threaded class to start x number of simultaneous threads to
>search the above created index.
>
>Query: +Domain:sos +(+((Name:goo*^2.0 Name:plan*^2.0)
>(Teaser:goo* Teaser:plan*) (Details:goo* Details:plan*)) -Cancel:y)
>+DisplayStartDate:[mkwsw2jk0 - mq3dj1uq0] +EndDate:[mq3dj1uq0 - ntlxuggw0]
>
>This query matched 34,000 documents, and I limited the returned
>documents to 5.
>
>This is using Peter Halacsy's IndexSearcherCache, slightly modified
>to be a singleton that returns cached searchers for a given directory.
>This solved an initial problem with too many open files exhausting
>the available Linux file handles.
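>
>The search harness itself is nothing special; here is a stripped-down
>sketch of the idea (not Peter's actual class -- the index path and the
>query string are placeholders, and the real query is the one shown
>above):
>
>  import org.apache.lucene.analysis.standard.StandardAnalyzer;
>  import org.apache.lucene.queryParser.QueryParser;
>  import org.apache.lucene.search.Hits;
>  import org.apache.lucene.search.IndexSearcher;
>  import org.apache.lucene.search.Query;
>
>  public class SearchBenchmark {
>      public static void main(String[] args) throws Exception {
>          // One searcher shared by every thread -- the point of the
>          // singleton cache is to avoid opening a new searcher (and a
>          // new set of file handles) for each query.
>          final IndexSearcher searcher = new IndexSearcher("/data/index");
>          final Query query = QueryParser.parse(
>              "+Domain:sos +Name:goo*", "Details", new StandardAnalyzer());
>
>          int threads = Integer.parseInt(args[0]);
>          for (int t = 0; t < threads; t++) {
>              new Thread(new Runnable() {
>                  public void run() {
>                      try {
>                          long start = System.currentTimeMillis();
>                          Hits hits = searcher.search(query);
>                          // Only fetch the first 5 documents, as in the
>                          // benchmark.
>                          for (int i = 0; i < Math.min(5, hits.length()); i++) {
>                              hits.doc(i);
>                          }
>                          System.out.println(hits.length() + " hits in "
>                              + (System.currentTimeMillis() - start) + "ms");
>                      } catch (Exception e) {
>                          e.printStackTrace();
>                      }
>                  }
>              }).start();
>          }
>      }
>  }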
>
>Threads | Avg time per query (ms)
>   1    |  1009
>   2    |  2043
>   3    |  3087
>   4    |  4045
>  ...   |   ...
>  10    | 10091
>
>I removed the two date range terms from the query and it made a HUGE
>difference in performance: with 4 threads, the average time per query
>dropped to 900ms!
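>
>If I understand the Lucene internals correctly, a date range like the
>one above is rewritten into a BooleanQuery containing every distinct
>term that falls inside the range, which with millisecond-resolution
>DateField values can mean a huge number of clauses. A possible
>workaround is to apply the date constraint as a filter rather than as
>query terms, along these lines (a sketch only -- assuming the
>DateFilter class shipped with this version of Lucene and an
>illustrative field name):
>
>  import java.util.Date;
>  import org.apache.lucene.search.DateFilter;
>  import org.apache.lucene.search.Hits;
>  import org.apache.lucene.search.IndexSearcher;
>  import org.apache.lucene.search.Query;
>
>  public class DateFilteredSearch {
>      public static Hits search(IndexSearcher searcher, Query query,
>                                Date from, Date to) throws Exception {
>          // Constrain the DateField-encoded field with a filter rather
>          // than with range clauses inside the query itself.
>          DateFilter filter = new DateFilter("DisplayStartDate", from, to);
>          return searcher.search(query, filter);
>      }
>  }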
>
>Other query optimizations made little difference.
></benchmark>



