Optimizing search speed & performance for a 10G Index.

Chun Wei Ho Thu, 07 Dec 2006 22:11:22 -0800

Hi,

We run a search engine based on Lucene 1.9.1 / Nutch 0.7.2. Our index
has approximately 2 million documents and its size on disk is about
10 GB. We run it as a Tomcat web application on a Fedora Core 4
server with dual Xeon 3.2GHz processors and 4GB RAM.


We receive about 46,500 web search requests a day (ranging from 50 to
300 requests per 5 minutes across the day). Each web search request
can spawn one to three actual Lucene searches. Our average response
time, measured on the server side and so excluding network latency,
is about 2 seconds.
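
To put those figures in per-second terms, here is a rough sketch of the
arithmetic. The function names are made up for illustration, and the
in-flight estimate assumes Little's law (arrival rate times response
time) applies to our traffic, which is only an approximation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_second(count, window_seconds):
    """Average request rate over a time window."""
    return count / window_seconds


def in_flight(arrival_rate, response_seconds):
    """Little's law estimate of requests being served at once (assumption)."""
    return arrival_rate * response_seconds


# Figures quoted in this message.
avg_rps = requests_per_second(46500, SECONDS_PER_DAY)  # daily average
peak_rps = requests_per_second(300, 5 * 60)            # busiest 5 minutes
peak_lucene_per_sec = peak_rps * 3                     # up to 3 Lucene searches each
peak_in_flight = in_flight(peak_rps, 2.0)              # 2 s average response time

print(f"average web requests/sec: {avg_rps:.2f}")
print(f"peak web requests/sec:    {peak_rps:.2f}")
print(f"peak Lucene searches/sec: {peak_lucene_per_sec:.2f}")
print(f"peak requests in flight:  {peak_in_flight:.2f}")
```

So even at peak we are serving roughly one web request per second, or
at most about three Lucene searches per second.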

Does a response time of 2 seconds seem plausible for Lucene, given
the machine specifications above?


Our index is somewhat complex, with multiple fields (title, location,
site, content). For example, a search for "Linux and Lucene" related
entries in "Australia" might result in a Lucene query such as:

((title:linux^1.0 title:lucene^1.0)^4.0)
+((
+(title:linux^5.0 location:linux^1.5 content:linux^1.0)
+(title:lucene^5.0 location:lucene^1.5 content:lucene^1.0))
((+(+content:linux +content:lucene)) +(site:contentsite1
site:contentsite2 site:contentsite3 site:contentsite4
site:contentsite5 site:contentsite6 site:contentsite7)))^0.01))
+location:australia)
+newsdate:[20061107 TO 20061208]
+region:au)
-jobsite:badsite1 -region:badregion1 -jobsite:badsite2
-jobsite:badsite3 -jobsite:badsite4

Does anyone have ideas, or could anyone point us to resources, that
would help us improve this performance? A 2-second response time plus
network latency gives an impression of "slowness" on our site that we
are trying to reduce.

Thank you.
