Re: search performance

Tri Cao Mon, 02 Jun 2014 12:46:28 -0700

This is an interesting performance problem, and I think there is probably no
single answer here, so I'll just lay out the steps I would take to tackle it:


1. What is the variance of the query latency? You said the average is 5 minutes,
but is that driven by a few really bad queries, or do most queries perform about the same?

2. We are implicitly assuming that index size and number of docs are the issue here.
Can you validate that assumption by indexing 10M, 50M, … docs
and seeing how much worse the performance gets as a function of size?

3. What is the average number of hits for the bad queries? If your queries match
a lot of docs, scoring will be very expensive: although you only ask for the top 1000
scored docs, Lucene still needs to score every hit to find those 1000.
If this is the case, there may be some workarounds, but let's first make sure
that this is indeed the situation we are dealing with.
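A minimal sketch of the three checks in plain Java, with no Lucene dependency (the class and method names are hypothetical, chosen only for illustration): a nearest-rank percentile to compare against the mean for (1), a log-log scaling exponent computed from two measured index sizes for (2), and a bounded min-heap showing why collecting the top K still has to score and offer every hit for (3). As far as I know, Lucene's top-docs collection uses a bounded priority queue along similar lines, but the real collector is more involved; this only illustrates the cost model.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class QueryDiagnostics {

    // (1) Mean of observed latencies; compare it against the percentiles below.
    public static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // (1) Nearest-rank percentile (p in 0..100) of observed latencies.
    public static double percentile(double[] latencies, double p) {
        double[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    // (2) Slope of log(latency) vs log(docCount) between two index sizes.
    // Roughly 1.0 means latency grows linearly with the number of docs.
    public static double scalingExponent(long docsA, double latencyA,
                                         long docsB, double latencyB) {
        return Math.log(latencyB / latencyA) / Math.log((double) docsB / docsA);
    }

    // (3) Keep the K best scores with a min-heap. Every hit is scored and
    // offered, even though only K survive, so cost grows with the hit count.
    public static float[] topK(float[] hitScores, int k) {
        if (k <= 0) return new float[0];
        PriorityQueue<Float> heap = new PriorityQueue<>(k); // smallest score on top
        for (float s : hitScores) {
            if (heap.size() < k) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll(); // evict the current worst of the kept hits
                heap.add(s);
            }
        }
        float[] out = new float[heap.size()];
        for (int i = out.length - 1; i >= 0; i--) out[i] = heap.poll(); // descending order
        return out;
    }
}
```

For example, latencies of 100, 200, 300, 400 and 100000 ms give a median of 300 ms but a mean of 20200 ms, which would point at a handful of bad queries rather than a uniformly slow index.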

Hope this helps,
Tri

On Jun 01, 2014, at 11:50 PM, Jamie <ja...@mailarchiva.com> wrote:

Greetings

Despite following all the recommended optimizations (as described at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), in some of our installations search performance has reached the point where it is unacceptably slow. For instance, in one environment, the total index size is 200GB, with 150 million documents indexed. With NRT enabled, a search takes roughly 5 minutes on average. The server resources are: 2x 6-core Intel CPUs, 128GB, 2 SSDs for the index and RAID 0, running Linux.

The only thing we haven't done yet is upgrade Lucene from 4.7.x to 4.8.x. Is this likely to make any noticeable difference in performance?

Clearly, in the longer term, we need to move to a distributed search model. We thought about taking advantage of the distributed search features offered in Solr; however, our solution is very tightly integrated with Lucene directly (since Solr didn't exist when we started out), and moving to Solr now seems like a daunting prospect. We've also been following the Katta project with interest, but it doesn't appear to support distributed indexing, and development on it seems to have stalled. It would be nice if there were a distributed search project at the Lucene level that we could use.
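For reference, the core of a scatter-gather distributed search is merging per-shard top-K lists, sketched here in plain Java. `ShardHit` and `mergeTopK` are hypothetical names, not a Lucene or Solr API. Each shard searches its own index and returns only its K best hits; the coordinator keeps the K best overall. That is enough because any doc in the global top K must also be in its own shard's top K. The sketch assumes scores are comparable across shards, which is not automatic, since each shard computes scores from its own term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {

    // Hypothetical hit record: which shard it came from, shard-local doc id, score.
    public static final class ShardHit {
        public final int shard;
        public final int docId;
        public final float score;

        public ShardHit(int shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    // Merge each shard's top-K list into a global top-K, best score first.
    public static List<ShardHit> mergeTopK(List<List<ShardHit>> perShard, int k) {
        Comparator<ShardHit> byScore = Comparator.comparingDouble(h -> h.score);
        PriorityQueue<ShardHit> heap = new PriorityQueue<>(byScore); // worst kept hit on top
        for (List<ShardHit> hits : perShard) {
            for (ShardHit h : hits) {
                if (heap.size() < k) {
                    heap.add(h);
                } else if (k > 0 && h.score > heap.peek().score) {
                    heap.poll(); // drop the current worst hit
                    heap.add(h);
                }
            }
        }
        List<ShardHit> out = new ArrayList<>(heap);
        out.sort(byScore.reversed()); // descending by score
        return out;
    }
}
```

Each shard only ever ships K hits to the coordinator, so the merge cost depends on the number of shards and K, not on the total index size.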

I realize this is a rather vague question, but are there any further suggestions on ways to improve search performance? We need cheap and dirty ideas, as well as longer-term advice on a possible path forward.


Much appreciated

Jamie
