This is an interesting performance problem and I think there is probably not
a single answer here, so I'll just layout the steps I would take to tackle this:
1. What is the variance of the query latency? You said the average is 5 minutes,
but is it due to some really bad queries or most queries have the same perf?
2. We kind of assume that index size and number of docs is the issue here.
Can you validate that assumption by trying to index with 10M, 50M, … docs
and see how worse the performance is getting as a function of size?
3. What is the average doc hits for the bad queries? If you queries matches
a lot of hits, scoring will be very expensive. While you only ask for 1000 top
scored docs, Lucene still needs to score all the hits to get that 1000 docs.
If this is the case, there could be some work around, but Iet's make sure
that it's indeed the situation we are dealing with here.
Hope this helps,
Tri
On Jun 01, 2014, at 11:50 PM, Jamie <ja...@mailarchiva.com> wrote:
Greetings
Despite following all the recommended optimizations (as described at
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed) , in some of
our installations, search performance has reached the point where is it
unacceptably slow. For instance, in one environment, the total index
size is 200GB, with 150 million documents indexed. With NRT enabled,
search speed is roughly 5 minutes on average. The server resources are:
2x6 Core Intel CPU, 128GB, 2 SSD for index and RAID 0, with Linux.
The only thing we haven't yet done, is to upgrade Lucene from 4.7.x to
4.8.x. Is this likely to make any noticeable difference in performance?
Clearly, longer term, we need to move to a distributed search model. We
thought to take advantage of the distributed search features offered in
Solr, however, our solution is very tightly integrated into Lucene
directly (since Solr didn't exist when we started out). Moving to Solr
now seems like a daunting prospect. We've also following the Katta
project with interest, but it doesn't appear support distributed
indexing, and development on it seems to have stalled. It would be nice
if there were a distributed search project on the Lucene level that we
could use.
I realize this is a rather vague question, but are there any further
suggestions on ways to improve search performance? We need cheap and
dirty ideas, as well as longer term advice on a possible path forward.
Much appreciate
Jamie
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org