Note: this is a reply to a posting to java-dev --Peter
Eric,
Now that it is live, is performance pretty good?
Performance is outstanding. Each server can easily handle well over 100 qps on an index of over 800K documents. There are several servers (4 dual core (8 CPU) Opteron) supporting the query load and we have backup servers for disaster recovery. For a few hours one day, all job search query traffic for the entire site was being handled by a single server - with no noticable latency!
Are you using dotLucene or a webservice tier and java?
We are using Java Lucene on dedicated servers.
How did you implement your bounding box for the searching? It sounds like
you do this outside of lucene and return a custom hitcollector. The 'bounding box' is merely the conjunction of 2 numeric range searches. It's really not that hard to do - I think there has been discussion of this elsewhere in this group. We use (not 'return') a custom HitCollector to exclude hits that aren't in the bounding box. I tried to explain this in a reply earlier today, but if I failed let me know.
Why not use a rangequery or functionquery for the basic bounding before
sorting Basically, 'RangeQuery' doesn't offer sufficient performance. We have implemented our own 'numeric value' search 'next to Lucene' (I think I like this better than 'outside of Lucene' ;-)). FunctionQuery could be used if you wanted the jobs sorted by a combination of keywords and distance. Our users (apparently) expect the jobs to be sorted strictly by distance on a radius search. ================================================================ Peter Hello Peter, Now that the monster lucene search is live, is performance pretty good? Are you still running it on a single 8 core server? Can you give me a rough idea on the number of queries you can handle/second and the number of docs in the index? Are you using dotLucene or a webservice tier and java? How did you implement your bounding box for the searching? It sounds like you do this outside of lucene and return a custom hitcollector. Why not use a rangequery or functionquery for the basic bounding before sorting? Thanks, Eric