I noticed delays when concurrent threads query an IndexSearcher too.
Our index is about 550 MB with about 850,000 docs. Each doc has 20-30 fields, of which only 3 are indexed. Our queries are not very complex: just 3 required term queries.
This is what my test did:
- initialize an array of terms that are known to appear in the indexed Keyword fields
- initialize an IndexSearcher
- start a number of threads that query the IndexSearcher
- each thread picks random terms from that array, builds a boolean query, and then extracts all 20-30 fields from the first 10 hits
- each thread then waits 0.5 seconds; each thread does this 30 times
Typical queries returned 20 to 100 hits.
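For anyone who wants to try something similar, the test loop described above can be sketched roughly as follows. This is not my original test code: the class name `ConcurrentSearchTimer`, the `run` method and the `fakeSearch` workload are made up for illustration, the `Runnable` is only a stand-in for the real `IndexSearcher` query plus field extraction, and the syntax assumes a much newer JVM than the one I tested on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ConcurrentSearchTimer {

    /**
     * Starts `threads` workers. Each worker calls `search` `queriesPerThread`
     * times, sleeping `pauseMs` after every call. Returns the elapsed time of
     * every individual call, in milliseconds.
     */
    static List<Long> run(int threads, int queriesPerThread, long pauseMs, Runnable search)
            throws InterruptedException {
        List<Long> timings = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                try {
                    start.await(); // hold every worker until all are created
                    for (int q = 0; q < queriesPerThread; q++) {
                        long t0 = System.nanoTime();
                        search.run(); // hypothetical stand-in for the real query
                        timings.add((System.nanoTime() - t0) / 1_000_000);
                        Thread.sleep(pauseMs);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        start.countDown(); // release all workers at once
        for (Thread t : workers) {
            t.join();
        }
        return timings;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: a 5 ms sleep stands in for searching and reading fields.
        Runnable fakeSearch = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // The test in this post paused 500 ms; 50 ms keeps the demo short.
        List<Long> timings = run(5, 30, 50, fakeSearch);
        System.out.println("queries: " + timings.size()
                + ", slowest (ms): " + Collections.max(timings));
    }
}
```

Swapping `fakeSearch` for a call against a shared searcher and comparing the timings for 1, 5 and 10 threads should show whether the per-query time grows with the thread count.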
With just one thread: 30 queries ran over a span of about 20 seconds. Search time per query was generally 40 ms to 75 ms. The longest search time was 445 ms, but searches that took more than 100 ms were rare.
With 5 threads: 150 queries ran over a span of 62 seconds. Search time per query mostly increased to 120 ms to 300 ms. Big delays were more prevalent and took 3 or 4 seconds.
With 10 or more threads things got bad. I didn't run enough tests, but most searches took 1 to 2 seconds and some searches took 20 to 30 seconds.
When I ran the test with 5 concurrent threads, each doing a single query, search times were around 100 ms to 200 ms with a max of 700 ms.
I have not looked into the Lucene code much, and I didn't think queries were queued.
I ran my test with -DdisableLuceneLocks on the command line, but I wasn't sure it did anything.
I ran the test on Lucene 1.3 final on my PowerBook G4, and the tests ran with a lot of other processes going on.
I was interested in this discussion because I could not explain the delays if queries are run in parallel.
On Jun 2, 2004, at 9:32 PM, Doug Cutting wrote:
Jayant Kumar wrote:
We recently tested Lucene with an index size of 2 GB which has about 1,500,000 documents, each document having about 25 fields. The frequency of search was about 20 queries per second. This resulted in an average response time of about 20 seconds per search.
That sounds slow, unless your queries are very complex. What are your queries like?
What we observed was that Lucene queues the queries and does not release them until the results are found, so the queries that have come in later take up to about 500 seconds. Please let us know whether there is a technique to optimize Lucene in such circumstances.
Multiple queries executed from different threads using a single searcher should not queue, but should run in parallel. A technique to find out where threads are queueing is to get a thread dump and see where all of the threads are stuck. In Solaris and Linux, sending the JVM a SIGQUIT will give a thread dump. On Windows, use Control-Break.
Doug
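As a complement to SIGQUIT or Control-Break, a thread dump can also be taken from inside the test program itself. The following is only a sketch: the class name `ThreadDump` and the `dump` method are made up for illustration, and it assumes a JVM that provides `Thread.getAllStackTraces()` (Java 5 or later), which is newer than the setup discussed in this thread.

```java
import java.util.Map;

public class ThreadDump {

    /** Returns the name, state and stack of every live thread as text. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling `dump()` while the search threads are running and looking for many threads in the BLOCKED state at the same frame should point to where they are queueing.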
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
