On Wed, 2008-08-20 at 00:25 +0800, Cedric Ho wrote: > Search response time. We used the search log from our production > system and test it with SSD. The results shows that 75% of queries > returns within 1 second, 90% returns in 2.5 seconds, the remaining 10% > ranges from 2.5 seconds to less than 100 seconds.
Are the sub-second response times fairly equal or fluctuating? Do you have the figures for conventional harddisks? > Total number of queries is ~40000, so about 10000 queries are kind of > slow, 1000 queries are very slow. But those 10% very slow queries are > not from the first 1000 queries. It's more or less evenly distributed. Is it the same queries that are slow each time? > It's a single OCZ 64G SSD. We just got it yesterday. Is there a big > difference between different SSDs? Depends on who you ask... There's a big difference for bulk reads and writes, but that's not the main issue for searches. While there's also a difference in IO-operations/second, the count is normally so high, compared to discs, that just using a random SSD (discounting slap-4-CF-cards-together hacks) should give you a clear improvement. The OCZ-drives are getting fine reviews and I would definitely count them as serious SSDs. Have you checked to see if your controller can keep up with the drive? Running a standard harddrive-benchmark with IO-operations/second measurements might be an idea. > All queries involves a Date Range Filter and a Publication Filter. > We've used WrappingCachingFilters for the Publication Filter for there > are only a limited number of combinations for this filter. For the > Date Range Filter we just let it run every time which seems to be > doing fine. > > The queries also range from simple term query to phraseQueries to > nested spanQueries. Number of search terms > 10 is not uncommon. Our queries normally has fewer terms but gets expanded to 4-5 fields. We have no filters and very few ranges. Have you tried monitoring the memory-usage of the JVM? At one time we had a lot of short-term allocation of fairly large structures - this triggered a lot of full GC's and killed performance. This still doesn't account for all the > 1 second response times though. > There are 3 returned fields, docId, date and publication, all of which > we retrieve through fieldCaches. We retrieve only a single field for 20 hits, using no caching, but it's ½-1KB in size. Upping the number of retrieved fields doesn't affect performance in any significant way for us. > And we use this method to do the search: > TopFieldDocs Searcher.search(Query query, Filter filter, int n, Sort sort) > where for the test run n=100 I'll see if I can steal time enough to make a test with sorting, but no promises. > We are targeting to get >90% of queries to return under 1 sec. Of > course the more the better =) That's about the same as our original goal, but we've gotten greedier in the meantime. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]