Re: Are there any Lucene optimizations applicable to SSD?

Toke Eskildsen Wed, 20 Aug 2008 01:06:45 -0700

On Wed, 2008-08-20 at 00:25 +0800, Cedric Ho wrote:
> Search response time. We used the search log from our production
> system and test it with SSD. The results shows that 75% of queries
> returns within 1 second, 90% returns in 2.5 seconds, the remaining 10%
> ranges from 2.5 seconds to less than 100 seconds.


Are the sub-second response times fairly equal or fluctuating?

Do you have the figures for conventional harddisks?

> Total number of queries is ~40000, so about 10000 queries are kind of
> slow, 1000 queries are very slow. But those 10% very slow queries are
> not from the first 1000 queries. It's more or less evenly distributed.

Is it the same queries that are slow each time?

> It's a single OCZ 64G SSD. We just got it yesterday. Is there a big
> difference between different SSDs?

Depends on who you ask...

There's a big difference for bulk reads and writes, but that's not the
main issue for searches. While there's also a difference in
IO-operations/second, the count is normally so high, compared to discs,
that just using a random SSD (discounting slap-4-CF-cards-together
hacks) should give you a clear improvement.

The OCZ-drives are getting fine reviews and I would definitely count
them as serious SSDs.

Have you checked to see if your controller can keep up with the drive?
Running a standard harddrive-benchmark with IO-operations/second
measurements might be an idea.

> All queries involves a Date Range Filter and a Publication Filter.
> We've used WrappingCachingFilters for the Publication Filter for there
> are only a limited number of combinations for this filter. For the
> Date Range Filter we just let it run every time which seems to be
> doing fine.
> 
> The queries also range from simple term query to phraseQueries to
> nested spanQueries. Number of search terms > 10 is not uncommon.

Our queries normally has fewer terms but gets expanded to 4-5 fields.
We have no filters and very few ranges.

Have you tried monitoring the memory-usage of the JVM? At one time we
had a lot of short-term allocation of fairly large structures - this
triggered a lot of full GC's and killed performance.

This still doesn't account for all the > 1 second response times though.

> There are 3 returned fields, docId, date and publication, all of which
> we retrieve through fieldCaches.

We retrieve only a single field for 20 hits, using no caching, but it's
½-1KB in size. Upping the number of retrieved fields doesn't affect
performance in any significant way for us.

> And we use this method to do the search:
> TopFieldDocs Searcher.search(Query query, Filter filter, int n, Sort sort)
> where for the test run n=100

I'll see if I can steal time enough to make a test with sorting, but no
promises.

> We are targeting to get >90% of queries to return under 1 sec. Of
> course the more the better =)

That's about the same as our original goal, but we've gotten greedier in
the meantime.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Are there any Lucene optimizations applicable to SSD?

Reply via email to