Michael McCandless wrote:
On the impact of large vs small mergeFactors on search performance, I
think the jury is still out. People should keep testing that (and
report back!). Certainly, for the fastest reopen time you never want
any merging to be done :)
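Here is a toy model in Java that makes the mergeFactor trade-off concrete. It rests on an idealization, not Lucene's actual LogMergePolicy code: every flush writes one equal-sized segment, and as soon as mergeFactor segments of the same size exist they are merged into one larger segment. Under that assumption, the segments left behind are just the base-mergeFactor digits of the flush count. A larger mergeFactor therefore means fewer merges but more segments for each search to visit. The class name and the flush count are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Idealized merge model (an assumption, not Lucene's LogMergePolicy source):
// each flush writes one equal-sized segment, and whenever mergeFactor
// segments of the same level exist they are merged into one segment
// of the next level up.
public class SegmentCountModel {

    // Segments remaining at each level (index 0 = freshly flushed) after
    // the given number of flushes. Cascading merges leave exactly the
    // base-mergeFactor digits of the flush count.
    static List<Integer> segmentsPerLevel(long flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Integer> levels = new ArrayList<>();
        while (flushes > 0) {
            levels.add((int) (flushes % mergeFactor));
            flushes /= mergeFactor;
        }
        return levels;
    }

    // Total number of segments a search would have to visit.
    static int totalSegments(long flushes, int mergeFactor) {
        int total = 0;
        for (int n : segmentsPerLevel(flushes, mergeFactor)) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        long flushes = 999; // hypothetical number of RAM-buffer flushes
        for (int mergeFactor : new int[] {2, 10, 50}) {
            System.out.println("mergeFactor=" + mergeFactor + " -> "
                    + totalSegments(flushes, mergeFactor) + " segments");
        }
    }
}
```

With 999 hypothetical flushes, main prints 8, 27 and 68 segments for mergeFactor 2, 10 and 50. This model does not answer whether those extra segments actually cost much at search time.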
Here is the original exchange I referenced:
>>On Fri, Apr 10, 2009 at 3:06 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> 24 segments is bound to be quite a bit slower than an optimized
>> index for most things.
>I'd be curious just how true this really is (in general)... my guess
>is the "long tail of tiny segments" gets into the OS's IO cache (as
>long as the system stays hot) and doesn't actually hurt things much.
>
>Has anyone tested this (performance of unoptimized vs optimized
>indexes, in general) recently? To be a fair comparison, there should
>be no deletions in the index.
>
>Mike
After reading that, I played with some sorting code I had and did a
quick, cheesy test or two: one segment vs 10 or 20 segments. In that
horrible test (based on the stress sort code), I don't remember seeing
much of a difference. No sorting. Very, very unscientific, quick and dirty.
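Segment count matters partly because each query term must be looked up separately in every segment's term dictionary. The toy Java model below counts those per-segment lookups for the same documents stored as one segment or as three. Its class, fields and data are hypothetical, and it is not Lucene's multi-segment reader code. It only counts lookups. Whether the extra ones are expensive is exactly the question Mike raises, since they may be cheap if the small segments stay in the OS IO cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (not Lucene's implementation): every segment carries its own
// sorted term dictionary, so counting the docs that contain a term over an
// N-segment index takes N separate dictionary lookups, while an optimized
// (single-segment) index takes one.
public class MultiSegmentLookup {

    // Hypothetical segment: sorted terms with a parallel docFreq array.
    static final class Segment {
        final String[] sortedTerms;
        final int[] docFreqs;

        Segment(String[] sortedTerms, int[] docFreqs) {
            this.sortedTerms = sortedTerms;
            this.docFreqs = docFreqs;
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    private int seeks; // dictionary lookups performed so far

    MultiSegmentLookup add(Segment s) {
        segments.add(s);
        return this;
    }

    // Sums the term's docFreq over all segments: one binary search per segment.
    int docFreq(String term) {
        int total = 0;
        for (Segment s : segments) {
            seeks++;
            int idx = Arrays.binarySearch(s.sortedTerms, term);
            if (idx >= 0) {
                total += s.docFreqs[idx];
            }
        }
        return total;
    }

    int seeks() {
        return seeks;
    }

    public static void main(String[] args) {
        // The same hypothetical documents, stored as one segment or as three.
        MultiSegmentLookup optimized = new MultiSegmentLookup()
                .add(new Segment(new String[] {"apache", "lucene", "search"}, new int[] {2, 3, 1}));
        MultiSegmentLookup unoptimized = new MultiSegmentLookup()
                .add(new Segment(new String[] {"apache", "lucene"}, new int[] {1, 1}))
                .add(new Segment(new String[] {"lucene", "search"}, new int[] {1, 1}))
                .add(new Segment(new String[] {"apache", "lucene"}, new int[] {1, 1}));

        System.out.println("optimized:   docFreq=" + optimized.docFreq("lucene")
                + ", seeks=" + optimized.seeks());
        System.out.println("unoptimized: docFreq=" + unoptimized.docFreq("lucene")
                + ", seeks=" + unoptimized.seeks());
    }
}
```

main reports docFreq=3 in both cases, with 1 lookup for the optimized index and 3 for the unoptimized one.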
This time I loaded 1.3 million Wikipedia articles, gave the test 768 MB
of RAM, warmed the Searcher with lots of searching before each
measurement, and compared 1 segment vs 5. The optimized index was 15-20%
faster with the queries I was using (approx. 100 queries targeted at
Wikipedia). It's an odd test system: Ubuntu on a quad-core laptop with
slow laptop drives and 4 GB of RAM. Still not very scientific, but
better than before.
Here is the benchmark I was using in various forms:
{ "Rounds"
    ResetSystemErase
    { "Populate"
        -CreateIndex
        { "MAddDocs" AddDoc > : 15000
        -CloseIndex
    }
    { "test"
        OpenReader
        { "WarmRdrDocs" Warm > : 50
        { "WarmRdr" Search > : 5000
        { "SearchSameRdr" Search > : 50000
        CloseReader
        OpenIndex
        PrintSegmentCount
        Optimize
        CloseIndex
        NewRound
    } : 2
}
RepSumByName
RepSumByPrefRound SearchSameRdr
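The "WarmRdr" and "SearchSameRdr" tasks above follow a warm-first, then-time pattern. Here is a minimal plain-Java sketch of that pattern for anyone reproducing the idea outside the Lucene benchmark framework. The class, the method names and the fake query workload are hypothetical stand-ins, not Lucene APIs. The sketch also pins down one reading of "X% faster": the extra throughput of the faster setup, (slowTime / fastTime - 1) * 100. A reduction in per-query time is the other possible reading, and it gives a smaller number.

```java
import java.util.function.LongSupplier;

// Generic warm-up-then-measure harness (a sketch; not the Lucene benchmark
// framework, which does this with its Warm/Search tasks and reports).
public class WarmThenMeasure {

    private static long sink; // consumes results so the JIT cannot drop the work

    // Runs the workload warmupIters times untimed, then returns the average
    // nanoseconds per call over measureIters timed calls.
    static double avgNanosPerCall(LongSupplier workload, int warmupIters, int measureIters) {
        if (measureIters <= 0) {
            throw new IllegalArgumentException("measureIters must be > 0");
        }
        for (int i = 0; i < warmupIters; i++) {
            sink += workload.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measureIters; i++) {
            sink += workload.getAsLong();
        }
        return (System.nanoTime() - start) / (double) measureIters;
    }

    // "X% faster" read as throughput: how many more queries per second the
    // faster setup completes.
    static double percentFaster(double slowNanosPerQuery, double fastNanosPerQuery) {
        return (slowNanosPerQuery / fastNanosPerQuery - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "run one query": sum a small array.
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        LongSupplier fakeQuery = () -> {
            long s = 0;
            for (long v : data) {
                s += v;
            }
            return s;
        };
        double nanos = avgNanosPerCall(fakeQuery, 5_000, 50_000);
        System.out.printf("avg %.1f ns per call%n", nanos);
    }
}
```

Under this reading, a setup that needs 100 ns per query where the other needs 120 ns is about 20% faster.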
I also did a quick profile of a 15k-doc index, 1 segment vs 10
segments, profiling each for approx. 11 million calls to readVInt. The
hotspot results are here:
http://myhardshadow.com/images/1seg.png
http://myhardshadow.com/images/10seg.png
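Both profiles were measured in calls to readVInt, so here is a short refresher on what that decodes. Lucene stores many integers as VInts, including document-number deltas and frequencies in the postings. A VInt holds 7 data bits per byte, low-order group first, and a set high bit means another byte follows. Below is a minimal stand-alone Java sketch of that encoding. The class VIntCodec and its helpers are hypothetical, written from the published file format rather than copied from Lucene's IndexInput/IndexOutput.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Minimal stand-alone VInt codec (a sketch of the variable-length int format,
// not Lucene's own classes): 7 data bits per byte, low-order group first,
// high bit set = "more bytes follow".
public class VIntCodec {

    static void writeVInt(ByteArrayOutputStream out, int i) {
        // Emit 7 bits at a time while more than 7 significant bits remain.
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7; // unsigned shift, so negative ints still terminate (5 bytes)
        }
        out.write(i);
    }

    static int readVInt(InputStream in) throws IOException {
        int b = readByte(in);
        int i = b & 0x7F;
        // Keep reading while the continuation bit is set.
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte(in);
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("truncated VInt");
        }
        return b;
    }

    static byte[] encode(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, i);
        return out.toByteArray();
    }

    static int decode(byte[] bytes) {
        try {
            return readVInt(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int value : new int[] {0, 127, 128, 300, 16_384, Integer.MAX_VALUE}) {
            byte[] bytes = encode(value);
            System.out.println(value + " -> " + bytes.length + " byte(s) -> " + decode(bytes));
        }
    }
}
```

For example, 300 encodes to the two bytes 0xAC 0x02, and Integer.MAX_VALUE takes five bytes. Small values such as doc-number gaps usually fit in one byte.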
This was just a quick start at looking into this over the weekend.
--
- Mark
http://www.lucidimagination.com
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org