Re: ConcurrentMergeScheduler and MergePolicy question

Mark Miller Mon, 03 Aug 2009 13:00:13 -0700

Michael McCandless wrote:

On the impact of search performance for large vs small mergeFactors, I
think the jury is still out.  People should keep testing that (and
report back!).  Certainly, for the fastest reopen time you never want
any merging to be done :)

Here is the original exchange I referenced:

>> On Fri, Apr 10, 2009 at 3:06 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> 24 segments is bound to be quite a bit slower than an optimized index for most things


>I'd be curious just how true this really is (in general)... my guess
>is the "long tail of tiny segments" gets into the OS's IO cache (as
>long as the system stays hot) and doesn't actually hurt things much.
>
>Has anyone tested this (performance of unoptimized vs optimized
>indexes, in general) recently?  To be a fair comparison, there should
>be no deletions in the index.
>
>Mike

After reading that, I played with some sorting code I had and did a quick cheesy test or two - one segment vs 10 or 20. In that horrible test (based on the stress sort code), I don't remember seeing much of a difference. No sorting. Very, very unscientific, quick and dirty.

This time I loaded up 1.3 million Wikipedia articles, gave the test 768MB of RAM, warmed the Searcher with lots of searching before each measurement, and compared 1 segment vs 5. The optimized index was 15-20% faster with the queries I was using (approx 100 queries targeted at Wikipedia). It's an odd test system - Ubuntu, quad-core laptop with slow laptop drives and 4 gig of RAM. Still not very scientific, but better than before.
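
For anyone who wants to repeat this kind of comparison, here is a rough sketch of the warm-then-measure idea in plain Java. It is not the code I ran and it does not touch any Lucene API: the class name WarmedTimer, both method names, and the stand-in workloads are made up for illustration, and "percent faster" is taken here to mean the reduction in mean time relative to the slower run, which is only one possible definition.

```java
import java.util.function.IntConsumer;

/** Minimal warm-then-measure timing sketch; hypothetical helper, not a Lucene class. */
final class WarmedTimer {

    /** Runs task warmupRuns times untimed, then measuredRuns times timed; returns mean nanoseconds per timed run. */
    static double meanNanos(IntConsumer task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.accept(i); // warm caches and the JIT; timing is ignored
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.accept(i);
        }
        return (System.nanoTime() - start) / (double) measuredRuns;
    }

    /** Time saved by the candidate, as a percentage of the baseline time (assumed definition). */
    static double percentFaster(double baselineNanos, double candidateNanos) {
        return 100.0 * (baselineNanos - candidateNanos) / baselineNanos;
    }

    public static void main(String[] args) {
        // Stand-in workloads, purely illustrative: swap in real searches against
        // a multi-segment index and an optimized one.
        long[] sink = new long[1]; // keeps the loops from being optimized away
        IntConsumer slower = i -> { for (int k = 0; k < 20_000; k++) sink[0] += k ^ i; };
        IntConsumer faster = i -> { for (int k = 0; k < 10_000; k++) sink[0] += k ^ i; };

        double multi = meanNanos(slower, 500, 5_000);
        double optimized = meanNanos(faster, 500, 5_000);
        System.out.printf("candidate is %.1f%% faster (sink=%d)%n",
                percentFaster(multi, optimized), sink[0]);
    }
}
```

The warm-up loop matters as much as the timed one: without it the first measurement also pays for cold OS caches and JIT compilation, which would skew a 1 segment vs 5 comparison.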



Here is the benchmark I was using in various forms:

{ "Rounds"

    ResetSystemErase

    { "Populate"
        -CreateIndex
        { "MAddDocs" AddDoc > : 15000
        -CloseIndex
    }
    { "test"
        OpenReader
        { "WarmRdrDocs" Warm > : 50
        { "WarmRdr" Search > : 5000
        { "SearchSameRdr" Search > : 50000
        CloseReader
        OpenIndex
        PrintSegmentCount
        Optimize
        CloseIndex
        NewRound
    } : 2
}

RepSumByName
RepSumByPrefRound SearchSameRdr

I also did a quick profile for a 15k index, 1 seg vs 10 segs. I profiled each for approx 11 million calls of readVInt. The hotspot results are below.


http://myhardshadow.com/images/1seg.png
http://myhardshadow.com/images/10seg.png


Just a quick start at looking into this from over the weekend.

--
- Mark

http://www.lucidimagination.com




