[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490192
 ] 

Steven Parkes commented on LUCENE-847:
--------------------------------------

Here are some numbers comparing the load performance for the factored vs. 
non-factored merge policies.

The setup uses enwiki, loads 200K documents, and runs four combinations of 
maxBufferedDocs and mergeFactor (just the defaults from the standard 
benchmark, not because I necessarily thought they were a good choice).

The factored merge policy seems to be on the order of 1% slower loading than 
the non-factored version ... and I'm not sure why, so I'm going to check into 
this. The factored version does more examination of the segment list than the 
non-factored version, so there's compute overhead, but I would expect that to 
be swamped by I/O. Maybe that's not a good assumption? Or it might be doing 
different merges for reasons I haven't considered, which I'm going to check.

Relating this to some of the merge discussions, I'm going to look at monitoring 
both the number of merges taking place and the size of those merges. I think 
that's helpful in understanding different candidate merge policies, in addition 
to the absolute difference in runtime.
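
To make "number of merges and size of those merges" concrete, here is a toy 
model in Python. It is only a sketch under stated assumptions: it assumes a 
purely level-based policy (flush a segment every maxBufferedDocs documents, 
merge whenever mergeFactor segments accumulate at one level) and is not the 
code of either policy in the patch. The function name simulate_merges is 
hypothetical.

```python
from collections import Counter


def simulate_merges(num_docs, max_buffered_docs, merge_factor):
    """Toy level-based merge model (an assumption, not Lucene's real logic).

    Returns a Counter mapping merged-segment size in docs -> number of merges.
    """
    segments_per_level = []  # segments_per_level[i] = segments waiting at level i
    merge_sizes = Counter()
    for _ in range(num_docs // max_buffered_docs):
        level = 0
        size = max_buffered_docs  # size of the segment being added at `level`
        while True:
            if level == len(segments_per_level):
                segments_per_level.append(0)
            segments_per_level[level] += 1
            if segments_per_level[level] < merge_factor:
                break
            # merge_factor segments at this level: merge them into one
            # segment and cascade it up to the next level.
            segments_per_level[level] = 0
            size *= merge_factor
            merge_sizes[size] += 1
            level += 1
    return merge_sizes


if __name__ == "__main__":
    # The four (mergeFactor, maxBufferedDocs) combinations from the benchmark.
    for mrg, buf in [(10, 10), (100, 10), (10, 100), (100, 100)]:
        sizes = simulate_merges(200000, buf, mrg)
        merges = sum(sizes.values())
        docs_copied = sum(size * count for size, count in sizes.items())
        print(f"mrg={mrg:3d} buf={buf:3d} merges={merges:5d} docsCopied={docs_copied}")
```

Counting merges and total documents copied per configuration gives a 
policy-independent yardstick to set beside elapsed time; a real measurement 
would record the same two quantities from inside the indexing run.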

I think histogramming the per-doc cost would also be interesting, since 
mitigating the long delay at cascading merges is at least one goal of a 
concurrent merge policy.
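
A minimal Python sketch of such a histogram follows. The sample costs and 
bucket edges are made up for illustration (they are not measurements from 
this benchmark), and the helper name histogram is hypothetical. The idea is 
that most addDocument calls land in a cheap bucket while the few that trigger 
a cascading merge show up in the tail.

```python
import bisect


def histogram(costs_ms, upper_edges_ms):
    """Count samples per bucket.

    Bucket i holds costs <= upper_edges_ms[i] (and above the previous edge);
    the final extra bucket holds everything above the last edge.
    """
    counts = [0] * (len(upper_edges_ms) + 1)
    for cost in costs_ms:
        counts[bisect.bisect_left(upper_edges_ms, cost)] += 1
    return counts


# Hypothetical per-document costs in milliseconds: mostly cheap adds, with
# two expensive ones standing in for documents that triggered merges.
SAMPLE_COSTS_MS = [12, 15, 11, 14, 950, 13, 16, 12, 9800, 14]
EDGES_MS = [10, 100, 1000, 10000]  # log-scale bucket upper bounds

if __name__ == "__main__":
    counts = histogram(SAMPLE_COSTS_MS, EDGES_MS)
    labels = [f"<= {edge} ms" for edge in EDGES_MS] + [f"> {EDGES_MS[-1]} ms"]
    for label, count in zip(labels, counts):
        print(f"{label:>12}  {count}")
```

Log-scale buckets are used here because the interesting samples are expected 
to be orders of magnitude slower than the typical add.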

And all this doesn't even consider testing the recent work on merging multiple 
indexes. That's an area where the factored merge policy differs (because of the 
simpler interface).

I'm curious whether anyone is surprised by these numbers, the 60 docs/sec in 
particular. This machine is a dual dual-core Xeon, writing to a single local 
disk. My dual Opteron achieved ~85-100 docs/sec on a three-disk SATA3 RAID5 
array.

Non-factored (current) merge policy

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 33)
     [java] Operation       round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0  10  10        1       200000         41.6    4,804.11    11,758,928     12,591,104
     [java] MAddDocs_200000 -   1 100  10 -  -   1 -  -  200000 -  -  - 50.0 -  4,000.25 -  34,831,992 -   52,563,968
     [java] MAddDocs_200000     2  10 100        1       200000         49.9    4,004.95    42,158,232     60,444,672
     [java] MAddDocs_200000 -   3 100 100 -  -   1 -  -  200000 -  -  - 57.9 -  3,455.97 -  45,646,680 -   61,083,648
     [java] MAddDocs_200000     4  10  10        1       200000         44.9    4,458.66    36,928,616     61,083,648
     [java] MAddDocs_200000 -   5 100  10 -  -   1 -  -  200000 -  -  - 50.4 -  3,965.98 -  47,855,064 -   61,083,648
     [java] MAddDocs_200000     6  10 100        1       200000         49.7    4,023.51    52,506,448     64,217,088
     [java] MAddDocs_200000 -   7 100 100 -  -   1 -  -  200000 -  -  - 57.9 -  3,451.39 -  64,466,128 -   73,220,096

Factored (new) merge policy

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 33)
     [java] Operation       round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0  10  10        1       200000         41.4    4,828.25    10,477,976     12,386,304
     [java] MAddDocs_200000 -   1 100  10 -  -   1 -  -  200000 -  -  - 50.4 -  3,968.27 -  38,333,544 -   46,170,112
     [java] MAddDocs_200000     2  10 100        1       200000         50.3    3,973.52    33,539,824     63,860,736
     [java] MAddDocs_200000 -   3 100 100 -  -   1 -  -  200000 -  -  - 58.6 -  3,413.87 -  44,580,528 -   87,781,376
     [java] MAddDocs_200000     4  10  10        1       200000         45.3    4,411.50    57,850,104     87,781,376
     [java] MAddDocs_200000 -   5 100  10 -  -   1 -  -  200000 -  -  - 51.0 -  3,921.48 -  62,793,432 -   87,781,376
     [java] MAddDocs_200000     6  10 100        1       200000         50.4    3,969.87    49,625,496     93,966,336
     [java] MAddDocs_200000 -   7 100 100 -  -   1 -  -  200000 -  -  - 58.7 -  3,409.51 -  68,100,288 -  129,572,864
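
For anyone who wants to compare the two reports round by round, the Python 
sketch below computes the percentage change in rec/s for each round. The 
rec/s values are copied from the two tables as labeled above; the helper name 
percent_change is hypothetical. A positive value means the factored run 
reported a higher rec/s for that round, a negative value a lower one. 
Differences of this size may well be within run-to-run noise, so treat the 
output as a starting point rather than a conclusion.

```python
# rec/s per round (0..7), copied from the two reports as labeled above.
NON_FACTORED = [41.6, 50.0, 49.9, 57.9, 44.9, 50.4, 49.7, 57.9]
FACTORED = [41.4, 50.4, 50.3, 58.6, 45.3, 51.0, 50.4, 58.7]


def percent_change(baseline, candidate):
    """Per-round change of candidate relative to baseline, in percent."""
    return [(c - b) / b * 100.0 for b, c in zip(baseline, candidate)]


if __name__ == "__main__":
    changes = percent_change(NON_FACTORED, FACTORED)
    for round_no, change in enumerate(changes):
        print(f"round {round_no}: {change:+.2f}%")
```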


> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Steven Parkes
>         Assigned To: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, 
> making it possible for apps to choose a custom merge policy and for easier 
> experimenting with merge policy variants.
