[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490192 ]
Steven Parkes commented on LUCENE-847:
--------------------------------------

Here are some numbers comparing the load performance for the factored vs. non-factored merge policies.

The setup uses enwiki, loads 200K documents, and uses 4 different combinations of maxBufferedDocs and mergeFactor (just the default from the standard benchmark, not because I necessarily thought it was a good idea).

The factored merge policy seems to be on the order of 1% slower loading than the non-factored version ... and I'm not sure why, so I'm going to check into this. The factored version does more examination of the segment list than the non-factored version, so there's compute overhead, but I would expect that to be swamped by I/O. Maybe that's not a good assumption? Or it might be doing different merges for reasons I haven't considered, which I'm going to check.

Relating this to some of the merge discussions, I'm going to look at monitoring both the number of merges taking place and the size of those merges. I think that's helpful in understanding different candidate merge policies, in addition to the absolute difference in runtime. I also think histogramming the per-doc cost would be interesting, since mitigating the long delay at cascading merges is at least one goal of a concurrent merge policy.

And all this doesn't even consider testing the recent stuff on merging multiple indexes. That's an area where the factored merge policy differs (because of the simpler interface).

I'm curious if anyone is surprised by these numbers, the 60 docs/sec, in particular. This machine is a dual dual-core Xeon, writing to a single local disk. My dual opty achieved ~85-100 docs/sec on a three-disk SATA3 RAID5 array.
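As a rough illustration of the per-doc cost histogram idea, here is a minimal sketch that buckets per-document latencies on a power-of-two scale, so the occasional long stall at a cascading merge shows up in the high buckets. The class `PerDocCostHistogram` and all of its methods are hypothetical names, not part of Lucene or the benchmark package. The sketch assumes the timed action would be whatever adds a single document (for example a call to `IndexWriter.addDocument`); here it is just a `Runnable`.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: log2-bucketed histogram of per-document add cost. */
public class PerDocCostHistogram {
    // Bucket 0 holds costs under 1 us; bucket i (i >= 1) holds costs in
    // [2^(i-1), 2^i) microseconds. The last bucket also absorbs anything larger.
    private final long[] counts = new long[40];

    /** Maps a cost in microseconds to its bucket index. */
    static int bucketFor(long micros) {
        if (micros <= 0) {
            return 0;
        }
        int bits = 64 - Long.numberOfLeadingZeros(micros); // bit length of micros
        return Math.min(bits, 39);
    }

    /** Records one per-document cost given in nanoseconds. */
    public void record(long nanos) {
        counts[bucketFor(TimeUnit.NANOSECONDS.toMicros(nanos))]++;
    }

    /** Times one action (assumed to add a single document) and records it. */
    public void time(Runnable addOneDoc) {
        long start = System.nanoTime();
        addOneDoc.run();
        record(System.nanoTime() - start);
    }

    public long count(int bucket) {
        return counts[bucket];
    }

    /** One line per non-empty bucket, labelled with its exclusive upper bound. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == 0) {
                continue;
            }
            sb.append(String.format("< %d us: %d%n", 1L << i, counts[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PerDocCostHistogram h = new PerDocCostHistogram();
        // Synthetic costs only: mostly cheap adds, with an occasional long stall.
        for (int i = 1; i <= 1000; i++) {
            h.record(i % 100 == 0 ? 2_000_000_000L : 20_000_000L);
        }
        System.out.print(h.report());
    }
}
```

Comparing the tail of such a histogram between two merge policies would complement the aggregate rec/s figures, which average the stalls away.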
Non-factored (current) merge policy

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 33)
     [java] Operation       round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0  10  10        1       200000         41.6    4,804.11    11,758,928     12,591,104
     [java] MAddDocs_200000 -   1 100  10 -  -   1 -  -  200000 -  -  - 50.0 -  4,000.25 -  34,831,992 -   52,563,968
     [java] MAddDocs_200000     2  10 100        1       200000         49.9    4,004.95    42,158,232     60,444,672
     [java] MAddDocs_200000 -   3 100 100 -  -   1 -  -  200000 -  -  - 57.9 -  3,455.97 -  45,646,680 -   61,083,648
     [java] MAddDocs_200000     4  10  10        1       200000         44.9    4,458.66    36,928,616     61,083,648
     [java] MAddDocs_200000 -   5 100  10 -  -   1 -  -  200000 -  -  - 50.4 -  3,965.98 -  47,855,064 -   61,083,648
     [java] MAddDocs_200000     6  10 100        1       200000         49.7    4,023.51    52,506,448     64,217,088
     [java] MAddDocs_200000 -   7 100 100 -  -   1 -  -  200000 -  -  - 57.9 -  3,451.39 -  64,466,128 -   73,220,096

Factored (new) merge policy

     [java] ------------> Report sum by Prefix (MAddDocs) and Round (8 about 8 out of 33)
     [java] Operation       round mrg buf   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     [java] MAddDocs_200000     0  10  10        1       200000         41.4    4,828.25    10,477,976     12,386,304
     [java] MAddDocs_200000 -   1 100  10 -  -   1 -  -  200000 -  -  - 50.4 -  3,968.27 -  38,333,544 -   46,170,112
     [java] MAddDocs_200000     2  10 100        1       200000         50.3    3,973.52    33,539,824     63,860,736
     [java] MAddDocs_200000 -   3 100 100 -  -   1 -  -  200000 -  -  - 58.6 -  3,413.87 -  44,580,528 -   87,781,376
     [java] MAddDocs_200000     4  10  10        1       200000         45.3    4,411.50    57,850,104     87,781,376
     [java] MAddDocs_200000 -   5 100  10 -  -   1 -  -  200000 -  -  - 51.0 -  3,921.48 -  62,793,432 -   87,781,376
     [java] MAddDocs_200000     6  10 100        1       200000         50.4    3,969.87    49,625,496     93,966,336
     [java] MAddDocs_200000 -   7 100 100 -  -   1 -  -  200000 -  -  - 58.7 -  3,409.51 -  68,100,288 -  129,572,864

> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Steven Parkes
>         Assigned To: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable,
> making it possible for apps to choose a custom merge policy and for easier
> experimenting with merge policy variants.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]