Yes, I am only calling IndexWriter.addDocument(). Interestingly, the
relative performance of the two approaches seems to depend heavily on
the number of documents per index. In both types of runs, I used 10
writer threads, each writing documents with the same set of fields
(but random values) into its own index as fast as possible, on a
16-core box, using a rotational disk for index storage (the results
from my original post were obtained on a Fusion-io drive and a machine
with even more cores). For smaller indexes, the choice of whether to
merge segments concurrently makes much less of a difference, if any.
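For reference, here is a minimal sketch of the kind of harness I'm
describing, written against the 3.0.x API. The class name, the field
names ("id", "body"), the document counts, and the randomText() helper
are placeholders for illustration, not the actual benchmark code:

import java.io.File;
import java.util.Random;
import java.util.UUID;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexingBench {
  private static final int NUM_THREADS = 10;      // one writer per index
  private static final int DOCS_PER_INDEX = 200000;

  public static void main(String[] args) throws Exception {
    Thread[] threads = new Thread[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
      // each thread writes into its own index directory
      final File dir = new File("bench-index-" + i);
      threads[i] = new Thread(new Runnable() {
        public void run() {
          try {
            IndexWriter w = new IndexWriter(
                FSDirectory.open(dir),
                new StandardAnalyzer(Version.LUCENE_30),
                true, // create a fresh index
                IndexWriter.MaxFieldLength.UNLIMITED);
            Random rnd = new Random();
            for (int d = 0; d < DOCS_PER_INDEX; d++) {
              // same fields in every document, random values
              Document doc = new Document();
              doc.add(new Field("id", UUID.randomUUID().toString(),
                  Field.Store.YES, Field.Index.NOT_ANALYZED));
              doc.add(new Field("body", randomText(rnd),
                  Field.Store.NO, Field.Index.ANALYZED));
              w.addDocument(doc);
            }
            w.commit();
            w.close();
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
  }

  // placeholder: generate a short run of random tokens
  private static String randomText(Random rnd) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 20; i++) {
      sb.append(Integer.toString(rnd.nextInt(100000), 36)).append(' ');
    }
    return sb.toString();
  }
}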
So the matrix looks like this:

# docs/index   concurrent merges?   total time, sec   total disk size
=====================================================================
200K           Y                    56.8              1.5 G
200K           N                    59.6              2.6 G
1M             Y                    304               7.4 G
1M             N                    493               14 G

As you can see, the total size on disk is always much larger when
merging at the end; here are the directory listings for each case:

Concurrent merging:

total 150M
-rw-r--r-- 1 bench perf    0 2012-06-01 16:33 write.lock
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _a.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _a.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _a.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _a.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _a.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _l.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _l.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _l.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _l.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _l.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _w.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _w.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _w.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _w.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _w.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _17.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _17.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _17.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _17.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _17.frq
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1j.cfs
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _1i.fnm
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1k.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1m.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1l.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1n.cfs
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _1i.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _1i.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _1i.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _1i.frq
-rw-r--r-- 1 bench perf 148K 2012-06-01 16:33 _1p.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1o.cfs
-rw-r--r-- 1 bench perf  28M 2012-06-01 16:33 _0.cfx
-rw-r--r-- 1 bench perf 2.8K 2012-06-01 16:33 segments_2
-rw-r--r-- 1 bench perf   20 2012-06-01 16:33 segments.gen

Deferred merging:

total 261M
-rw-r--r-- 1 bench perf    0 2012-06-01 16:41 write.lock
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _0.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _3.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _2.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _4.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _6.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _5.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _7.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _9.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _8.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _a.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _c.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _b.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _d.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _f.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _e.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _g.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _i.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _h.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _j.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _l.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _k.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _m.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _n.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _p.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _o.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _q.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _s.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _r.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _t.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _v.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _u.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _w.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _x.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _z.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _y.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _11.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _10.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _13.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _12.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _16.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _15.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _14.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _18.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _17.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1b.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1a.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _19.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1d.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1c.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1g.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1f.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1e.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1j.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1i.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1h.cfs
-rw-r--r-- 1 bench perf  28M 2012-06-01 16:41 _0.cfx
-rw-r--r-- 1 bench perf 137K 2012-06-01 16:42 _1k.cfs
-rw-r--r-- 1 bench perf  12K 2012-06-01 16:42 segments_2
-rw-r--r-- 1 bench perf   20 2012-06-01 16:42 segments.gen
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1l.fnm
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1n.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1l.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1l.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1l.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1l.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1o.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1n.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1n.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1n.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1n.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1p.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1o.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1o.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1o.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1o.frq
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1p.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1p.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1p.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1p.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1m.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1m.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1m.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1m.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1m.frq

On Fri, Jun 1, 2012 at 2:25 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> 64% greater index size when you merge at the end is odd.
>
> Can you post the ls -l output of the final index in both cases?
>
> Are you only adding (not deleting) docs?
>
> This is perfectly valid to do... but I'm surprised you see the two
> approaches taking about the same time. I would expect letting Lucene
> merge as it goes would be net/net faster, since merging can soak up
> unused IO bandwidth concurrent to indexing....
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, May 29, 2012 at 9:42 PM, Vitaly Funstein <vfunst...@gmail.com> wrote:
>> Hello,
>>
>> I am trying to optimize the process of "warming up" an index prior to
>> using the search subsystem, i.e. it is guaranteed that no other writes
>> or searches can take place in parallel with the warmup. To that
>> end, I have been toying with the idea of turning off segment merging
>> altogether until after all the data has been written and committed. I
>> am currently using Lucene 3.0.3, and migration to a later version is
>> not an option in the short term. So, the way I'm going about turning
>> merging off is as follows:
>>
>> 1. Before warmup, call:
>>
>> IndexWriter.setMaxMergeDocs(0);
>> IndexWriter.getLogMergePolicy().setMaxMergeMB(0);
>>
>> 2. After the warmup task completes, revert the above parameters to
>> their defaults, then call:
>>
>> IndexWriter.maybeMerge();
>> IndexWriter.waitForMerges();
>>
>> Now, I compared my results when deferring segment merges using the
>> above method with a test run letting Lucene do the merging on the
>> fly. Curiously, the resulting size of the indexes on disk is about 64%
>> greater in the former case, although the total time to complete the
>> warmup is almost the same.
>>
>> So I have a few questions:
>> - is the approach for deferring segment merging flawed in some way?
>> - what could possibly account for the huge difference in file sizes?
>> - what else could I possibly try to further speed up index writing
>> during the system's "off hours"?
>>
>> Thanks,
>> -V
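P.S. Spelled out, the merge-deferral toggle from my original post looks
roughly like this against the 3.0.x API. This is a sketch that assumes
the writer is using the default LogByteSizeMergePolicy (hence the cast),
and that the DEFAULT_* constants are the values being reverted to rather
than any custom settings:

import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.LogMergePolicy;

class DeferredMerging {

  // Before warmup: with max merge sizes of 0 docs / 0 MB, no segment
  // ever qualifies as a merge candidate, so merging is effectively off.
  static void disableMerging(IndexWriter writer) {
    LogByteSizeMergePolicy mp =
        (LogByteSizeMergePolicy) writer.getMergePolicy();
    mp.setMaxMergeDocs(0);
    mp.setMaxMergeMB(0);
  }

  // After warmup: restore the defaults, then trigger and wait for the
  // deferred merges before opening any searchers.
  static void enableMergingAndCatchUp(IndexWriter writer) throws IOException {
    LogByteSizeMergePolicy mp =
        (LogByteSizeMergePolicy) writer.getMergePolicy();
    mp.setMaxMergeDocs(LogMergePolicy.DEFAULT_MAX_MERGE_DOCS);
    mp.setMaxMergeMB(LogByteSizeMergePolicy.DEFAULT_MAX_MERGE_MB);
    writer.maybeMerge();    // let the merge policy pick up the backlog
    writer.waitForMerges(); // block until background merges complete
  }
}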