Yes, I am only calling IndexWriter.addDocument(). Interestingly, the
relative performance of the two approaches seems to depend heavily on
the number of documents per index. In both types of runs, I used 10
writer threads, each writing documents with the same set of fields
(but random values) into its own index as fast as possible, on a
16-core box, using a rotational disk for index storage (the results
from my original post were obtained on a Fusion-io drive and a machine
with even more cores). For smaller indexes, the choice of whether to
merge segments concurrently makes much less of a difference, if any.
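For reference, here is a minimal sketch of the kind of harness I'm
describing, written against the 3.0.x API. The class name, the field
names ("id", "body"), the document counts, and the randomText() helper
are placeholders for illustration, not the actual benchmark code:

import java.io.File;
import java.util.Random;
import java.util.UUID;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexingBench {
  private static final int NUM_THREADS = 10;      // one writer per index
  private static final int DOCS_PER_INDEX = 200000;

  public static void main(String[] args) throws Exception {
    Thread[] threads = new Thread[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++) {
      // each thread writes into its own index directory
      final File dir = new File("bench-index-" + i);
      threads[i] = new Thread(new Runnable() {
        public void run() {
          try {
            IndexWriter w = new IndexWriter(
                FSDirectory.open(dir),
                new StandardAnalyzer(Version.LUCENE_30),
                true, // create a fresh index
                IndexWriter.MaxFieldLength.UNLIMITED);
            Random rnd = new Random();
            for (int d = 0; d < DOCS_PER_INDEX; d++) {
              // same fields in every document, random values
              Document doc = new Document();
              doc.add(new Field("id", UUID.randomUUID().toString(),
                  Field.Store.YES, Field.Index.NOT_ANALYZED));
              doc.add(new Field("body", randomText(rnd),
                  Field.Store.NO, Field.Index.ANALYZED));
              w.addDocument(doc);
            }
            w.commit();
            w.close();
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
  }

  // placeholder: generate a short run of random tokens
  private static String randomText(Random rnd) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 20; i++) {
      sb.append(Integer.toString(rnd.nextInt(100000), 36)).append(' ');
    }
    return sb.toString();
  }
}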
So the matrix looks like this:

# docs/index   concurrent merges?   total time, sec   total disk size
=====================================================================
200K           Y                    56.8              1.5 G
200K           N                    59.6              2.6 G
1M             Y                    304               7.4 G
1M             N                    493               14 G

As you can see, the total size on disk is always much larger when
merging at the end; here are the directory listings for each case:

Concurrent merging:

total 150M
-rw-r--r-- 1 bench perf    0 2012-06-01 16:33 write.lock
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _a.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _a.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _a.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _a.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _a.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _l.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _l.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _l.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _l.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _l.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _w.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _w.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _w.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _w.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _w.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _17.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _17.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _17.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _17.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _17.frq
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1j.cfs
-rw-r--r-- 1 bench perf   87 2012-06-01 16:33 _1i.fnm
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1k.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1m.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1l.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1n.cfs
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:33 _1i.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:33 _1i.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:33 _1i.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:33 _1i.frq
-rw-r--r-- 1 bench perf 148K 2012-06-01 16:33 _1p.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:33 _1o.cfs
-rw-r--r-- 1 bench perf  28M 2012-06-01 16:33 _0.cfx
-rw-r--r-- 1 bench perf 2.8K 2012-06-01 16:33 segments_2
-rw-r--r-- 1 bench perf   20 2012-06-01 16:33 segments.gen

Deferred merging:

total 261M
-rw-r--r-- 1 bench perf    0 2012-06-01 16:41 write.lock
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _0.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _3.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _2.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _4.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _6.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _5.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _7.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _9.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _8.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _a.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _c.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _b.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _d.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _f.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _e.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _g.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _i.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _h.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _j.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _l.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _k.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _m.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _n.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _p.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _o.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _q.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _s.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _r.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _t.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _v.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _u.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _w.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _x.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _z.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _y.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _11.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _10.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _13.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _12.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _16.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _15.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _14.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _18.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _17.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1b.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1a.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _19.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1d.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1c.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1g.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1f.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1e.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1j.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1i.cfs
-rw-r--r-- 1 bench perf 2.3M 2012-06-01 16:41 _1h.cfs
-rw-r--r-- 1 bench perf  28M 2012-06-01 16:41 _0.cfx
-rw-r--r-- 1 bench perf 137K 2012-06-01 16:42 _1k.cfs
-rw-r--r-- 1 bench perf  12K 2012-06-01 16:42 segments_2
-rw-r--r-- 1 bench perf   20 2012-06-01 16:42 segments.gen
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1l.fnm
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1n.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1l.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1l.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1l.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1l.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1o.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1n.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1n.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1n.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1n.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1p.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1o.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1o.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1o.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1o.frq
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1p.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1p.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1p.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1p.frq
-rw-r--r-- 1 bench perf   87 2012-06-01 16:42 _1m.fnm
-rw-r--r-- 1 bench perf  17M 2012-06-01 16:42 _1m.tis
-rw-r--r-- 1 bench perf 186K 2012-06-01 16:42 _1m.tii
-rw-r--r-- 1 bench perf 105K 2012-06-01 16:42 _1m.prx
-rw-r--r-- 1 bench perf 4.8M 2012-06-01 16:42 _1m.frq

On Fri, Jun 1, 2012 at 2:25 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> 64% greater index size when you merge at the end is odd.
>
> Can you post the ls -l output of the final index in both cases?
>
> Are you only adding (not deleting) docs?
>
> This is perfectly valid to do... but I'm surprised you see the two
> approaches taking about the same time. I would expect letting Lucene
> merge as it goes would be net/net faster, since merging can soak up
> unused IO bandwidth concurrent to indexing....
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, May 29, 2012 at 9:42 PM, Vitaly Funstein <vfunst...@gmail.com> wrote:
>> Hello,
>>
>> I am trying to optimize the process of "warming up" an index prior to
>> using the search subsystem, i.e. it is guaranteed that no other writes
>> or searches can take place in parallel with the warmup. To that
>> end, I have been toying with the idea of turning off segment merging
>> altogether until after all the data has been written and committed. I
>> am currently using Lucene 3.0.3, and migration to a later version is
>> not an option in the short term. So, the way I'm going about turning
>> merging off is as follows:
>>
>> 1. Before warmup, call:
>>
>> IndexWriter.setMaxMergeDocs(0);
>> IndexWriter.getLogMergePolicy().setMaxMergeMB(0);
>>
>> 2. After the warmup task completes, revert the above parameters to
>> their defaults, then call:
>>
>> IndexWriter.maybeMerge();
>> IndexWriter.waitForMerges();
>>
>> Now, I compared my results when deferring segment merges using the
>> above method with a test run letting Lucene do the merging on the
>> fly. Curiously, the resulting size of the indexes on disk is about 64%
>> greater in the former case, although the total time to complete the
>> warmup is almost the same.
>>
>> So I have a few questions:
>> - is the approach for deferring segment merging flawed in some way?
>> - what could possibly account for the huge difference in file sizes?
>> - what else could I possibly try to further speed up index writing
>> during the system's "off hours"?
>>
>> Thanks,
>> -V
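P.S. Spelled out, the merge-deferral toggle from my original post looks
roughly like this against the 3.0.x API. This is a sketch that assumes
the writer is using the default LogByteSizeMergePolicy (hence the cast),
and that the DEFAULT_* constants are the values being reverted to rather
than any custom settings:

import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.LogMergePolicy;

class DeferredMerging {

  // Before warmup: with max merge sizes of 0 docs / 0 MB, no segment
  // ever qualifies as a merge candidate, so merging is effectively off.
  static void disableMerging(IndexWriter writer) {
    LogByteSizeMergePolicy mp =
        (LogByteSizeMergePolicy) writer.getMergePolicy();
    mp.setMaxMergeDocs(0);
    mp.setMaxMergeMB(0);
  }

  // After warmup: restore the defaults, then trigger and wait for the
  // deferred merges before opening any searchers.
  static void enableMergingAndCatchUp(IndexWriter writer) throws IOException {
    LogByteSizeMergePolicy mp =
        (LogByteSizeMergePolicy) writer.getMergePolicy();
    mp.setMaxMergeDocs(LogMergePolicy.DEFAULT_MAX_MERGE_DOCS);
    mp.setMaxMergeMB(LogByteSizeMergePolicy.DEFAULT_MAX_MERGE_MB);
    writer.maybeMerge();    // let the merge policy pick up the backlog
    writer.waitForMerges(); // block until background merges complete
  }
}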