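There is also IndexWriter.expungeDeletes(), which reclaims the space held by deleted documents without optimizing down to a single segment.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de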
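
To make the explanation quoted below more concrete (optimizing to 2 segments leaves the large segment, and the deleted documents inside it, untouched), here is a toy model in Java. It does not use Lucene at all: the Segment class, the sample document counts, and the merge rule (keep the largest segments, merge the rest) are assumptions made for illustration, not Lucene's actual merge policy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: no Lucene classes are used, and all numbers are made up.
final class SegmentToyModel {

    /** A segment: total documents on disk and how many of them are only marked deleted. */
    static final class Segment {
        final String name;
        final long maxDoc;   // all documents stored, including deleted ones
        final long deleted;  // documents marked deleted but still occupying space

        Segment(String name, long maxDoc, long deleted) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.deleted = deleted;
        }
    }

    /** Hypothetical index: one big old segment plus a few small recent ones. */
    static List<Segment> sampleIndex() {
        List<Segment> index = new ArrayList<>();
        index.add(new Segment("big", 16_000_000L, 300_000L));
        index.add(new Segment("small1", 2_000_000L, 10_000L));
        index.add(new Segment("small2", 500_000L, 4_000L));
        index.add(new Segment("small3", 500_000L, 0L));
        return index;
    }

    /**
     * Assumed merge rule: keep the (maxSegments - 1) largest segments untouched and
     * merge all the others into one new segment. Merging drops deleted documents.
     */
    static List<Segment> optimize(List<Segment> index, int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException("maxSegments must be >= 1");
        }
        List<Segment> sorted = new ArrayList<>(index);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.maxDoc).reversed());
        if (sorted.size() <= maxSegments) {
            return sorted; // already within the target; this toy model leaves it alone
        }
        List<Segment> result = new ArrayList<>(sorted.subList(0, maxSegments - 1));
        long live = 0;
        for (Segment s : sorted.subList(maxSegments - 1, sorted.size())) {
            live += s.maxDoc - s.deleted; // only live documents survive a merge
        }
        result.add(new Segment("merged", live, 0L));
        return result;
    }

    /** Assumed behavior: rewrite every segment that has deletes, keeping the segment count. */
    static List<Segment> expungeDeletes(List<Segment> index) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : index) {
            result.add(s.deleted > 0 ? new Segment(s.name + "'", s.maxDoc - s.deleted, 0L) : s);
        }
        return result;
    }

    static long docsOnDisk(List<Segment> index) {
        long total = 0;
        for (Segment s : index) total += s.maxDoc;
        return total;
    }

    static long deletedOnDisk(List<Segment> index) {
        long total = 0;
        for (Segment s : index) total += s.deleted;
        return total;
    }

    static void report(String label, List<Segment> index) {
        System.out.println(label + ": segments=" + index.size()
                + ", docs on disk=" + docsOnDisk(index)
                + ", still marked deleted=" + deletedOnDisk(index));
    }

    public static void main(String[] args) {
        List<Segment> index = sampleIndex();
        report("start", index);
        report("optimize(2)", optimize(index, 2));
        report("optimize(1)", optimize(index, 1));
        report("expungeDeletes", expungeDeletes(index));
    }
}
```

In this model optimize(2) only frees the deletes held in the small segments, while optimize(1) and expungeDeletes free all of them. On Lucene 3.x the corresponding real calls are IndexWriter.optimize(int maxNumSegments) and IndexWriter.expungeDeletes(); how they pick segments depends on the configured merge policy, which this sketch does not try to reproduce.
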
> -----Original Message-----
> From: v.se...@lombardodier.com [mailto:v.se...@lombardodier.com]
> Sent: Thursday, July 21, 2011 8:39 PM
> To: java-user@lucene.apache.org
> Subject: Re: optimize with num segments > 1 index keeps growing
>
> Hi, thanks for this explanation.
> so what is the best solution: merge the large segment (how can I do that?) or
> work with many segments (10?) so that I avoid having this "large segment"
> issue?
> thanks,
> vince
>
> Vincent Sevel
> Lombard Odier Darier Hentsch & Cie
> 11, rue de la Corraterie - 1204 Genève - Suisse
> T +41 22 709 3376 - F +41 22 709 3782
> www.lombardodier.com
>
> Simon Willnauer <simon.willna...@googlemail.com>
> 21.07.2011 20:06
> Please respond to java-user@lucene.apache.org
>
> To: java-user@lucene.apache.org
> cc:
> Subject: Re: optimize with num segments > 1 index keeps growing
>
> so the problem here is that you have one really big segment, _52aho.*, and
> several smaller ones: _7e0wz.*, _7e0xu.*, _7e1x5.* ....
> if you optimize to 2 segments, all the smaller segments are merged into one
> but the large segment remains untouched. This means that the deleted
> documents in the large segment are not removed / freed, whereas if you
> optimize to one segment they are removed. In the single-segment
> index there is no *.del file present, meaning no deletes. Unless you merge
> the large segment, all your deleted documents are only marked as deleted but
> not yet removed.
>
> simon
>
> On Thu, Jul 21, 2011 at 5:50 PM, <v.se...@lombardodier.com> wrote:
> > hi,
> > closing after the 2 segments optimize does not change it.
> > also I am running with lucene 3.1.0.
> > cheers,
> > vince
> >
> > Ian Lea <ian....@gmail.com>
> > 21.07.2011 17:30
> > Please respond to java-user@lucene.apache.org
> >
> > To: java-user@lucene.apache.org
> > cc:
> > Subject: Re: optimize with num segments > 1 index keeps growing
> >
> > A write.lock file with timestamp of 13:58 is in all the listings. The
> > first thing I'd try is to add some IndexWriter.close() calls.
> >
> > --
> > Ian.
> >
> > On Thu, Jul 21, 2011 at 4:05 PM, <v.se...@lombardodier.com> wrote:
> >> Hi,
> >>
> >> here is a concrete example.
> >>
> >> I am starting with an index that has 19017236 docs, which takes 58989 Mb
> >> on disk:
> >>
> >> 21.07.2011  15:21              20 segments.gen
> >> 21.07.2011  15:21           2'974 segments_2acy4
> >> 21.07.2011  13:58               0 write.lock
> >> 16.07.2011  02:21  33'445'798'886 _52aho.fdt
> >> 16.07.2011  02:21     178'723'932 _52aho.fdx
> >> 16.07.2011  01:58           5'002 _52aho.fnm
> >> 16.07.2011  03:10   9'857'410'889 _52aho.frq
> >> 16.07.2011  03:10   4'538'234'846 _52aho.prx
> >> 16.07.2011  03:10      61'581'767 _52aho.tii
> >> 16.07.2011  03:10   5'505'039'790 _52aho.tis
> >> 21.07.2011  01:01       1'899'536 _52aho_5.del
> >> 21.07.2011  01:05   4'222'206'034 _6t61z.fdt
> >> 21.07.2011  01:05      21'424'556 _6t61z.fdx
> >> 21.07.2011  01:01           5'002 _6t61z.fnm
> >> 21.07.2011  01:12   1'170'370'187 _6t61z.frq
> >> 21.07.2011  01:12     598'373'388 _6t61z.prx
> >> 21.07.2011  01:12       7'574'912 _6t61z.tii
> >> 21.07.2011  01:12     678'766'206 _6t61z.tis
> >> 21.07.2011  13:46   1'458'592'058 _7d6me.cfs
> >> 21.07.2011  13:48      15'702'654 _7dhgz.cfs
> >> 21.07.2011  13:52      16'800'942 _7dphm.cfs
> >> 21.07.2011  13:55      16'714'431 _7dxht.cfs
> >> 21.07.2011  14:24      17'505'435 _7e0wz.cfs
> >> 21.07.2011  14:24       5'875'852 _7e0xu.cfs
> >> 21.07.2011  14:48      18'340'470 _7e1x5.cfs
> >> 21.07.2011  15:19      16'978'564 _7e3ck.cfs
> >> 21.07.2011  15:21       1'208'656 _7e3hv.cfs
> >> 21.07.2011  15:21          19'361 _7e3hw.cfs
> >>         28 File(s)  61'855'156'350 bytes
> >>
> >> I am doing a delete of some of the older documents. after the delete,
> >> I commit, then I optimize down to 2 segments. at the end of the
> >> optimize the index contains 18702510 docs (314727 were deleted) and it
> >> now takes 58975 Mb on disk:
> >>
> >> 21.07.2011  15:37              20 segments.gen
> >> 21.07.2011  15:37             524 segments_2acy6
> >> 21.07.2011  13:58               0 write.lock
> >> 16.07.2011  02:21  33'445'798'886 _52aho.fdt
> >> 16.07.2011  02:21     178'723'932 _52aho.fdx
> >> 16.07.2011  01:58           5'002 _52aho.fnm
> >> 16.07.2011  03:10   9'857'410'889 _52aho.frq
> >> 16.07.2011  03:10   4'538'234'846 _52aho.prx
> >> 16.07.2011  03:10      61'581'767 _52aho.tii
> >> 16.07.2011  03:10   5'505'039'790 _52aho.tis
> >> 21.07.2011  15:23       1'999'945 _52aho_6.del
> >> 21.07.2011  15:31   5'194'848'138 _7e3hy.fdt
> >> 21.07.2011  15:31      28'613'668 _7e3hy.fdx
> >> 21.07.2011  15:25           5'002 _7e3hy.fnm
> >> 21.07.2011  15:37   1'529'771'296 _7e3hy.frq
> >> 21.07.2011  15:37     726'582'244 _7e3hy.prx
> >> 21.07.2011  15:37       8'518'198 _7e3hy.tii
> >> 21.07.2011  15:37     763'213'144 _7e3hy.tis
> >>         18 File(s)  61'840'347'291 bytes
> >>
> >> as you can see, size on disk did not really change. at this point I
> >> optimize down to 1 segment and at the end the index takes 48273 Mb on
> >> disk:
> >>
> >> 21.07.2011  16:46              20 segments.gen
> >> 21.07.2011  16:46             278 segments_2acy8
> >> 21.07.2011  13:58               0 write.lock
> >> 21.07.2011  16:06  32'901'423'750 _7e3hz.fdt
> >> 21.07.2011  16:06     149'582'052 _7e3hz.fdx
> >> 21.07.2011  15:42           5'002 _7e3hz.fnm
> >> 21.07.2011  16:46   8'608'541'177 _7e3hz.frq
> >> 21.07.2011  16:46   4'392'616'115 _7e3hz.prx
> >> 21.07.2011  16:46      50'571'856 _7e3hz.tii
> >> 21.07.2011  16:46   4'515'914'658 _7e3hz.tis
> >>         10 File(s)  50'618'654'908 bytes
> >>
> >> this means that with the 1 segment optimize I was able to reclaim 10 Gb
> >> on disk that the 2 segments optimize could not reclaim.
> >>
> >> how can this be explained? is that a normal behavior?
> >>
> >> thanks,
> >>
> >> vince
> >>
> >> Simon Willnauer <simon.willna...@googlemail.com>
> >> 20.07.2011 23:11
> >> Please respond to java-user@lucene.apache.org
> >>
> >> To: java-user@lucene.apache.org
> >> cc:
> >> Subject: Re: optimize with num segments > 1 index keeps growing
> >>
> >> On Wed, Jul 20, 2011 at 2:00 PM, <v.se...@lombardodier.com> wrote:
> >>> Hi,
> >>>
> >>> I index several million small documents per day. each day, I remove some
> >>> of the older documents to keep the index at a stable number of documents.
> >>> after each purge, I commit then I optimize the index. what I found is that
> >>> if I keep optimizing with max num segments = 2, then the index keeps
> >>> growing on the disk. but as soon as I optimize with just 1 segment, the
> >>> space gets reclaimed on the disk. so, I have currently adopted the
> >>> following strategy: every night I optimize with 2 segments, except once
> >>> per week where I optimize with just 1 segment.
> >>
> >> what do you mean by "keeps growing"? you have n segments and you
> >> optimize down to 2 and the index is bigger than the one with n
> >> segments?
> >>
> >> simon
> >>>
> >>> is that an expected behavior?
> >>> I guess I am doing something special because I was not able to reproduce
> >>> this behavior in a unit test. what could it be?
> >>>
> >>> it would be nice to get some explanatory services within the product to
> >>> help get some understanding of its behavior. something that tells you some
> >>> information about your index for instance (number of docs in the different
> >>> states, how the space is being used, ...). lucene is a wonderful product,
> >>> but to me this is almost like black magic, and when there is a specific
> >>> behavior, I have few clues to figure out something by myself.
> >>> some user oriented logging would be nice as well (the index writer info
> >>> stream is really verbose and very low level).
> >>>
> >>> thanks for your help,
> >>>
> >>> Vince
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> > ************************ DISCLAIMER ************************
> > This message is intended only for use by the person to whom it is
> > addressed. It may contain information that is privileged and
> > confidential. Its content does not constitute a formal commitment by
> > Lombard Odier Darier Hentsch & Cie or any of its branches or
> > affiliates.
> > If you are not the intended recipient of this message, kindly notify
> > the sender immediately and destroy this message. Thank You.
> > *****************************************************************