When we were doing this, optimize (which we called every night) didn't
fix it - our problem was mid-merge.

The problem was, we were mid-optimize, and the process was killed - not
cleanly. So we had files left over, and no record of them in segments.
We DID have valid stuff in segments, just too many segment files, so it
tried to merge again.
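
For anyone hitting the same leftover-file situation, here is a minimal sketch of the check: compare what is on disk with what the segments file says is in use. It is only a sketch under stated assumptions. The function names are hypothetical, the list of referenced files has to be supplied by the caller (the sketch does not parse the segments file), and treating "segments*" and "deletable" as bookkeeping files is an assumption based on the file names mentioned in this thread.

```python
from pathlib import Path
from typing import Iterable, Set


def is_bookkeeping_file(name: str) -> bool:
    # Assumption: files whose name starts with "segments", plus the
    # "deletable" file, are index bookkeeping and are never candidates.
    return name.startswith("segments") or name == "deletable"


def find_unreferenced_files(index_dir: Path, referenced: Iterable[str]) -> Set[str]:
    """Return names of files in index_dir that are not listed in `referenced`.

    `referenced` is the set of file names the segments file points at;
    obtaining that list is left to the caller (hypothetical input).
    """
    keep = set(referenced)
    return {
        entry.name
        for entry in index_dir.iterdir()
        if entry.is_file()
        and entry.name not in keep
        and not is_bookkeeping_file(entry.name)
    }
```

The sketch only reports candidates and deliberately deletes nothing, in line with George's warning below about removing index files by hand.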

Anyway, we changed some settings - after taking a while to work out what
the cause was - and it went away :) Moral of the story: don't have a
watchdog timer that kills the process while you are merging 2GB segments!
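
One way to avoid that trap, sketched here and not what we actually ran: have the watchdog judge "stuck" by lack of progress instead of by total run time, so a long merge that is still doing work is left alone. The class and parameter names are hypothetical, and how the heartbeat is produced is an assumption (for example, a supervisor could call beat() whenever it sees file sizes in the index directory change).

```python
import time
from typing import Callable


class ProgressWatchdog:
    """Flags a stall only when no heartbeat arrives within stall_timeout seconds."""

    def __init__(self, stall_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._stall_timeout = stall_timeout
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_beat = clock()

    def beat(self) -> None:
        # Record that the monitored process made progress just now.
        self._last_beat = self._clock()

    def is_stalled(self) -> bool:
        # True only if the time since the last heartbeat exceeds the timeout.
        return self._clock() - self._last_beat > self._stall_timeout
```

Even when is_stalled() returns True, asking the process to shut down cleanly before resorting to a hard kill avoids leaving a half-finished merge behind.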

-----Original Message-----
From: George Aroush [mailto:geo...@aroush.net] 
Sent: 13 January 2009 13:24
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

There is.  Call the Optimize() function on the index.

You should never delete index files manually unless you know what you
are doing; otherwise you can corrupt or destroy your index.

-- George 

> -----Original Message-----
> From: Nic Wise [mailto:nic.w...@bbc.com] 
> Sent: Tuesday, January 13, 2009 6:36 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> I'm SURE there is a cleaner way, but in the past, we read the 
> segments file (manually :( ), and any file which wasn't 
> listed in there was considered to be a redundant file.
> 
> Worked for us. There may be a way to ask an IndexReader which 
> files it's using, and then extrapolate from there, but we 
> were using Lucene.net 1.something, which didn't support that.
> 
> I think that's what Luke does. Opens the index, asks Lucene 
> what it's using, kills everything else.
> 
> -----Original Message-----
> From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> Sent: 13 January 2009 11:26
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> Hi All,
> 
> I started this thread to discuss Lucene scalability. I have 
> an 80 GB index. However, it looks like many of the segment 
> files are either redundant or unused. Even if I delete them 
> and retain only the CFS, segments and deletable files, the 
> index seems to work fine.
> However, I would like a cleaner approach to identifying such 
> redundant/unused files through the APIs. I can see these 
> unused files in Luke as "Deletable", but I am not sure how 
> Luke identifies them. I am using Lucene.NET version 2.0.
> 
> Can you please suggest some way?
> 
> 
> 
> -----Original Message-----
> From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
> Sent: Tuesday, January 13, 2009 1:01 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> 
> Floyd, you will need to provide more details about the 
> specific problems you are encountering.
> 
> I made a quick check, and have no difficulty opening and 
> inspecting an index I created a few minutes ago with 
> Lucene.NET v2.3.1 using Luke v0.9.1.
> 
> -- Neal
> 
> 
> -----Original Message-----
> From: Floyd Wu [mailto:floyd...@gmail.com]
> Sent: Friday, January 09, 2009 8:18 PM
> To: lucene-net-user@incubator.apache.org
> Subject: Re: Lucene Scalability Options
> 
> Hi all,
> It seems the new version of Luke is not compatible with 
> Lucene.net, and I've emailed the creator of Luke. Below is 
> his feedback:
> 
> "Yes, there have been many changes,
> but Lucene 2.4 can still open indexes built with earlier 
> versions of Lucene/Java.
> This is the second report I've got about the possible 
> incompatibility with Lucene.Net - I suggest to raise up this 
> issue on the Lucene mailing list ( 
> java-...@lucene.apache.org), and provide more details, eg. 
> Lucene.Net revision, stack trace, a small sample index if you can."
> 
> My original report is below:
> "The situation is Luke-0.9 can not open the index files which 
> built by Lucene.Net-2.3.1.
> I tried to use older version of Luke and confirm Luke-0.8 and 
> Luke-0.8.1 can open and read index files fine.
>  I wonder if there is any change between java Lucene 2.3 and 2.4.
> Please help on this."
> 
> Floyd
> 
> 
> 
> 2009/1/9 George Aroush <geo...@aroush.net>
> 
> > Hi Nitin,
> >
> > Any optimization that Luke can do on an index is also doable by making
> > API calls from Lucene.Net.  If not, then there is either a bug in
> > Lucene.Net or in your use of the API.  Can you share with us your API
> > calls as well as the Lucene.Net version you are using?
> >
> > Thanks.
> >
> > -- George
> >
> > > -----Original Message-----
> > > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > > Sent: Friday, January 09, 2009 6:27 AM
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > > Thanks Hugh. Yes, I tried using Luke for index optimization.
> > > > Surprisingly, it brought the index size down to ~20 GB, with only
> > > > one CFS file and the segments file left behind. I used the compound
> > > > optimization option. I use the similar "SetUseCompoundFile" property
> > > > on the "IndexModifier" object in my Lucene.NET code, but it has no
> > > > effect on size or files after optimization. Any suggestions?
> > >
> > >
> > > -----Original Message-----
> > > From: Hugh Spiller [mailto:hugh.spil...@renishaw.com]
> > > Sent: Friday, January 09, 2009 3:35 PM
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > Hi Nitin,
> > >
> > > > I've found the easiest way to get rid of redundant files in an
> > > > index is to use Luke. As soon as you use it to open the index, it
> > > > tidies up all the cruft.
> > >
> > > It's at http://www.getopt.org/luke/ .
> > >
> > > ________________________________
> > >
> > > Hugh Spiller
> > >
> > >
> > > -----Original Message-----
> > > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > > Sent: 09 January 2009 08:48
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > -- snip --
> > >
> > >
> > > Any inputs on junk/redundant files in above list?
> > >
> > >
> > >
> > > --------------------------------------------------------------
> > > ------------------------------------
> > > This email and any attachments are confidential and are 
> for the use 
> > > of the addressee only. If you are not the addressee, you must not 
> > > use or disclose the contents to any other person. Please 
> immediately 
> > > notify the sender and delete the email. Statements and opinions 
> > > expressed here may not represent those of the company. Email 
> > > correspondence is monitored by the company. This 
> information may be 
> > > subject to Export Control Regulation. You are obliged to 
> comply with 
> > > such Regulations
> > >
> > > The parent company of the Renishaw Group is Renishaw plc, 
> registered 
> > > in England no. 1106260. Registered Office: New Mills, 
> > > Wotton-under-Edge, Gloucestershire, GL12 8JR, United Kingdom. Tel 
> > > +44 (0) 1453 524524
> > > --------------------------------------------------------------
> > > ------------------------------------
> > >
> >
> > 
> This e-mail (and any attachments) is confidential and may 
> contain personal views which are not the views of the BBC 
> unless specifically stated. If you have received it in error, 
> please delete it from your system. Do not use, copy or 
> disclose the information in any way nor act in reliance on it 
> and notify the sender immediately.
>  
> Please note that the BBC monitors e-mails sent or received. 
> Further communication will signify your consent to this
> 
> This e-mail has been sent by one of the following 
> wholly-owned subsidiaries of the BBC:
>  
> BBC Worldwide Limited, Registration Number: 1420028 England, 
> Registered Address: BBC Media Centre, 201 Wood Lane, London, 
> W12 7TQ BBC World News Limited, Registration Number: 04514407 
> England, Registered Address: Woodlands, BBC Media Centre, 201 
> Wood Lane, London, W12 7TQ BBC World Distribution Limited, 
> Registration Number: 04514408, Registered Address: Woodlands, 
> BBC Media Centre, 201 Wood Lane, London, W12 7TQ
> 