Hi George,

Thanks. But the root cause of the junk files is the optimize step itself. When you set the compound-file flag to true to get a single segment file, Lucene tries to merge all the segments and delete the older ones. However, if the older files are being accessed in parallel, the delete operation fails. Lucene tracks these files through "deletable" and should clean them up the next time the index is opened. In some cases, though, the files remain on disk, unused and no longer referenced by Lucene.
This is a rare scenario, and the files were created over a period of two years.

-----Original Message-----
From: George Aroush [mailto:geo...@aroush.net]
Sent: Tuesday, January 13, 2009 6:54 PM
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

There is. Call the Optimize() function on the index.

You should never delete index files manually unless you know what you are doing; otherwise you can corrupt / destroy your index.

-- George

> -----Original Message-----
> From: Nic Wise [mailto:nic.w...@bbc.com]
> Sent: Tuesday, January 13, 2009 6:36 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
>
> I'm SURE there is a cleaner way, but in the past, we read the
> segments file (manually :( ), and any file which wasn't
> listed in there was considered to be a redundant file.
>
> Worked for us. There may be a way to ask an IndexReader which
> files it's using, and then extrapolate from there, but we
> were using Lucene.net 1.something, which didn't.
>
> I think that's what Luke does. Opens the index, asks Lucene
> what it's using, kills everything else.
>
> -----Original Message-----
> From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> Sent: 13 January 2009 11:26
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
>
> Hi All,
>
> I have started this thread for the Lucene scalability aspect. I
> have an index with 80 GB size. However, it looks like many of
> the segment files are either redundant or unused. Even if I
> delete them and just retain the CFS, segments and deletable
> files, the index seems to be working fine.
> However, I want to know a cleaner approach to identify such
> redundant/unused files through APIs. I am able to see these
> unused files in Luke as "Deletable". However, I am not sure
> how Luke is able to identify unused files. I am using
> Lucene.NET version 2.0.
>
> Can you please suggest some way?
>
> -----Original Message-----
> From: Granroth, Neal V.
> [mailto:neal.granr...@thermofisher.com]
> Sent: Tuesday, January 13, 2009 1:01 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
>
> Floyd, you will need to provide more details about the
> specific problems you are encountering.
>
> I made a quick check, and have no difficulty opening and
> inspecting an index I created a few minutes ago with
> Lucene.NET v2.3.1 using Luke v0.9.1.
>
> -- Neal
>
> -----Original Message-----
> From: Floyd Wu [mailto:floyd...@gmail.com]
> Sent: Friday, January 09, 2009 8:18 PM
> To: lucene-net-user@incubator.apache.org
> Subject: Re: Lucene Scalability Options
>
> Hi all,
> It seems the new version of Luke is not compatible with
> Lucene.Net, and I've emailed the creator of Luke. Below is
> his feedback:
>
> "Yes, there have been many changes,
> but Lucene 2.4 can still open indexes built with earlier
> versions of Lucene/Java.
> This is the second report I've got about the possible
> incompatibility with Lucene.Net - I suggest to raise up this
> issue on the Lucene mailing list (
> java-...@lucene.apache.org), and provide more details, eg.
> Lucene.Net revision, stack trace, a small sample index if you can."
>
> My original report is below:
> "The situation is that Luke-0.9 cannot open the index files
> built by Lucene.Net-2.3.1.
> I tried older versions of Luke and confirmed that Luke-0.8 and
> Luke-0.8.1 can open and read the index files fine.
> I wonder if there is any change between Java Lucene 2.3 and 2.4.
> Please help on this."
>
> Floyd
>
> 2009/1/9 George Aroush <geo...@aroush.net>
>
> > Hi Nitin,
> >
> > Any optimization that Luke can do on an index is also doable by making API
> > calls from Lucene.Net. If not, then there is either a bug in Lucene.Net or
> > in your use of the API. Can you share with us your API calls as well as the
> > Lucene.Net version you are using?
> >
> > Thanks.
> >
> > -- George
> >
> > > -----Original Message-----
> > > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > > Sent: Friday, January 09, 2009 6:27 AM
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > Thanks Hugh. Yes, I tried using Luke for index optimization.
> > > Surprisingly, it has brought down the index size to ~20 GB with only
> > > one CFS and segment files left behind. I used compound optimization
> > > option. But I use the similar "SetUseCompoundFile" property on
> > > "IndexModifier" object in my Lucene.NET code, but it has no effect
> > > on size or files after optimization. Any suggestions??
> > >
> > > -----Original Message-----
> > > From: Hugh Spiller [mailto:hugh.spil...@renishaw.com]
> > > Sent: Friday, January 09, 2009 3:35 PM
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > Hi Nitin,
> > >
> > > I've found the easiest way to get rid of redundant files in an index
> > > is to use Luke. As soon as you use it to open the index, it tidies
> > > up all the cruft.
> > >
> > > It's at http://www.getopt.org/luke/ .
> > >
> > > ________________________________
> > >
> > > Hugh Spiller
> > >
> > > -----Original Message-----
> > > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > > Sent: 09 January 2009 08:48
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > -- snip --
> > >
> > > Any inputs on junk/redundant files in above list?
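The manual check Nic describes in the thread (treat any file not listed in the segments file as a candidate for cleanup) can be sketched roughly as follows. This is only an illustration in plain Python, not a Lucene or Lucene.Net API: the function name find_unreferenced_files and its parameters are hypothetical, and the set of referenced file names is assumed to come from somewhere else (for example from reading the segments file, which is not shown here). It only reports candidates and never deletes anything, in keeping with George's warning about removing index files by hand.

```python
from pathlib import Path


def find_unreferenced_files(index_dir, referenced_names,
                            always_keep=("segments", "deletable")):
    """Report files in index_dir that are not in referenced_names.

    Hypothetical helper for illustration only. `referenced_names` is
    assumed to be supplied by the caller; this function does not parse
    any Lucene file format and does not delete anything.
    """
    keep = set(referenced_names) | set(always_keep)
    return sorted(
        entry.name
        for entry in Path(index_dir).iterdir()
        # Skip subdirectories; only plain files are considered.
        if entry.is_file() and entry.name not in keep
    )


if __name__ == "__main__":
    import tempfile

    # Build a throwaway directory that mimics an index folder.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("segments", "deletable", "_a.cfs", "_3.fnm", "_3.frq"):
            (Path(tmp) / name).touch()

        # Assume only the compound file is still referenced.
        print(find_unreferenced_files(tmp, {"_a.cfs"}))
        # prints ['_3.fnm', '_3.frq']
```

The output is a list to review, not a list to delete blindly: a file may look unreferenced simply because another process is in the middle of a merge.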