Digy,

It will be difficult to create groups of indexes because of the way we build and search the index. We keep adding new documents and also update existing documents quite frequently. Also, our searches need to run against the entire set.
We are not facing any search performance problems as of now; I just wanted to check whether there are any known performance or scalability issues after crossing 100 GB.

Another question on the same topic: I am not sure whether the 100 GB size of our index is genuine or whether it is due to some failures that have resulted in redundant segments/files. I saw a few TMP files, which I have deleted. Apart from that, I am not sure how to identify redundant or junk files in the Lucene index folder. Following is the list of files we have in the Lucene index folder:

\\LuceneIndexTest\_d8by.prx
\\LuceneIndexTest\_d8by.tii
\\LuceneIndexTest\_d8by.tis
\\LuceneIndexTest\_d8c9.fdt
\\LuceneIndexTest\_d8c9.fdx
\\LuceneIndexTest\_d8c9.fnm
\\LuceneIndexTest\_d8ca.fdt
\\LuceneIndexTest\_d8ca.fdx
\\LuceneIndexTest\_d8ca.fnm
\\LuceneIndexTest\_dl4h.fdt
\\LuceneIndexTest\_dl4h.fdx
\\LuceneIndexTest\_dl4h.fnm
\\LuceneIndexTest\_dl48.fdt
\\LuceneIndexTest\_dl48.fdx
\\LuceneIndexTest\_dl48.fnm
\\LuceneIndexTest\_dl48.frq
\\LuceneIndexTest\_dl48.prx
\\LuceneIndexTest\_dl48.tii
\\LuceneIndexTest\_dl48.tis
\\LuceneIndexTest\_fdbs.fdt
\\LuceneIndexTest\_fdbs.fdx
\\LuceneIndexTest\_fdbs.fnm
\\LuceneIndexTest\_fdbs.frq
\\LuceneIndexTest\_fdbs.prx
\\LuceneIndexTest\_fdbs.tii
\\LuceneIndexTest\_fdbs.tis
\\LuceneIndexTest\_fhz5.fdt
\\LuceneIndexTest\_fhz5.fdx
\\LuceneIndexTest\_fhz5.fnm
\\LuceneIndexTest\_fhz5.frq
\\LuceneIndexTest\_fhz5.prx
\\LuceneIndexTest\_fhz5.tii
\\LuceneIndexTest\_fhz5.tis
\\LuceneIndexTest\_fkla.fdt
\\LuceneIndexTest\_fkla.fdx
\\LuceneIndexTest\_fkla.fnm
\\LuceneIndexTest\_fkla.frq
\\LuceneIndexTest\_fkla.prx
\\LuceneIndexTest\_fkla.tii
\\LuceneIndexTest\_fkla.tis
\\LuceneIndexTest\_fmo5.fdt
\\LuceneIndexTest\_fmo5.fdx
\\LuceneIndexTest\_fmo5.fnm
\\LuceneIndexTest\_fmo5.frq
\\LuceneIndexTest\_fmo5.prx
\\LuceneIndexTest\_fmo5.tii
\\LuceneIndexTest\_fmo5.tis
\\LuceneIndexTest\_fmo6.fdt
\\LuceneIndexTest\_fmo6.fdx
\\LuceneIndexTest\_fmo6.fnm
\\LuceneIndexTest\_fmo6.frq
\\LuceneIndexTest\_fmo6.prx
\\LuceneIndexTest\_fmo6.tii
\\LuceneIndexTest\_fmo6.tis
\\LuceneIndexTest\_fmo7.fdt
\\LuceneIndexTest\_fmo7.fdx
\\LuceneIndexTest\_fmo7.fnm
\\LuceneIndexTest\_fmo7.frq
\\LuceneIndexTest\_fmo7.prx
\\LuceneIndexTest\_fmo7.tii
\\LuceneIndexTest\_fmo7.tis
\\LuceneIndexTest\_fmo9.fdt
\\LuceneIndexTest\_fmo9.fdx
\\LuceneIndexTest\_fmo9.fnm
\\LuceneIndexTest\_fmoa.fdt
\\LuceneIndexTest\_fmoa.fdx
\\LuceneIndexTest\_fmoa.fnm
\\LuceneIndexTest\_fmod.fdt
\\LuceneIndexTest\_fmod.fdx
\\LuceneIndexTest\_fmod.fnm
\\LuceneIndexTest\_fmoe.fdt
\\LuceneIndexTest\_fmoe.fdx
\\LuceneIndexTest\_fmoe.fnm
\\LuceneIndexTest\_fmof.fdt
\\LuceneIndexTest\_fmof.fdx
\\LuceneIndexTest\_fmof.fnm
\\LuceneIndexTest\_fmog.fdt
\\LuceneIndexTest\_fmog.fdx
\\LuceneIndexTest\_fmog.fnm
\\LuceneIndexTest\_fmoh.fdt
\\LuceneIndexTest\_fmoh.fdx
\\LuceneIndexTest\_fmoh.fnm
\\LuceneIndexTest\_foq9.fdt
\\LuceneIndexTest\_foq9.fdx
\\LuceneIndexTest\_foq9.fnm
\\LuceneIndexTest\_foq9.frq
\\LuceneIndexTest\_foq9.prx
\\LuceneIndexTest\_foq9.tii
\\LuceneIndexTest\_foq9.tis
\\LuceneIndexTest\_fq23.fdt
\\LuceneIndexTest\_fq23.fdx
\\LuceneIndexTest\_fq23.fnm
\\LuceneIndexTest\_fq23.frq
\\LuceneIndexTest\_fq23.prx
\\LuceneIndexTest\_fq23.tii
\\LuceneIndexTest\_fq23.tis
\\LuceneIndexTest\_hr8w.fdt
\\LuceneIndexTest\_hr8w.fdx
\\LuceneIndexTest\_hr8w.fnm
\\LuceneIndexTest\_hr8x.fdt
\\LuceneIndexTest\_hr8x.fdx
\\LuceneIndexTest\_hr8x.fnm
\\LuceneIndexTest\_k6jf.cfs
\\LuceneIndexTest\_kwhl.cfs
\\LuceneIndexTest\deletable
\\LuceneIndexTest\segments
\\LuceneIndexTest\_d8by.fdt
\\LuceneIndexTest\_d8by.fdx
\\LuceneIndexTest\_d8by.fnm
\\LuceneIndexTest\_d8by.frq

Any inputs on junk/redundant files in the above list?

-----Original Message-----
From: Digy [mailto:digyd...@gmail.com]
Sent: Tuesday, December 30, 2008 2:37 AM
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

Hi Nitin,

* I haven't heard about that 100GB limit but I tried Lucene.Net once with a 300GB index.
The first searches (with a fresh IndexSearcher) took ~20 sec (because of caching), but the next searches performed quite well (varying from ~50 msec to 3 sec).

* If you deal with such large indexes, it is better to group the indexes according to some criteria (e.g., index of December, index of November, etc.) and not to use an index when it is not needed in the search. Of course, keeping smaller indexes on multiple machines, running a parallel search on them, and then merging the results would be a good solution too, but it would require more complex coding. You may also want to see some tricks about search speed optimizations ( http://wiki.apache.org/jakarta-lucene/ImproveSearchingSpeed ) and the project Solr ( http://lucene.apache.org/solr/features.html ).

* You can get the official releases of Lucene.Net from https://svn.apache.org/repos/asf/incubator/lucene.net/site/download and the current version from svn trunk https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/src/Lucene.Net/

DIGY.

-----Original Message-----
From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
Sent: Saturday, December 27, 2008 6:41 AM
To: lucene-net-user@incubator.apache.org
Subject: Lucene Scalability Options

Hi All,

We are using the Lucene.NET v2.0 library in our project. Our index has grown to ~80 GB in the last year. We expect it to grow beyond 100 GB in the next six months. I read somewhere long ago about Lucene performance issues after crossing the 100 GB mark.

- Are there any specific issues that we might run into after 100 GB?
- Is there any known impact on search performance?
- Do we have any scalability features that we can consider for implementation? Clustering etc.?

Any inputs would be valuable. Also, I would like to know the latest stable Lucene.NET release we can migrate to; any download link would be useful.

Thanks & regards,
Nitin Shiralkar
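
On the junk-file question raised at the top of this thread, one rough first pass is to group the listed files by segment name and flag any non-compound segment that lacks part of the usual file set. The sketch below is in Python purely for illustration. It assumes the older Lucene file layout seen in the listing (.fnm, .fdt, .fdx, .tis, .tii, .frq, .prx per segment, or a single .cfs), ignores norms files, and uses hypothetical helper names (group_by_segment, incomplete_segments). It is only a heuristic: the "segments" file remains the record of which segments are live, so a flagged segment is a candidate for inspection on a copy of the index, not something to delete outright.

```python
from collections import defaultdict

# Assumption: extensions a non-compound segment normally carries in the
# older Lucene layout shown in the listing (norms files are ignored here).
CORE_EXTS = {"fnm", "fdt", "fdx", "tis", "tii", "frq", "prx"}


def group_by_segment(paths):
    """Map each segment name (e.g. '_d8c9') to the set of extensions present."""
    segments = defaultdict(set)
    for path in paths:
        # Take the part after the last backslash or slash as the file name.
        name = path.replace("/", "\\").rsplit("\\", 1)[-1]
        # Segment files start with '_'; 'segments' and 'deletable' are skipped.
        if name.startswith("_") and "." in name:
            seg, ext = name.rsplit(".", 1)
            segments[seg].add(ext.lower())
    return dict(segments)


def incomplete_segments(paths):
    """Return {segment: sorted missing extensions} for suspicious segments."""
    flagged = {}
    for seg, exts in group_by_segment(paths).items():
        if "cfs" in exts:
            continue  # a compound file bundles the per-segment files
        missing = CORE_EXTS - exts
        if missing:
            flagged[seg] = sorted(missing)
    return flagged


if __name__ == "__main__":
    # A small excerpt of the listing from this thread.
    sample = [
        r"\\LuceneIndexTest\_d8c9.fdt",
        r"\\LuceneIndexTest\_d8c9.fdx",
        r"\\LuceneIndexTest\_d8c9.fnm",
        r"\\LuceneIndexTest\_dl48.fdt",
        r"\\LuceneIndexTest\_dl48.fdx",
        r"\\LuceneIndexTest\_dl48.fnm",
        r"\\LuceneIndexTest\_dl48.frq",
        r"\\LuceneIndexTest\_dl48.prx",
        r"\\LuceneIndexTest\_dl48.tii",
        r"\\LuceneIndexTest\_dl48.tis",
        r"\\LuceneIndexTest\_k6jf.cfs",
        r"\\LuceneIndexTest\segments",
    ]
    for seg, missing in incomplete_segments(sample).items():
        print(seg, "is missing", ", ".join(missing))
```

Run against the full listing, a check like this would flag segments such as _d8c9, _d8ca and _dl4h, which have only .fdt/.fdx/.fnm files. Whether those are leftovers of an interrupted indexing or merge run is a guess that should be confirmed against the "segments" file before anything is removed.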