Digy,

It will be difficult to create groups of indexes because of the way we build
and search the index. We add new documents and update existing documents quite
frequently, and our searches need to run against the entire set.

We are not facing any search performance problems as of now; I just wanted to
check whether there are any known performance or scalability issues after
crossing 100 GB. Another question on the same topic: I am not sure whether the
100 GB size of our index is genuine or whether it is due to failures that have
left redundant segments/files behind. I saw a few TMP files, which I have
deleted. Apart from that, I am not sure how to identify redundant or junk files
in the Lucene index folder.

The following is the list of files in our Lucene index folder:

\\LuceneIndexTest\_d8by.prx
\\LuceneIndexTest\_d8by.tii
\\LuceneIndexTest\_d8by.tis
\\LuceneIndexTest\_d8c9.fdt
\\LuceneIndexTest\_d8c9.fdx
\\LuceneIndexTest\_d8c9.fnm
\\LuceneIndexTest\_d8ca.fdt
\\LuceneIndexTest\_d8ca.fdx
\\LuceneIndexTest\_d8ca.fnm
\\LuceneIndexTest\_dl4h.fdt
\\LuceneIndexTest\_dl4h.fdx
\\LuceneIndexTest\_dl4h.fnm
\\LuceneIndexTest\_dl48.fdt
\\LuceneIndexTest\_dl48.fdx
\\LuceneIndexTest\_dl48.fnm
\\LuceneIndexTest\_dl48.frq
\\LuceneIndexTest\_dl48.prx
\\LuceneIndexTest\_dl48.tii
\\LuceneIndexTest\_dl48.tis
\\LuceneIndexTest\_fdbs.fdt
\\LuceneIndexTest\_fdbs.fdx
\\LuceneIndexTest\_fdbs.fnm
\\LuceneIndexTest\_fdbs.frq
\\LuceneIndexTest\_fdbs.prx
\\LuceneIndexTest\_fdbs.tii
\\LuceneIndexTest\_fdbs.tis
\\LuceneIndexTest\_fhz5.fdt
\\LuceneIndexTest\_fhz5.fdx
\\LuceneIndexTest\_fhz5.fnm
\\LuceneIndexTest\_fhz5.frq
\\LuceneIndexTest\_fhz5.prx
\\LuceneIndexTest\_fhz5.tii
\\LuceneIndexTest\_fhz5.tis
\\LuceneIndexTest\_fkla.fdt
\\LuceneIndexTest\_fkla.fdx
\\LuceneIndexTest\_fkla.fnm
\\LuceneIndexTest\_fkla.frq
\\LuceneIndexTest\_fkla.prx
\\LuceneIndexTest\_fkla.tii
\\LuceneIndexTest\_fkla.tis
\\LuceneIndexTest\_fmo5.fdt
\\LuceneIndexTest\_fmo5.fdx
\\LuceneIndexTest\_fmo5.fnm
\\LuceneIndexTest\_fmo5.frq
\\LuceneIndexTest\_fmo5.prx
\\LuceneIndexTest\_fmo5.tii
\\LuceneIndexTest\_fmo5.tis
\\LuceneIndexTest\_fmo6.fdt
\\LuceneIndexTest\_fmo6.fdx
\\LuceneIndexTest\_fmo6.fnm
\\LuceneIndexTest\_fmo6.frq
\\LuceneIndexTest\_fmo6.prx
\\LuceneIndexTest\_fmo6.tii
\\LuceneIndexTest\_fmo6.tis
\\LuceneIndexTest\_fmo7.fdt
\\LuceneIndexTest\_fmo7.fdx
\\LuceneIndexTest\_fmo7.fnm
\\LuceneIndexTest\_fmo7.frq
\\LuceneIndexTest\_fmo7.prx
\\LuceneIndexTest\_fmo7.tii
\\LuceneIndexTest\_fmo7.tis
\\LuceneIndexTest\_fmo9.fdt
\\LuceneIndexTest\_fmo9.fdx
\\LuceneIndexTest\_fmo9.fnm
\\LuceneIndexTest\_fmoa.fdt
\\LuceneIndexTest\_fmoa.fdx
\\LuceneIndexTest\_fmoa.fnm
\\LuceneIndexTest\_fmod.fdt
\\LuceneIndexTest\_fmod.fdx
\\LuceneIndexTest\_fmod.fnm
\\LuceneIndexTest\_fmoe.fdt
\\LuceneIndexTest\_fmoe.fdx
\\LuceneIndexTest\_fmoe.fnm
\\LuceneIndexTest\_fmof.fdt
\\LuceneIndexTest\_fmof.fdx
\\LuceneIndexTest\_fmof.fnm
\\LuceneIndexTest\_fmog.fdt
\\LuceneIndexTest\_fmog.fdx
\\LuceneIndexTest\_fmog.fnm
\\LuceneIndexTest\_fmoh.fdt
\\LuceneIndexTest\_fmoh.fdx
\\LuceneIndexTest\_fmoh.fnm
\\LuceneIndexTest\_foq9.fdt
\\LuceneIndexTest\_foq9.fdx
\\LuceneIndexTest\_foq9.fnm
\\LuceneIndexTest\_foq9.frq
\\LuceneIndexTest\_foq9.prx
\\LuceneIndexTest\_foq9.tii
\\LuceneIndexTest\_foq9.tis
\\LuceneIndexTest\_fq23.fdt
\\LuceneIndexTest\_fq23.fdx
\\LuceneIndexTest\_fq23.fnm
\\LuceneIndexTest\_fq23.frq
\\LuceneIndexTest\_fq23.prx
\\LuceneIndexTest\_fq23.tii
\\LuceneIndexTest\_fq23.tis
\\LuceneIndexTest\_hr8w.fdt
\\LuceneIndexTest\_hr8w.fdx
\\LuceneIndexTest\_hr8w.fnm
\\LuceneIndexTest\_hr8x.fdt
\\LuceneIndexTest\_hr8x.fdx
\\LuceneIndexTest\_hr8x.fnm
\\LuceneIndexTest\_k6jf.cfs
\\LuceneIndexTest\_kwhl.cfs
\\LuceneIndexTest\deletable
\\LuceneIndexTest\segments
\\LuceneIndexTest\_d8by.fdt
\\LuceneIndexTest\_d8by.fdx
\\LuceneIndexTest\_d8by.fnm
\\LuceneIndexTest\_d8by.frq

Any input on junk/redundant files in the above list?
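
One way to get a first overview of such a listing is to group the files by
segment name and see which segments are missing some of the per-segment files.
The following is only a minimal Python sketch under stated assumptions: the set
of "core" extensions is assumed for illustration, the function names are
hypothetical, and the output is a hint for closer inspection, not proof that a
file is junk. The `segments` file remains the authority on which segments are
live, so nothing should be deleted based on this alone.

```python
from collections import defaultdict

# Extensions a non-compound segment is ASSUMED to need in order to be
# searchable. This set is an illustrative assumption, not read from Lucene.
CORE_EXTENSIONS = {"fnm", "fdt", "fdx", "frq", "prx", "tii", "tis"}


def group_by_segment(paths):
    """Map each segment name (e.g. '_d8by') to the set of extensions present."""
    segments = defaultdict(set)
    for raw in paths:
        # Take the last path component; handles both '\' and '/' separators.
        name = raw.replace("\\", "/").rsplit("/", 1)[-1]
        if not name.startswith("_") or "." not in name:
            continue  # skip 'segments', 'deletable' and similar files
        stem, ext = name.rsplit(".", 1)
        segments[stem].add(ext.lower())
    return dict(segments)


def suspicious_segments(paths):
    """Return {segment: sorted missing extensions} for incomplete segments."""
    report = {}
    for stem, exts in group_by_segment(paths).items():
        if "cfs" in exts:
            continue  # compound-file segments keep everything in one file
        missing = CORE_EXTENSIONS - exts
        if missing:
            report[stem] = sorted(missing)
    return report
```

Feeding the listing above into `suspicious_segments` would report every
segment that has only part of the assumed file set, which narrows down the
candidates to compare against the `segments` file.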



-----Original Message-----
From: Digy [mailto:digyd...@gmail.com]
Sent: Tuesday, December 30, 2008 2:37 AM
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

Hi Nitin,

* I haven't heard about that 100GB limit, but I once tried Lucene.Net with a
300GB index. The first searches (with a fresh IndexSearcher) took ~20 sec
(because the caches had to be populated), but subsequent searches performed
quite well (varying from ~50 msec to 3 sec).

* If you deal with such large indexes, it is better to group the indexes
according to some criterion (e.g., an index for December, an index for
November, etc.) and to skip any index that is not needed for a given search.
Of course, keeping smaller indexes on multiple machines, searching them in
parallel, and then merging the results would be a good solution too, but it
would require more complex coding.
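
As a rough illustration of the merge step mentioned above, here is a minimal
Python sketch (language chosen only for brevity; it does not use the
Lucene.Net API). It assumes each partial index returns its hits as
`(score, doc_id)` tuples already sorted by descending score; the function name
and the tuple layout are hypothetical. Note also that scores from separately
built indexes are not strictly comparable, because term statistics differ per
index, so this is a sketch of the mechanics only.

```python
import heapq
from itertools import islice


def merge_top_hits(per_index_hits, limit):
    """Merge hit lists, each sorted by descending score, into one top-N list.

    per_index_hits: iterable of lists of (score, doc_id) tuples.
    limit: maximum number of merged hits to return.
    """
    # heapq.merge expects ascending inputs, so order by the negated score.
    merged = heapq.merge(*per_index_hits, key=lambda hit: -hit[0])
    return list(islice(merged, limit))
```

For example, merging a "December" hit list with a "November" hit list with
`limit=10` yields the ten best-scoring hits across both, without sorting the
full combined list.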

You may also want to look at some tips on search speed optimization (
http://wiki.apache.org/jakarta-lucene/ImproveSearchingSpeed ) and at the
Solr project ( http://lucene.apache.org/solr/features.html ).

* You can get the official releases of Lucene.Net from
https://svn.apache.org/repos/asf/incubator/lucene.net/site/download and the
current version from svn trunk:
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/src/Lucene.Net/



DIGY.







-----Original Message-----
From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
Sent: Saturday, December 27, 2008 6:41 AM
To: lucene-net-user@incubator.apache.org
Subject: Lucene Scalability Options

Hi All,

We are using the Lucene.NET v2.0 library in our project. Our index has grown
to ~80 GB in the last year, and we expect it to grow beyond 100 GB in the next
six months. I read somewhere, a long time ago, about Lucene performance issues
after crossing the 100 GB mark.


- Are there any specific issues that we might run into after 100 GB?

- Is there any known impact on search performance?

- Are there any scalability features that we can consider implementing, such
as clustering?

Any input would be valuable. I would also like to know the latest stable
Lucene.NET release that we can migrate to; a download link would be useful.


Thanks & regards,

Nitin Shiralkar
