Rob,

as Doug and Paul already mentioned, the index size is definitely too big :-(.

What could cause the problem, especially when running on a Windows platform, is an IndexReader that stays open during the whole indexing process. During indexing, the writer creates temporary segment files which are later merged into bigger segments; once a merge is done, the old segment files are deleted. If an IndexReader still has them open, Windows cannot delete the files and they stay in the index directory. You end up with an index several times bigger than the dataset.
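
For illustration, an ordering like this hypothetical sketch (paths and field names are made up, not taken from your code) is enough to trigger it:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class ReaderHeldOpen {
    public static void main(String[] args) throws Exception {
        // Problematic ordering: the reader is opened first and kept open
        // for the whole indexing run.
        IndexReader reader = IndexReader.open("C:\\index\\real");

        IndexWriter writer = new IndexWriter("C:\\index\\real",
                                             new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(Field.Text("contents", "text of one file"));
        writer.addDocument(doc);
        writer.optimize();   // merging replaces small segments with bigger ones
        writer.close();      // ...and tries to delete the old segment files;
                             // on Windows this fails while 'reader' still has
                             // them open, so they pile up in the directory

        reader.close();      // closing the reader before indexing (or opening
                             // it only after the writer is closed) avoids this
    }
}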

Can you check your code for any IndexReaders that are open while indexing, or paste the relevant part to the list so we can have a look at it?

hope this helps
Bernhard


Rob Jose wrote:

Hello
I have indexed several thousand (52 to be exact) text files and I keep running out of disk space to store the indexes. The size of the documents I have indexed is around 2.5 GB. The size of the Lucene indexes is around 287 GB. Does this seem correct? I am not storing the contents of the file, just indexing and tokenizing. I am using Lucene 1.3 final. Can you guys let me know what you are experiencing? I don't want to go into production with something that I should be configuring better.


I am not sure if this helps, but I have a temp index and a real index. I index the file into the temp index, and then merge the temp index into the real index using the addIndexes method on the IndexWriter. I have also set setUseCompoundFile to true on the production writer; I did not set this on the temp index. The last thing that I do before closing the production writer is to call the optimize method.
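
In outline, the code does something like this (class name, paths and field names are simplified placeholders, not the exact code):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TwoStageIndexer {
    public static void main(String[] args) throws Exception {
        // 1. Index one file into a freshly created temp index
        //    (compound files not enabled here).
        IndexWriter temp = new IndexWriter("C:\\index\\temp",
                                           new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Keyword("filename", "somefile.txt"));
        doc.add(Field.UnStored("contents", "text of the file")); // indexed + tokenized, not stored
        temp.addDocument(doc);
        temp.close();

        // 2. Merge the temp index into the production index
        //    (create=false assumes the production index already exists).
        IndexWriter prod = new IndexWriter("C:\\index\\real",
                                           new StandardAnalyzer(), false);
        prod.setUseCompoundFile(true);
        Directory[] toMerge = { FSDirectory.getDirectory("C:\\index\\temp", false) };
        prod.addIndexes(toMerge);
        prod.optimize();   // last call before closing the production writer
        prod.close();
    }
}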

I would really appreciate any ideas to get the index size smaller if it is at all 
possible.

Thanks
Rob



