Hello, I have indexed a set of text files (52, to be exact) and I keep running out of disk space to store the indexes. The documents I have indexed total around 2.5 GB, but the resulting Lucene index is around 287 GB. Does that seem right? I am not storing the contents of the files, just indexing and tokenizing them. I am using Lucene 1.3 final. Can anyone let me know what index-to-document size ratios you are seeing? I don't want to go into production with something that I should be configuring better.
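
For reference, here is roughly how each file gets indexed. This is a minimal sketch against the 1.3-era API; the paths and field names are placeholders, not my real ones:

import java.io.File;
import java.io.FileReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexOneFile {
    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);

        // Create the temp index (third argument true == create a new index).
        IndexWriter writer = new IndexWriter("/data/temp-index",
                                             new StandardAnalyzer(), true);

        Document doc = new Document();
        // Field.Keyword: stored and indexed as a single token (just the file path).
        doc.add(Field.Keyword("path", file.getPath()));
        // Field.Text(String, Reader): indexed and tokenized, but NOT stored.
        doc.add(Field.Text("contents", new FileReader(file)));

        writer.addDocument(doc);
        writer.close();
    }
}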
I am not sure if this helps, but I have a temp index and a real (production) index. I index each file into the temp index, then merge the temp index into the real index using the addIndexes method on the IndexWriter. I have also called setUseCompoundFile(true) on the production writer, but not on the temp writer. The last thing I do before closing the production writer is call its optimize method. I would really appreciate any ideas for getting the index size down, if that is at all possible. Thanks, Rob
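
P.S. In case it matters, the merge step looks roughly like this; again a sketch with placeholder paths:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIntoProduction {
    public static void main(String[] args) throws Exception {
        // Open the existing production index for appending
        // (third argument false == do not create).
        IndexWriter writer = new IndexWriter("/data/real-index",
                                             new StandardAnalyzer(), false);
        // Compound file format on the production index only.
        writer.setUseCompoundFile(true);

        // Merge the temp index into the production index.
        Directory temp = FSDirectory.getDirectory("/data/temp-index", false);
        writer.addIndexes(new Directory[] { temp });

        // Optimize down to a single segment, then close.
        writer.optimize();
        writer.close();
    }
}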
