From: Doug Cutting http://www.mail-archive.com/[EMAIL PROTECTED]/msg08757.html
> An index typically requires around 35% of the plain text size. I think it's a little big. sv On Wed, 18 Aug 2004, Rob Jose wrote: > Hello > I have indexed several thousand (52 to be exact) text files and I keep > running out of disk space to store the indexes. The size of the > documents I have indexed is around 2.5 GB. The size of the Lucene > indexes is around 287 GB. Does this seem correct? I am not storing the > contents of the file, just indexing and tokenizing. I am using Lucene > 1.3 final. Can you guys let me know what you are experiencing? I don't > want to go into production with something that I should be configuring > better. > > I am not sure if this helps, but I have a temp index and a real index. I index the > file into the temp index, and then merge the temp index into the real index using > the addIndexes method on the IndexWriter. I have also set the production writer > setUseCompoundFile to true. I did not set this on the temp index. The last thing > that I do before closing the production writer is to call the optimize method. > > I would really appreciate any ideas to get the index size smaller if it is at all > possible. > > Thanks > Rob --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
