Which version of Lucene are you using?
More questions/answers below...
[email protected] wrote:
We scan the web and index pages in Lucene. Our index size is in the range of
500K to 1 million documents. As we index pages, we also call
IndexWriter.optimize after certain time intervals [I believe Lucene also
does optimization in the background?].
Actually Lucene merges segments periodically in the background, but does
not optimize.
So far it has worked great. But for just this one scan we noticed that our
index size grew to 90 GB for about 900K documents [typical index size should
be around 17-18 GB]. We are not sure what caused the index to grow this large.
Outside of our system, when we did a forced IndexWriter.optimize() on this
90 GB Lucene index, it indeed shrank to 17 GB. My question is what may have
caused the size to grow to 90 GB?
Optimize requires free temporary disk space equal to 1X the index size.
Do you have an IndexReader open on the index when optimize runs? That
ties up another 1X.
That should mean a 17-18 GB index takes 51-54 GB at peak (1X for the index,
1X for optimize, 1X held by the reader), so I'm not sure why you got up to
90 GB. Were there no exceptions, even in BG merge threads?
Are you reopening readers while optimize is running? In theory that could
tie up even more disk space (e.g. if you didn't close the old readers).
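To make the arithmetic above concrete, here is a rough sketch in Java. It is only a back-of-the-envelope model, not anything Lucene provides: the class OptimizeDiskBudget, the method peakGb and the parameter openReaderSnapshots are hypothetical names, and the idea that each distinct snapshot held open by a reader pins roughly another 1X is an assumption extrapolated from the rules of thumb above.

```java
// Back-of-the-envelope disk budget while IndexWriter.optimize() runs.
// Hypothetical helper; all names here are illustrative, not Lucene API.
final class OptimizeDiskBudget {

    /**
     * @param indexGb             size of the optimized index, in GB
     * @param openReaderSnapshots number of distinct index snapshots kept
     *                            open by readers (assumed to pin ~1X each)
     * @return estimated peak disk usage in GB
     */
    static long peakGb(long indexGb, int openReaderSnapshots) {
        long existing = indexGb;       // the index as it sits on disk
        long optimizeTemp = indexGb;   // temporary space needed by optimize
        long heldByReaders = indexGb * openReaderSnapshots; // files readers keep alive
        return existing + optimizeTemp + heldByReaders;
    }

    public static void main(String[] args) {
        System.out.println(peakGb(17, 1)); // prints 51
        System.out.println(peakGb(18, 1)); // prints 54
    }
}
```

Under this model, one open reader gives the 51-54 GB range quoted above. Getting much higher would need several old snapshots held open at once, which is why unclosed readers are worth checking.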
Did the size grow because optimization failed?
If optimization fails it would remove the partially written files, so I
don't think this would explain too-high disk usage.
Does optimization fail if there is any foreign file in the Lucene index
directory? [We tried optimizing with foreign files in the Lucene directory,
and Lucene still optimized the index.]
Foreign files are harmless as long as they don't conflict w/ Lucene's
file names.
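
If it happens again, it may help to see which files are actually taking the space before optimizing. Below is a minimal sketch using only the JDK (no Lucene calls); the class IndexDirReport and its methods are hypothetical names, and it only looks at regular files directly inside the directory you pass in.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic: lists files in a directory, largest first.
final class IndexDirReport {

    /** Maps file name to size in bytes for regular files in dir, largest first. */
    static Map<String, Long> sizesLargestFirst(Path dir) throws IOException {
        Map<String, Long> sizes = new HashMap<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    sizes.put(p.getFileName().toString(), Files.size(p));
                }
            }
        }
        // Copy into a LinkedHashMap so iteration order is by descending size.
        Map<String, Long> sorted = new LinkedHashMap<>();
        sizes.entrySet().stream()
             .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
             .forEach(e -> sorted.put(e.getKey(), e.getValue()));
        return sorted;
    }

    /** Sums all sizes in the report. */
    static long totalBytes(Map<String, Long> sizes) {
        long total = 0;
        for (long s : sizes.values()) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // The index directory path is taken from the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, Long> sizes = sizesLargestFirst(dir);
        sizes.forEach((name, size) -> System.out.println(size + "\t" + name));
        System.out.println(totalBytes(sizes) + "\ttotal");
    }
}
```

Running it against the index directory while the index is at its largest would show whether the space is in many leftover segment files or in something that is not Lucene's at all.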
Mike