Let’s back up a bit. What version of Lucene are you using?

Starting with Lucene 8, any index that’s ever been touched by Lucene 6 will not open. It does not matter if the index has been completely rewritten. It does not matter if it’s been run through IndexUpgrader, which just does a forceMerge to 1 segment. A version marker is recorded when a segment is created, and the earliest one is preserved across merges. So say you have two segments, one created with 6 and one with 7. When they are merged, the resulting segment still carries the Lucene 6 marker.
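To make the marker behaviour concrete, here is a toy model in plain Python. It is only a sketch and does not use Lucene's API: `Segment`, `merge`, and `can_open` are hypothetical names, and the rule in `can_open` (a reader of major version N accepts markers of N - 1 or later) is an assumption generalized from the Lucene 7 and Lucene 8 behaviour described in this thread.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    # Hypothetical stand-in for a Lucene segment; only the marker is modeled.
    created_major: int  # major Lucene version recorded when the data was first written


def merge(*segments: Segment) -> Segment:
    """Merging keeps the earliest creation marker among the inputs."""
    return Segment(created_major=min(s.created_major for s in segments))


def can_open(reader_major: int, segments: Iterable[Segment]) -> bool:
    """Assumed rule: a reader of major version N accepts markers >= N - 1."""
    return all(s.created_major >= reader_major - 1 for s in segments)


seg6 = Segment(created_major=6)
seg7 = Segment(created_major=7)
merged = merge(seg6, seg7)

print(merged.created_major)    # 6: the older marker survives the merge
print(can_open(7, [merged]))   # True
print(can_open(8, [merged]))   # False
```

In this model, merging down to a single segment never raises the marker, which is why a forceMerge does not make such an index readable by a newer major version.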
Now, if any segment has the Lucene 6 marker, Lucene 8 will not open the index. If you’re using Lucene 7, then this error implies that one or more of your segments was created with Lucene 5 or earlier.

So you probably need to re-index from scratch on whatever version of Lucene you want to use.

Best,
Erick

> On Jun 17, 2019, at 8:41 AM, David Allouche <da...@allouche.net> wrote:
>
> Hello,
>
> I use Lucene with PyLucene on a public-facing web application. We have a
> moderately large index (~24M documents, ~11GB index data), with a constant
> stream of new documents.
>
> I recently upgraded to PyLucene 7.
>
> When trying to test the new release of PyLucene 8, I encountered an
> IndexFormatTooOld error because my index conversion from Lucene6 to Lucene7
> was not complete.
>
> I found IndexUpgrader, and I had a look at its implementation. I would very
> much like to avoid taking down the service during the index upgrade, so I
> believe I cannot use IndexUpgrader because I need the write lock to be held
> by the web application to index new documents.
>
> So I figure I could get the desired result with an IndexWriter.forceMerge(1).
> But the documentation says "This is a horribly costly operation, especially
> when you pass a small maxNumSegments; usually you should only call this if
> the index is static (will no longer be changed)."
> https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/index/IndexWriter.html#forceMerge-int-
>
> And indeed, forceMerge tends to be killed by the kernel OOM killer on my
> development VM. I want to avoid this failure mode in production. I could
> increase the VM until it works, but I would rather have a less brutal
> approach to upgrading a live index. Something that could run in the
> background with reasonable amounts of anonymous memory.
>
> What is the recommended approach to upgrading a live index?
>
> How can I know from the code that the index needs upgrading at all?
> I could add a manual knob to start an upgrade, but it would be better if it
> occurred transparently when I upgrade PyLucene.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org