Live index upgrading

David Allouche Mon, 17 Jun 2019 08:41:59 -0700

Hello,

I use Lucene with PyLucene on a public-facing web application. We have a 
moderately large index (~24M documents, ~11GB index data), with a constant 
stream of new documents.

I recently upgraded to PyLucene 7.

When testing the new PyLucene 8 release, I encountered an
IndexFormatTooOldException because my index conversion from Lucene 6 to
Lucene 7 was not complete.

I found IndexUpgrader, and I had a look at its implementation. I would very
much like to avoid taking the service down during the index upgrade, so I
believe I cannot use IndexUpgrader because I need the write lock to be held by
the web application to index new documents.

So I figure I could get the desired result with an IndexWriter.forceMerge(1).
But the documentation says "This is a horribly costly operation, especially
when you pass a small maxNumSegments; usually you should only call this if the
index is static (will no longer be changed)."
https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/index/IndexWriter.html#forceMerge-int-

And indeed, forceMerge tends to be killed by the kernel OOM killer on my
development VM. I want to avoid this failure mode in production. I could
increase the VM's memory until it works, but I would rather have a less brutal
approach to upgrading a live index. Something that could run in the background
with reasonable amounts of anonymous memory.

What is the recommended approach to upgrading a live index?

How can I know from the code that the index needs upgrading at all? I could add
a manual knob to start an upgrade, but it would be better if it occurred
transparently when I upgrade PyLucene.
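
To make the second question concrete, here is a minimal sketch of the kind of
check I have in mind. It is plain Python and does not call Lucene: the function
names and the list of per-segment major versions are hypothetical. I assume the
list could be built in PyLucene by iterating over
SegmentInfos.readLatestCommit(directory) and reading info.getVersion().major
for each segment, but I have not verified this against a live index.

```python
def segments_needing_upgrade(segment_majors, current_major):
    """Return positions of segments written by an older major version.

    segment_majors: major version of each segment, in commit order
                    (hypothetical input, gathered elsewhere from the index).
    current_major:  major version of the Lucene release currently in use.
    """
    return [i for i, major in enumerate(segment_majors) if major < current_major]


def needs_upgrade(segment_majors, current_major):
    """True if at least one segment predates the current major version."""
    return bool(segments_needing_upgrade(segment_majors, current_major))
```

With something like this, the application could check at startup whether any
old segments remain, and only then schedule a background upgrade.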

