Hello,

I use Lucene with PyLucene on a public-facing web application. We have a 
moderately large index (~24M documents, ~11GB index data), with a constant 
stream of new documents.

I recently upgraded to PyLucene 7.

When testing the new PyLucene 8 release, I got an IndexFormatTooOldException 
because my index had not been completely converted from the Lucene 6 format to 
the Lucene 7 format.

I found IndexUpgrader and had a look at its implementation. I would very much 
like to avoid taking the service down during the index upgrade, so I believe I 
cannot use IndexUpgrader: the web application needs to hold the write lock in 
order to index new documents.

So I figure I could get the desired result with an IndexWriter.forceMerge(1). 
But the documentation says "This is a horribly costly operation, especially 
when you pass a small maxNumSegments; usually you should only call this if the 
index is static (will no longer be changed)." 
https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/index/IndexWriter.html#forceMerge-int-

And indeed, forceMerge tends to be killed by the kernel OOM killer on my 
development VM. I want to avoid this failure mode in production. I could 
increase the VM's memory until it works, but I would rather have a less brutal 
approach to upgrading a live index: something that could run in the background 
with a reasonable amount of anonymous memory.

What is the recommended approach to upgrading a live index?

How can I know from the code that the index needs upgrading at all? I could add 
a manual knob to start an upgrade, but it would be better if it occurred 
transparently when I upgrade PyLucene.
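
To make that second question concrete, here is a minimal sketch of the kind of 
check I have in mind. It is plain Python and only contains the decision logic. 
The function name needs_upgrade is hypothetical, and I am assuming that the 
major version of each segment can be read from the latest commit (for example 
through SegmentInfos.readLatestCommit and each segment's version), which I 
have not verified from PyLucene.

```python
from typing import Iterable


def needs_upgrade(segment_majors: Iterable[int], current_major: int) -> bool:
    """Return True if at least one segment was written by an older major version.

    segment_majors: major Lucene version that wrote each segment of the index
                    (assumption: obtained elsewhere from the latest commit).
    current_major:  major version of the Lucene release currently running.
    """
    return any(major < current_major for major in segment_majors)


if __name__ == "__main__":
    # Hypothetical index: one segment left over from Lucene 6, two from Lucene 7.
    print(needs_upgrade([6, 7, 7], current_major=7))
    # Hypothetical index where every segment was already written by Lucene 7.
    print(needs_upgrade([7, 7], current_major=7))
```

If such a check is sound, the application could run it at startup and schedule 
a background upgrade only when it returns True.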


