Hello,

We are facing a strange situation in our application as described below:

*Using*:

   - Python 3.8.10
   - PyLucene 6.5.0
   - Java 8 (1.8.0_181)
   - Runs on Linux and Windows (error seen on Windows)

We suddenly get the following *error*:

2022-02-10 09:58:09.253215: ERROR : writer | Failed to get index
(D:\i\202202) writer, Exception:
org.apache.lucene.index.CorruptIndexException: Unexpected file read error
while reading index.
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202202\segments_fo")))


After this, no further indexing happens: every attempt to open the index
for writing throws the above error, so the index writer never opens.

FYI, our code contains the following *settings*:

index_path = r"D:\i\202202"  # raw string, so "\202" is not read as an octal escape
index_directory = FSDirectory.open(Paths.get(index_path))
iconfig = IndexWriterConfig(wrapper_analyzer)
iconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND)
iconfig.setRAMBufferSizeMB(16.0)
writer = IndexWriter(index_directory, iconfig)


*Repairing*
We tried 'repairing' the index with the following command / tool:

java -cp lucene-core-6.5.0.jar:lucene-backward-codecs-6.5.0.jar
org.apache.lucene.index.CheckIndex "D:\i\202202" -exorcise

However, the tool reports "No problems found with the index."
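
For reference, the same check could be scripted from Python. This is only a minimal sketch: `build_checkindex_command` and `run_checkindex` are hypothetical helper names, and the jar names are simply the ones from the command above. It joins the classpath with `os.pathsep`, so the separator comes out as `;` on Windows and `:` on Linux, and it leaves out -exorcise so the check stays read-only.

```python
import os
import subprocess

CHECK_INDEX_CLASS = "org.apache.lucene.index.CheckIndex"

def build_checkindex_command(index_dir, jars, java="java"):
    """Build the argv list for a read-only CheckIndex run (hypothetical helper)."""
    # os.pathsep is ';' on Windows and ':' on Linux/macOS.
    classpath = os.pathsep.join(jars)
    return [java, "-cp", classpath, CHECK_INDEX_CLASS, index_dir]

def run_checkindex(index_dir, jars):
    """Run CheckIndex and return (exit_code, combined stdout/stderr text)."""
    cmd = build_checkindex_command(index_dir, jars)
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```

The caller would pass the full paths of lucene-core-6.5.0.jar and lucene-backward-codecs-6.5.0.jar and log the returned output. We assume a non-zero exit code signals a problem, but have not verified that.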


*Workaround*
We have to manually delete the problematic segments file:
D:\i\202202\segments_fo
after which the application starts again... until the next corruption. We
have not been able to spot a pattern in when the corruption occurs.
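
To make the manual step concrete, here is a minimal sketch of how it might be automated. It assumes the suffix of a segments_N file name is the commit generation written in base 36 (so "fo" is 564), and it moves the newest file into a backup directory instead of deleting it. The function names and the backup directory are hypothetical.

```python
import os
import shutil

PREFIX = "segments_"

def latest_segments_file(index_dir):
    """Return the segments_N file name with the highest generation, or None."""
    best_name, best_gen = None, -1
    for name in os.listdir(index_dir):
        if not name.startswith(PREFIX):
            continue
        try:
            # Assumption: the suffix is the generation in base 36.
            gen = int(name[len(PREFIX):], 36)
        except ValueError:
            continue  # not a file name this sketch understands
        if gen > best_gen:
            best_name, best_gen = name, gen
    return best_name

def quarantine_latest_commit(index_dir, backup_dir):
    """Move the newest segments_N file aside; return its new path, or None."""
    name = latest_segments_file(index_dir)
    if name is None:
        return None
    os.makedirs(backup_dir, exist_ok=True)
    target = os.path.join(backup_dir, name)
    shutil.move(os.path.join(index_dir, name), target)
    return target
```

This only mirrors what we do by hand; it does not explain or prevent the corruption. We have not verified what it does to previously committed documents: with CREATE_OR_APPEND the writer may simply start from an empty index if no earlier segments_N file remains.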


*Two questions:*

   1. Can we handle this situation programmatically, so that no manual
   intervention is needed?
   2. What could be causing the corruption in the first place?
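
To illustrate what we mean by question 1, this is the rough shape of the handling we have in mind, written as a plain-Python sketch with the Lucene calls injected as callables. All names here are hypothetical. `looks_like_corruption` assumes the Java exception class name appears in the Python exception text, as it does in our log line above.

```python
def looks_like_corruption(exc):
    """Assumption: the Java class name shows up in the exception text."""
    return "CorruptIndexException" in str(exc)

def open_with_recovery(open_writer, recover, is_corruption=looks_like_corruption,
                       max_attempts=2):
    """Try to open the writer; on a corruption error, run recover() and retry.

    open_writer: callable that returns a writer or raises.
    recover:     callable that attempts a repair (e.g. moving a bad file aside).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return open_writer()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_corruption(exc):
                raise  # unrelated error, or recovery did not help
            recover()
```

In our application, `open_writer` would wrap the `IndexWriter(index_directory, iconfig)` call, and `recover` would be whatever repair step turns out to be safe. As far as we understand, `open_writer` should build a fresh IndexWriterConfig on each attempt, since Lucene does not allow one config to be shared across writers.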


Before this we were using PyLucene 4.10 and did not face this problem;
the application logic is unchanged.

Also, while the application runs on both Linux and Windows, so far we have
observed this situation only on various Windows platforms.

We would really appreciate some assistance. Thanks in advance.

Regards,
Antony
