Segments file disappears, index no longer functions.

adakos Wed, 21 Jan 2009 01:44:35 -0800

Hello everyone!

I have implemented Lucene.Net, and have had it functioning very well for
quite some time.

I have an indexing server application that indexes a very large file system,
and a searching application that the users use to search the index created
by the server application.

Our index is currently ~4 GB in size, and we have roughly half a million
documents that we are indexing/updating regularly.

As of late, we have been having a problem with the segments file
disappearing.

I originally thought the indexing server was crashing or encountering
some kind of error, but this wasn't the case. I ran the program in debug
mode and found that the indexing server itself didn't fault at any point,
yet the indexing program also ran into the problem of not being able to
find the segments file.

Unfortunately, this problem occurs every time I run the indexing program.
The segments file is deleted or disappears as a result of running the
indexer, so the indexer is causing the issue, but there doesn't appear to
be any reason why.

I have optimised the index and run the program again, but that still
doesn't help.

All the index writers/readers have appropriately coded .Close() methods in a
try/catch/finally.

Like I said, the indexer was running perfectly fine for a very long period
of time. The only thing I can see that's changed since we started using it
is the index size getting bigger.

It's obviously quite a critical problem because our users can only search on
the outdated index, and I haven't been able to find anything on this issue
anywhere.

I am hoping someone might be able to figure out what's going on.

The error occurs whenever an index writer/reader/searcher attempts to open:
it reports that it can't find the segments file.

Is there any known issue where this occurs? I know I am using the .Net
implementation, but I would assume that Lucene would be quite universal
across different platforms. I have noticed that there doesn't appear to be
much support for the .Net version, or at least I have not been able to find
any.

If it helps, below are the methods used for indexing:

Full Index Update

Directories are searched recursively.
For each file, we check whether an up-to-date copy is already in the index
(comparing size, modified time, etc.).
If it is, we ignore the file.
If it isn't, we delete the entry currently in the index, then add the
updated file.
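
To make the steps above concrete, here is a rough sketch of that loop in Python. It is not the actual Lucene.Net code: InMemoryIndex, FileStamp and full_index_update are hypothetical names made up for illustration, the index is just a dictionary, and only size and modified time are compared.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStamp:
    """The attributes compared to decide whether a file has changed."""
    size: int
    mtime: float


class InMemoryIndex:
    """Hypothetical stand-in for the real index: maps path -> FileStamp."""

    def __init__(self):
        self.docs = {}

    def lookup(self, path):
        return self.docs.get(path)

    def delete(self, path):
        self.docs.pop(path, None)

    def add(self, path, stamp):
        self.docs[path] = stamp


def full_index_update(root, index):
    """Walk root recursively and (re)index only new or changed files.

    Returns the list of paths that were added or replaced.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            stamp = FileStamp(st.st_size, st.st_mtime)
            if index.lookup(path) == stamp:
                continue  # already indexed and unchanged: ignore the file
            index.delete(path)      # drop the stale entry, if there is one
            index.add(path, stamp)  # then add the updated file
            changed.append(path)
    return changed
```

In the real indexer the delete and add steps go through the index reader/writer rather than a dictionary, so this only shows the decision logic.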

Email Parsing and Checksumming

I also have an email parser and a checksummer that grab any email
addresses from the document and calculate the checksum of the text to
attempt to avoid duplicate documents. If there is a document with the same
email and checksum, the document is still stored but is marked as a duplicate.
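
As a rough illustration of the duplicate check described above, here is a small Python sketch. Everything in it is an assumption made for the example: the regular expression is a simplified email pattern, SHA-256 stands in for whatever checksum is actually used, DuplicateTracker is a hypothetical name, and a document with no email address is keyed on its checksum alone.

```python
import hashlib
import re

# Simplified email pattern, good enough for a sketch (assumption).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def fingerprint(text):
    """Return (sorted list of lower-cased emails, hex checksum of the text)."""
    emails = sorted({m.lower() for m in EMAIL_RE.findall(text)})
    checksum = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return emails, checksum


class DuplicateTracker:
    """Remembers (email, checksum) pairs that have already been stored."""

    def __init__(self):
        self.seen = set()

    def store(self, text):
        """Store a document; flag it as a duplicate if a pair was seen before."""
        emails, checksum = fingerprint(text)
        # A document without any email is keyed on (None, checksum).
        keys = {(e, checksum) for e in emails} or {(None, checksum)}
        is_duplicate = any(k in self.seen for k in keys)
        self.seen.update(keys)
        # The document is kept either way; only the flag differs.
        return {"emails": emails, "checksum": checksum, "duplicate": is_duplicate}
```

For example, storing the same text twice returns duplicate=False the first time and duplicate=True the second, while a text with the same email but different wording is not flagged.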

File System Watcher

Once the full text index is finished, the indexer begins to process
files in a queue that is populated by a file system watcher. The file system
watcher runs constantly, so indexing is done in a live state.
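
The hand-off between the full pass and the watcher queue can be sketched roughly as follows, again in Python. LiveIndexer and its methods are hypothetical names, the watcher is simulated by calling on_file_changed directly, and "indexing" a file just records its path.

```python
import queue


class LiveIndexer:
    """Sketch: a full pass first, then draining of watcher events."""

    def __init__(self):
        self.pending = queue.Queue()  # filled by the file system watcher
        self.full_pass_done = False
        self.processed = []           # (phase, path) pairs, in indexing order

    def on_file_changed(self, path):
        """Called by the watcher at any time, even during the full pass."""
        self.pending.put(path)

    def run_full_pass(self, paths):
        """Index every path found by the recursive directory search."""
        for path in paths:
            self.processed.append(("full", path))
        self.full_pass_done = True

    def drain_queue(self):
        """Process queued watcher events, but only after the full pass.

        Returns the number of queued files that were indexed.
        """
        if not self.full_pass_done:
            return 0
        count = 0
        while True:
            try:
                path = self.pending.get_nowait()
            except queue.Empty:
                break
            self.processed.append(("live", path))
            count += 1
        return count
```

Events that arrive while the full pass is still running simply wait in the queue and are picked up by the first drain_queue call afterwards.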

--
View this message in context:
http://www.nabble.com/Segments-file-disappears%2C-index-no-longer-functions.-tp21579880p21579880.html
Sent from the Lucene - General mailing list archive at Nabble.com.
